All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/8] udp: GRO L4 improvements
@ 2021-03-21 17:01 Paolo Abeni
  2021-03-21 17:01 ` [PATCH net-next 1/8] udp: fixup csum for GSO receive slow path Paolo Abeni
                   ` (7 more replies)
  0 siblings, 8 replies; 35+ messages in thread
From: Paolo Abeni @ 2021-03-21 17:01 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Jakub Kicinski, Steffen Klassert,
	Willem de Bruijn, Alexander Lobakin

This series improves the UDP L4 - either 'forward' or 'frag_list' -
co-existence with UDP tunnel GRO, allowing the first to take place
correctly even for encapsulated UDP traffic.

The first for patches are mostly bugfixes, addressing some GRO 
edge-cases when both tunnels and L4 are present, enabled and in use.

The next 3 patches avoid unneeded segmentation when UDP GRO
traffic traverses in the receive path UDP tunnels.

Finally, some self-tests are included, covering the relevant
GRO scenarios.

Even if most patches are actually bugfixes, this series is
targeting net-next, as overall it makes available a new feature.

Paolo Abeni (8):
  udp: fixup csum for GSO receive slow path
  udp: skip fwd/list GRO for tunnel packets
  udp: properly complete L4 GRO over UDP tunnel packet
  udp: never accept GSO_FRAGLIST packets
  vxlan: allow L4 GRO passthrou
  geneve: allow UDP L4 GRO passthrou
  bareudp: allow UDP L4 GRO passthrou
  selftests: net: add UDP GRO forwarding self-tests

 drivers/net/bareudp.c                     |   1 +
 drivers/net/geneve.c                      |   1 +
 drivers/net/vxlan.c                       |   1 +
 include/linux/udp.h                       |  15 +-
 include/net/udp.h                         |  18 ++
 net/ipv4/udp.c                            |  22 +-
 net/ipv4/udp_offload.c                    |  27 ++-
 net/ipv6/udp.c                            |   5 +
 net/ipv6/udp_offload.c                    |   3 +-
 tools/testing/selftests/net/Makefile      |   1 +
 tools/testing/selftests/net/udpgro_fwd.sh | 251 ++++++++++++++++++++++
 11 files changed, 330 insertions(+), 15 deletions(-)
 create mode 100755 tools/testing/selftests/net/udpgro_fwd.sh

-- 
2.26.2


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH net-next 1/8] udp: fixup csum for GSO receive slow path
  2021-03-21 17:01 [PATCH net-next 0/8] udp: GRO L4 improvements Paolo Abeni
@ 2021-03-21 17:01 ` Paolo Abeni
  2021-03-22 13:18   ` Willem de Bruijn
  2021-03-21 17:01 ` [PATCH net-next 2/8] udp: skip fwd/list GRO for tunnel packets Paolo Abeni
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Paolo Abeni @ 2021-03-21 17:01 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Jakub Kicinski, Steffen Klassert,
	Willem de Bruijn, Alexander Lobakin

When looping back UDP GSO over UDP tunnel packets to an UDP socket,
the individual packet csum is currently set to CSUM_NONE. That causes
unexpected/wrong csum validation errors later in the UDP receive path.

We could possibly addressing the issue with some additional check and
csum mangling in the UDP tunnel code. Since the issue affects only
this UDP receive slow path, let's set a suitable csum status there.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/udp.h | 18 ++++++++++++++++++
 net/ipv4/udp.c    | 10 ++++++++++
 net/ipv6/udp.c    |  5 +++++
 3 files changed, 33 insertions(+)

diff --git a/include/net/udp.h b/include/net/udp.h
index d4d064c592328..007683eb3e113 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -515,6 +515,24 @@ static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
 	return segs;
 }
 
+static inline void udp_post_segment_fix_csum(struct sk_buff *skb, int level)
+{
+	/* UDP-lite can't land here - no GRO */
+	WARN_ON_ONCE(UDP_SKB_CB(skb)->partial_cov);
+
+	/* GRO already validated the csum up to 'level', and we just
+	 * consumed one header, update the skb accordingly
+	 */
+	UDP_SKB_CB(skb)->cscov = skb->len;
+	if (level) {
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+		skb->csum_level = 0;
+	} else {
+		skb->ip_summed = CHECKSUM_NONE;
+		skb->csum_valid = 1;
+	}
+}
+
 #ifdef CONFIG_BPF_SYSCALL
 struct sk_psock;
 struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 4a0478b17243a..ff54135c51ffa 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2168,6 +2168,7 @@ static int udp_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
 static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
 	struct sk_buff *next, *segs;
+	int csum_level;
 	int ret;
 
 	if (likely(!udp_unexpected_gso(sk, skb)))
@@ -2175,9 +2176,18 @@ static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 
 	BUILD_BUG_ON(sizeof(struct udp_skb_cb) > SKB_GSO_CB_OFFSET);
 	__skb_push(skb, -skb_mac_offset(skb));
+	csum_level = !!(skb_shinfo(skb)->gso_type &
+			(SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM));
 	segs = udp_rcv_segment(sk, skb, true);
 	skb_list_walk_safe(segs, skb, next) {
 		__skb_pull(skb, skb_transport_offset(skb));
+
+		/* UDP GSO packets looped back after adding UDP encap land here with CHECKSUM none,
+		 * instead of adding another check in the tunnel fastpath, we can force valid
+		 * csums here (packets are locally generated).
+		 * Additionally fixup the UDP CB
+		 */
+		udp_post_segment_fix_csum(skb, csum_level);
 		ret = udp_queue_rcv_one_skb(sk, skb);
 		if (ret > 0)
 			ip_protocol_deliver_rcu(dev_net(skb->dev), skb, ret);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index d25e5a9252fdb..e7d4bf3a65c72 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -739,16 +739,21 @@ static int udpv6_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
 static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
 	struct sk_buff *next, *segs;
+	int csum_level;
 	int ret;
 
 	if (likely(!udp_unexpected_gso(sk, skb)))
 		return udpv6_queue_rcv_one_skb(sk, skb);
 
 	__skb_push(skb, -skb_mac_offset(skb));
+	csum_level = !!(skb_shinfo(skb)->gso_type &
+			(SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM));
 	segs = udp_rcv_segment(sk, skb, false);
 	skb_list_walk_safe(segs, skb, next) {
 		__skb_pull(skb, skb_transport_offset(skb));
 
+		/* see comments in udp_queue_rcv_skb() */
+		udp_post_segment_fix_csum(skb, csum_level);
 		ret = udpv6_queue_rcv_one_skb(sk, skb);
 		if (ret > 0)
 			ip6_protocol_deliver_rcu(dev_net(skb->dev), skb, ret,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next 2/8] udp: skip fwd/list GRO for tunnel packets
  2021-03-21 17:01 [PATCH net-next 0/8] udp: GRO L4 improvements Paolo Abeni
  2021-03-21 17:01 ` [PATCH net-next 1/8] udp: fixup csum for GSO receive slow path Paolo Abeni
@ 2021-03-21 17:01 ` Paolo Abeni
  2021-03-22 13:24   ` Willem de Bruijn
  2021-03-21 17:01 ` [PATCH net-next 3/8] udp: properly complete L4 GRO over UDP tunnel packet Paolo Abeni
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Paolo Abeni @ 2021-03-21 17:01 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Jakub Kicinski, Steffen Klassert,
	Willem de Bruijn, Alexander Lobakin

If UDP GRO forwarding (or list) is enabled, and there are
udp tunnel available in the system, we could end-up doing L4
aggregation for packets targeting the UDP tunnel.

That could inner protocol corruption, as no overaly network
parameters is taken in account at aggregation time.

Just skip the fwd GRO if this packet could land in an UDP
tunnel. The current check is broader than what is strictly
needed, as the UDP tunnel could be e.g. on top of a different
device, but is simple and the performance downside looks not
relevant.

Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
Fixes: 36707061d6ba ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/udp_offload.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index c5b4b586570fe..25134a3548e99 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -515,21 +515,24 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 	unsigned int off = skb_gro_offset(skb);
 	int flush = 1;
 
+	/* we can do L4 aggregation only if the packet can't land in a tunnel
+	 * otherwise we could corrupt the inner stream
+	 */
 	NAPI_GRO_CB(skb)->is_flist = 0;
-	if (skb->dev->features & NETIF_F_GRO_FRAGLIST)
-		NAPI_GRO_CB(skb)->is_flist = sk ? !udp_sk(sk)->gro_enabled: 1;
+	if (!sk || !udp_sk(sk)->gro_receive) {
+		if (skb->dev->features & NETIF_F_GRO_FRAGLIST)
+			NAPI_GRO_CB(skb)->is_flist = sk ? !udp_sk(sk)->gro_enabled : 1;
 
-	if ((!sk && (skb->dev->features & NETIF_F_GRO_UDP_FWD)) ||
-	    (sk && udp_sk(sk)->gro_enabled) || NAPI_GRO_CB(skb)->is_flist) {
-		pp = call_gro_receive(udp_gro_receive_segment, head, skb);
+		if ((!sk && (skb->dev->features & NETIF_F_GRO_UDP_FWD)) ||
+		    (sk && udp_sk(sk)->gro_enabled) || NAPI_GRO_CB(skb)->is_flist)
+			pp = call_gro_receive(udp_gro_receive_segment, head, skb);
 		return pp;
 	}
 
-	if (!sk || NAPI_GRO_CB(skb)->encap_mark ||
+	if (NAPI_GRO_CB(skb)->encap_mark ||
 	    (uh->check && skb->ip_summed != CHECKSUM_PARTIAL &&
 	     NAPI_GRO_CB(skb)->csum_cnt == 0 &&
-	     !NAPI_GRO_CB(skb)->csum_valid) ||
-	    !udp_sk(sk)->gro_receive)
+	     !NAPI_GRO_CB(skb)->csum_valid))
 		goto out;
 
 	/* mark that this skb passed once through the tunnel gro layer */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next 3/8] udp: properly complete L4 GRO over UDP tunnel packet
  2021-03-21 17:01 [PATCH net-next 0/8] udp: GRO L4 improvements Paolo Abeni
  2021-03-21 17:01 ` [PATCH net-next 1/8] udp: fixup csum for GSO receive slow path Paolo Abeni
  2021-03-21 17:01 ` [PATCH net-next 2/8] udp: skip fwd/list GRO for tunnel packets Paolo Abeni
@ 2021-03-21 17:01 ` Paolo Abeni
  2021-03-22 13:30   ` Willem de Bruijn
  2021-03-21 17:01 ` [PATCH net-next 4/8] udp: never accept GSO_FRAGLIST packets Paolo Abeni
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Paolo Abeni @ 2021-03-21 17:01 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Jakub Kicinski, Steffen Klassert,
	Willem de Bruijn, Alexander Lobakin

After the previous patch the stack can do L4 UDP aggregation
on top of an UDP tunnel.

The current GRO complete code tries frag based aggregation first;
in the above scenario will generate corrupted frames.

We need to try first UDP tunnel based aggregation, if the GRO
packet requires that. We can use time GRO 'encap_mark' field
to track the need GRO complete action. If encap_mark is set,
skip the frag_list aggregation.

On tunnel encap GRO complete clear such field, so that an inner
frag_list GRO complete could take action.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/udp_offload.c | 8 +++++++-
 net/ipv6/udp_offload.c | 3 ++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 25134a3548e99..54e06b88af69a 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -642,6 +642,11 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
 		skb_shinfo(skb)->gso_type = uh->check ? SKB_GSO_UDP_TUNNEL_CSUM
 					: SKB_GSO_UDP_TUNNEL;
 
+		/* clear the encap mark, so that inner frag_list gro_complete
+		 * can take place
+		 */
+		NAPI_GRO_CB(skb)->encap_mark = 0;
+
 		/* Set encapsulation before calling into inner gro_complete()
 		 * functions to make them set up the inner offsets.
 		 */
@@ -665,7 +670,8 @@ INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff)
 	const struct iphdr *iph = ip_hdr(skb);
 	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 
-	if (NAPI_GRO_CB(skb)->is_flist) {
+	/* do fraglist only if there is no outer UDP encap (or we already processed it) */
+	if (NAPI_GRO_CB(skb)->is_flist && !NAPI_GRO_CB(skb)->encap_mark) {
 		uh->len = htons(skb->len - nhoff);
 
 		skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4);
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index faa823c242923..b3d9ed96e5ea5 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -163,7 +163,8 @@ INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff)
 	const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
 	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 
-	if (NAPI_GRO_CB(skb)->is_flist) {
+	/* do fraglist only if there is no outer UDP encap (or we already processed it) */
+	if (NAPI_GRO_CB(skb)->is_flist && !NAPI_GRO_CB(skb)->encap_mark) {
 		uh->len = htons(skb->len - nhoff);
 
 		skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next 4/8] udp: never accept GSO_FRAGLIST packets
  2021-03-21 17:01 [PATCH net-next 0/8] udp: GRO L4 improvements Paolo Abeni
                   ` (2 preceding siblings ...)
  2021-03-21 17:01 ` [PATCH net-next 3/8] udp: properly complete L4 GRO over UDP tunnel packet Paolo Abeni
@ 2021-03-21 17:01 ` Paolo Abeni
  2021-03-22 13:42   ` Willem de Bruijn
  2021-03-21 17:01 ` [PATCH net-next 5/8] vxlan: allow L4 GRO passthrou Paolo Abeni
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Paolo Abeni @ 2021-03-21 17:01 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Jakub Kicinski, Steffen Klassert,
	Willem de Bruijn, Alexander Lobakin

Currently the UDP protocol delivers GSO_FRAGLIST packets to
the sockets without the expected segmentation.

This change addresses the issue introducing and maintaining
a per socket bitmask of GSO types requiring segmentation.
Enabling GSO removes SKB_GSO_UDP_L4 from such mask, while
GSO_FRAGLIST packets are never accepted

Note: this also updates the 'unused' field size to really
fit the otherwise existing hole. It's size become incorrect
after commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT").

Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/udp.h | 10 ++++++----
 net/ipv4/udp.c      | 12 +++++++++++-
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index aa84597bdc33c..6da342f15f351 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -51,7 +51,7 @@ struct udp_sock {
 					   * different encapsulation layer set
 					   * this
 					   */
-			 gro_enabled:1;	/* Can accept GRO packets */
+			 gro_enabled:1;	/* Request GRO aggregation */
 	/*
 	 * Following member retains the information to create a UDP header
 	 * when the socket is uncorked.
@@ -68,7 +68,10 @@ struct udp_sock {
 #define UDPLITE_SEND_CC  0x2  		/* set via udplite setsockopt         */
 #define UDPLITE_RECV_CC  0x4		/* set via udplite setsocktopt        */
 	__u8		 pcflag;        /* marks socket as UDP-Lite if > 0    */
-	__u8		 unused[3];
+	__u8		 unused[1];
+	unsigned int	 unexpected_gso;/* GSO types this socket can't accept,
+					 * any of SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST
+					 */
 	/*
 	 * For encapsulation sockets.
 	 */
@@ -131,8 +134,7 @@ static inline void udp_cmsg_recv(struct msghdr *msg, struct sock *sk,
 
 static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff *skb)
 {
-	return !udp_sk(sk)->gro_enabled && skb_is_gso(skb) &&
-	       skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4;
+	return skb_is_gso(skb) && skb_shinfo(skb)->gso_type & udp_sk(sk)->unexpected_gso;
 }
 
 #define udp_portaddr_for_each_entry(__sk, list) \
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ff54135c51ffa..1ba6d153c2f0a 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1600,8 +1600,13 @@ EXPORT_SYMBOL_GPL(udp_destruct_sock);
 
 int udp_init_sock(struct sock *sk)
 {
-	skb_queue_head_init(&udp_sk(sk)->reader_queue);
+	struct udp_sock *up = udp_sk(sk);
+
+	skb_queue_head_init(&up->reader_queue);
 	sk->sk_destruct = udp_destruct_sock;
+
+	/* do not accept any GSO packet by default */
+	up->unexpected_gso = SKB_GSO_FRAGLIST | SKB_GSO_UDP_L4;
 	return 0;
 }
 EXPORT_SYMBOL_GPL(udp_init_sock);
@@ -2674,8 +2679,13 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
 
 	case UDP_GRO:
 		lock_sock(sk);
+
+		/* when enabling GRO, accept the related GSO packet type */
+		up->unexpected_gso = SKB_GSO_FRAGLIST;
 		if (valbool)
 			udp_tunnel_encap_enable(sk->sk_socket);
+		else
+			up->unexpected_gso |= SKB_GSO_UDP_L4;
 		up->gro_enabled = valbool;
 		release_sock(sk);
 		break;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next 5/8] vxlan: allow L4 GRO passthrou
  2021-03-21 17:01 [PATCH net-next 0/8] udp: GRO L4 improvements Paolo Abeni
                   ` (3 preceding siblings ...)
  2021-03-21 17:01 ` [PATCH net-next 4/8] udp: never accept GSO_FRAGLIST packets Paolo Abeni
@ 2021-03-21 17:01 ` Paolo Abeni
  2021-03-21 17:01 ` [PATCH net-next 6/8] geneve: allow UDP " Paolo Abeni
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Paolo Abeni @ 2021-03-21 17:01 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Jakub Kicinski, Steffen Klassert,
	Willem de Bruijn, Alexander Lobakin

When passing up an UDP GSO packet with L4 aggregation, there is
no need to segment it at the vxlan level. We can propagate the
packet untouched and let it be segmented later, if needed.

Introduce an helper to allow let the UDP socket accepting any
L4 aggregation and use it in the vxlan driver.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/net/vxlan.c | 1 +
 include/linux/udp.h | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 7665817f3cb61..39ee1300cdd9d 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3484,6 +3484,7 @@ static struct socket *vxlan_create_sock(struct net *net, bool ipv6,
 	if (err < 0)
 		return ERR_PTR(err);
 
+	udp_allow_gso(sock->sk);
 	return sock;
 }
 
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 6da342f15f351..0444f2fb6002e 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -137,6 +137,11 @@ static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff *skb)
 	return skb_is_gso(skb) && skb_shinfo(skb)->gso_type & udp_sk(sk)->unexpected_gso;
 }
 
+static inline void udp_allow_gso(struct sock *sk)
+{
+	udp_sk(sk)->unexpected_gso = 0;
+}
+
 #define udp_portaddr_for_each_entry(__sk, list) \
 	hlist_for_each_entry(__sk, list, __sk_common.skc_portaddr_node)
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next 6/8] geneve: allow UDP L4 GRO passthrou
  2021-03-21 17:01 [PATCH net-next 0/8] udp: GRO L4 improvements Paolo Abeni
                   ` (4 preceding siblings ...)
  2021-03-21 17:01 ` [PATCH net-next 5/8] vxlan: allow L4 GRO passthrou Paolo Abeni
@ 2021-03-21 17:01 ` Paolo Abeni
  2021-03-21 17:01 ` [PATCH net-next 7/8] bareudp: " Paolo Abeni
  2021-03-21 17:01 ` [PATCH net-next 8/8] selftests: net: add UDP GRO forwarding self-tests Paolo Abeni
  7 siblings, 0 replies; 35+ messages in thread
From: Paolo Abeni @ 2021-03-21 17:01 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Jakub Kicinski, Steffen Klassert,
	Willem de Bruijn, Alexander Lobakin

Similar to the previous commit, let even geneve
passthrou the L4 GRO packets

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/net/geneve.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 4ac0373326efd..5d7a2b1469f4c 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -461,6 +461,7 @@ static struct socket *geneve_create_sock(struct net *net, bool ipv6,
 	if (err < 0)
 		return ERR_PTR(err);
 
+	udp_allow_gso(sock->sk);
 	return sock;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next 7/8] bareudp: allow UDP L4 GRO passthrou
  2021-03-21 17:01 [PATCH net-next 0/8] udp: GRO L4 improvements Paolo Abeni
                   ` (5 preceding siblings ...)
  2021-03-21 17:01 ` [PATCH net-next 6/8] geneve: allow UDP " Paolo Abeni
@ 2021-03-21 17:01 ` Paolo Abeni
  2021-03-21 17:01 ` [PATCH net-next 8/8] selftests: net: add UDP GRO forwarding self-tests Paolo Abeni
  7 siblings, 0 replies; 35+ messages in thread
From: Paolo Abeni @ 2021-03-21 17:01 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Jakub Kicinski, Steffen Klassert,
	Willem de Bruijn, Alexander Lobakin

Similar to the previous commit, let even geneve
passthrou the L4 GRO packets

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/net/bareudp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/bareudp.c b/drivers/net/bareudp.c
index 7511bca9c15ed..edfad93e7b686 100644
--- a/drivers/net/bareudp.c
+++ b/drivers/net/bareudp.c
@@ -218,6 +218,7 @@ static struct socket *bareudp_create_sock(struct net *net, __be16 port)
 	if (err < 0)
 		return ERR_PTR(err);
 
+	udp_allow_gso(sock->sk);
 	return sock;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next 8/8] selftests: net: add UDP GRO forwarding self-tests
  2021-03-21 17:01 [PATCH net-next 0/8] udp: GRO L4 improvements Paolo Abeni
                   ` (6 preceding siblings ...)
  2021-03-21 17:01 ` [PATCH net-next 7/8] bareudp: " Paolo Abeni
@ 2021-03-21 17:01 ` Paolo Abeni
  2021-03-22 13:44   ` Willem de Bruijn
  7 siblings, 1 reply; 35+ messages in thread
From: Paolo Abeni @ 2021-03-21 17:01 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Jakub Kicinski, Steffen Klassert,
	Willem de Bruijn, Alexander Lobakin

create a bunch of virtual topology and verify that
GRO_FRAG_LIST and GRO_FWD aggregate the ingress
packets as expected, and the aggregate packets
are segmented correctly when landing on a socket

Also test L4 aggregation on top of UDP tunnel (vxlan)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/net/Makefile      |   1 +
 tools/testing/selftests/net/udpgro_fwd.sh | 251 ++++++++++++++++++++++
 2 files changed, 252 insertions(+)
 create mode 100755 tools/testing/selftests/net/udpgro_fwd.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 25f198bec0b25..2d71b283dde36 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -23,6 +23,7 @@ TEST_PROGS += drop_monitor_tests.sh
 TEST_PROGS += vrf_route_leaking.sh
 TEST_PROGS += bareudp.sh
 TEST_PROGS += unicast_extensions.sh
+TEST_PROGS += udpgro_fwd.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket nettest
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any
diff --git a/tools/testing/selftests/net/udpgro_fwd.sh b/tools/testing/selftests/net/udpgro_fwd.sh
new file mode 100755
index 0000000000000..ac7ac56a27524
--- /dev/null
+++ b/tools/testing/selftests/net/udpgro_fwd.sh
@@ -0,0 +1,251 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+readonly BASE="ns-$(mktemp -u XXXXXX)"
+readonly SRC=2
+readonly DST=1
+readonly DST_NAT=100
+readonly NS_SRC=$BASE$SRC
+readonly NS_DST=$BASE$DST
+
+# "baremetal" network used for raw UDP traffic
+readonly BM_NET_V4=192.168.1.
+readonly BM_NET_V6=2001:db8::
+
+# "overlay" network used for UDP over UDP tunnel traffic
+readonly OL_NET_V4=172.16.1.
+readonly OL_NET_V6=2002:db8::
+readonly NPROCS=`nproc`
+
+cleanup() {
+	local ns
+	local -r jobs="$(jobs -p)"
+	[ -n "${jobs}" ] && kill -1 ${jobs} 2>/dev/null
+
+	for ns in $NS_SRC $NS_DST; do
+		ip netns del $ns 2>/dev/null
+	done
+}
+
+trap cleanup EXIT
+
+create_ns() {
+	local net
+	local ns
+
+	for ns in $NS_SRC $NS_DST; do
+		ip netns add $ns
+		ip -n $ns link set dev lo up
+	done
+
+	ip link add name veth$SRC type veth peer name veth$DST
+
+	for ns in $SRC $DST; do
+		ip link set dev veth$ns netns $BASE$ns
+		ip -n $BASE$ns link set dev veth$ns up
+		ip -n $BASE$ns addr add dev veth$ns $BM_NET_V4$ns/24
+		ip -n $BASE$ns addr add dev veth$ns $BM_NET_V6$ns/64 nodad
+	done
+	ip -n $NS_DST link set veth$DST xdp object ../bpf/xdp_dummy.o section xdp_dummy 2>/dev/null
+}
+
+create_vxlan_endpoint() {
+	local -r netns=$1
+	local -r bm_dev=$2
+	local -r bm_rem_addr=$3
+	local -r vxlan_dev=$4
+	local -r vxlan_id=$5
+	local -r vxlan_port=4789
+
+	ip -n $netns link set dev $bm_dev up
+	ip -n $netns link add dev $vxlan_dev type vxlan id $vxlan_id \
+				dstport $vxlan_port remote $bm_rem_addr
+	ip -n $netns link set dev $vxlan_dev up
+}
+
+create_vxlan_pair() {
+	local ns
+
+	create_ns
+
+	for ns in $SRC $DST; do
+		# note that 3 - $SRC == $DST and 3 - $DST == $SRC
+		create_vxlan_endpoint $BASE$ns veth$ns $BM_NET_V4$((3 - $ns)) vxlan$ns 4
+		ip -n $BASE$ns addr add dev vxlan$ns $OL_NET_V4$ns/24
+	done
+	for ns in $SRC $DST; do
+		create_vxlan_endpoint $BASE$ns veth$ns $BM_NET_V6$((3 - $ns)) vxlan6$ns 6
+		ip -n $BASE$ns addr add dev vxlan6$ns $OL_NET_V6$ns/24 nodad
+	done
+}
+
+is_ipv6() {
+	if [[ $1 =~ .*:.* ]]; then
+		return 0
+	fi
+	return 1
+}
+
+run_test() {
+	local -r msg=$1
+	local -r dst=$2
+	local -r pkts=$3
+	local -r vxpkts=$4
+	local bind=$5
+	local rx_args=""
+	local rx_family="-4"
+	local family=-4
+	local filter=IpInReceives
+	local ipt=iptables
+
+	printf "%-40s" "$msg"
+
+	if is_ipv6 $dst; then
+		# rx program does not support '-6' and implies ipv6 usage by default
+		rx_family=""
+		family=-6
+		filter=Ip6InReceives
+		ipt=ip6tables
+	fi
+
+	rx_args="$rx_family"
+	[ -n "$bind" ] && rx_args="$rx_args -b $bind"
+
+	# send a single GSO packet, segmented in 10 UDP frames.
+	# Always expect 10 UDP frames on RX side as rx socket does
+	# not enable GRO
+	ip netns exec $NS_DST $ipt -A INPUT -p udp --dport 4789
+	ip netns exec $NS_DST $ipt -A INPUT -p udp --dport 8000
+	ip netns exec $NS_DST ./udpgso_bench_rx -C 1000 -R 10 -n 10 -l 1300 $rx_args &
+	local spid=$!
+	sleep 0.1
+	ip netns exec $NS_SRC ./udpgso_bench_tx $family -M 1 -s 13000 -S 1300 -D $dst
+	local retc=$?
+	wait $spid
+	local rets=$?
+	if [ ${rets} -ne 0 ] || [ ${retc} -ne 0 ]; then
+		echo " fail client exit code $retc, server $rets"
+		ret=1
+		return
+	fi
+
+	local rcv=`ip netns exec $NS_DST $ipt"-save" -c | grep 'dport 8000' | \
+							  sed -e 's/\[//' -e 's/:.*//'`
+	if [ $rcv != $pkts ]; then
+		echo " fail - received $rvs packets, expected $pkts"
+		ret=1
+		return
+	fi
+
+	local vxrcv=`ip netns exec $NS_DST $ipt"-save" -c | grep 'dport 4789' | \
+							    sed -e 's/\[//' -e 's/:.*//'`
+
+	# upper net can generate a little noise, allow some tolerance
+	if [ $vxrcv -lt $vxpkts -o $vxrcv -gt $((vxpkts + 3)) ]; then
+		echo " fail - received $vxrcv vxlan packets, expected $vxpkts"
+		ret=1
+		return
+	fi
+	echo " ok"
+}
+
+run_bench() {
+	local -r msg=$1
+	local -r dst=$2
+	local family=-4
+
+	printf "%-40s" "$msg"
+	if [ $NPROCS -lt 2 ]; then
+		echo " skip - needed 2 CPUs found $NPROCS"
+		return
+	fi
+
+	is_ipv6 $dst && family=-6
+
+	# bind the sender and the receiver to different CPUs to try
+	# get reproducible results
+	ip netns exec $NS_DST bash -c "echo 2 > /sys/class/net/veth$DST/queues/rx-0/rps_cpus"
+	ip netns exec $NS_DST taskset 0x2 ./udpgso_bench_rx -C 1000 -R 10  &
+	local spid=$!
+	sleep 0.1
+	ip netns exec $NS_SRC taskset 0x1 ./udpgso_bench_tx $family -l 3 -S 1300 -D $dst
+	local retc=$?
+	wait $spid
+	local rets=$?
+	if [ ${rets} -ne 0 ] || [ ${retc} -ne 0 ]; then
+		echo " fail client exit code $retc, server $rets"
+		ret=1
+		return
+	fi
+}
+
+for family in 4 6; do
+	BM_NET=$BM_NET_V4
+	OL_NET=$OL_NET_V4
+	IPT=iptables
+	SUFFIX=24
+	VXDEV=vxlan
+
+	if [ $family = 6 ]; then
+		BM_NET=$BM_NET_V6
+		OL_NET=$OL_NET_V6
+		SUFFIX="64 nodad"
+		VXDEV=vxlan6
+		IPT=ip6tables
+	fi
+
+	echo "IPv$family"
+
+	create_ns
+	run_test "No GRO" $BM_NET$DST 10 0
+	cleanup
+
+	create_ns
+	ip netns exec $NS_DST ethtool -K veth$DST rx-gro-list on
+	run_test "GRO frag list" $BM_NET$DST 1 0
+	cleanup
+
+	# UDP GRO fwd skips aggregation when find an udp socket with the GRO option
+	# if there is an UDP tunnel in the running system, such lookup happen
+	# take place.
+	# use NAT to circumvent GRO FWD check
+	create_ns
+	ip -n $NS_DST addr add dev veth$DST $BM_NET$DST_NAT/$SUFFIX
+	ip netns exec $NS_DST ethtool -K veth$DST rx-udp-gro-forwarding on
+	ip netns exec $NS_DST $IPT -t nat -I PREROUTING -d $BM_NET$DST_NAT \
+					-j DNAT --to-destination $BM_NET$DST
+	run_test "GRO fwd" $BM_NET$DST_NAT 1 0 $BM_NET$DST
+	cleanup
+
+	create_ns
+	run_bench "UDP fwd perf" $BM_NET$DST
+	ip netns exec $NS_DST ethtool -K veth$DST rx-udp-gro-forwarding on
+	run_bench "UDP GRO fwd perf" $BM_NET$DST
+	cleanup
+
+	create_vxlan_pair
+	ip netns exec $NS_DST ethtool -K veth$DST rx-gro-list on
+	run_test "GRO frag list over UDP tunnel" $OL_NET$DST 1 1
+	cleanup
+
+	# use NAT to circumvent GRO FWD check
+	create_vxlan_pair
+	ip -n $NS_DST addr add dev $VXDEV$DST $OL_NET$DST_NAT/$SUFFIX
+	ip netns exec $NS_DST ethtool -K veth$DST rx-udp-gro-forwarding on
+	ip netns exec $NS_DST $IPT -t nat -I PREROUTING -d $OL_NET$DST_NAT \
+					-j DNAT --to-destination $OL_NET$DST
+
+	# load arp cache before running the test to reduce the amount of
+	# stray traffic on top of the UDP tunnel
+	ip netns exec $NS_SRC ping -q -c 1 $OL_NET$DST_NAT >/dev/null
+	run_test "GRO fwd over UDP tunnel" $OL_NET$DST_NAT 1 1 $OL_NET$DST
+	cleanup
+
+	create_vxlan_pair
+	run_bench "UDP tunnel fwd perf" $OL_NET$DST
+	ip netns exec $NS_DST ethtool -K veth$DST rx-udp-gro-forwarding on
+	run_bench "UDP tunnel GRO fwd perf" $OL_NET$DST
+	cleanup
+done
+
+exit $ret
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 1/8] udp: fixup csum for GSO receive slow path
  2021-03-21 17:01 ` [PATCH net-next 1/8] udp: fixup csum for GSO receive slow path Paolo Abeni
@ 2021-03-22 13:18   ` Willem de Bruijn
  2021-03-22 16:34     ` Paolo Abeni
  0 siblings, 1 reply; 35+ messages in thread
From: Willem de Bruijn @ 2021-03-22 13:18 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> When looping back UDP GSO over UDP tunnel packets to an UDP socket,
> the individual packet csum is currently set to CSUM_NONE. That causes
> unexpected/wrong csum validation errors later in the UDP receive path.
>
> We could possibly addressing the issue with some additional check and
> csum mangling in the UDP tunnel code. Since the issue affects only
> this UDP receive slow path, let's set a suitable csum status there.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  include/net/udp.h | 18 ++++++++++++++++++
>  net/ipv4/udp.c    | 10 ++++++++++
>  net/ipv6/udp.c    |  5 +++++
>  3 files changed, 33 insertions(+)
>
> diff --git a/include/net/udp.h b/include/net/udp.h
> index d4d064c592328..007683eb3e113 100644
> --- a/include/net/udp.h
> +++ b/include/net/udp.h
> @@ -515,6 +515,24 @@ static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
>         return segs;
>  }
>
> +static inline void udp_post_segment_fix_csum(struct sk_buff *skb, int level)
> +{
> +       /* UDP-lite can't land here - no GRO */
> +       WARN_ON_ONCE(UDP_SKB_CB(skb)->partial_cov);
> +
> +       /* GRO already validated the csum up to 'level', and we just
> +        * consumed one header, update the skb accordingly
> +        */
> +       UDP_SKB_CB(skb)->cscov = skb->len;
> +       if (level) {
> +               skb->ip_summed = CHECKSUM_UNNECESSARY;
> +               skb->csum_level = 0;
> +       } else {
> +               skb->ip_summed = CHECKSUM_NONE;
> +               skb->csum_valid = 1;
> +       }

why does this function also update these fields for non-tunneled
packets? the commit only describes an issue with tunneled packets.

> +}
> +
>  #ifdef CONFIG_BPF_SYSCALL
>  struct sk_psock;
>  struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 4a0478b17243a..ff54135c51ffa 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -2168,6 +2168,7 @@ static int udp_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
>  static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>  {
>         struct sk_buff *next, *segs;
> +       int csum_level;
>         int ret;
>
>         if (likely(!udp_unexpected_gso(sk, skb)))
> @@ -2175,9 +2176,18 @@ static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>
>         BUILD_BUG_ON(sizeof(struct udp_skb_cb) > SKB_GSO_CB_OFFSET);
>         __skb_push(skb, -skb_mac_offset(skb));
> +       csum_level = !!(skb_shinfo(skb)->gso_type &
> +                       (SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM));
>         segs = udp_rcv_segment(sk, skb, true);
>         skb_list_walk_safe(segs, skb, next) {
>                 __skb_pull(skb, skb_transport_offset(skb));
> +
> +               /* UDP GSO packets looped back after adding UDP encap land here with CHECKSUM none,
> +                * instead of adding another check in the tunnel fastpath, we can force valid
> +                * csums here (packets are locally generated).
> +                * Additionally fixup the UDP CB
> +                */
> +               udp_post_segment_fix_csum(skb, csum_level);

How does this code differentiates locally generated packets with udp
tunnel headers from packets arriving from the wire, for which the
inner checksum may be incorrect?

>                 ret = udp_queue_rcv_one_skb(sk, skb);
>                 if (ret > 0)
>                         ip_protocol_deliver_rcu(dev_net(skb->dev), skb, ret);
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index d25e5a9252fdb..e7d4bf3a65c72 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -739,16 +739,21 @@ static int udpv6_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
>  static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>  {
>         struct sk_buff *next, *segs;
> +       int csum_level;
>         int ret;
>
>         if (likely(!udp_unexpected_gso(sk, skb)))
>                 return udpv6_queue_rcv_one_skb(sk, skb);
>
>         __skb_push(skb, -skb_mac_offset(skb));
> +       csum_level = !!(skb_shinfo(skb)->gso_type &
> +                       (SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM));
>         segs = udp_rcv_segment(sk, skb, false);
>         skb_list_walk_safe(segs, skb, next) {
>                 __skb_pull(skb, skb_transport_offset(skb));
>
> +               /* see comments in udp_queue_rcv_skb() */
> +               udp_post_segment_fix_csum(skb, csum_level);
>                 ret = udpv6_queue_rcv_one_skb(sk, skb);
>                 if (ret > 0)
>                         ip6_protocol_deliver_rcu(dev_net(skb->dev), skb, ret,
> --
> 2.26.2
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 2/8] udp: skip fwd/list GRO for tunnel packets
  2021-03-21 17:01 ` [PATCH net-next 2/8] udp: skip fwd/list GRO for tunnel packets Paolo Abeni
@ 2021-03-22 13:24   ` Willem de Bruijn
  2021-03-22 16:41     ` Paolo Abeni
  0 siblings, 1 reply; 35+ messages in thread
From: Willem de Bruijn @ 2021-03-22 13:24 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> If UDP GRO forwarding (or list) is enabled,

Please explicitly mention the gso type SKB_GSO_FRAGLIST. I, at least,
didn't immediately grasp that gro forwarding is an alias for that.

> and there are
> udp tunnel available in the system, we could end-up doing L4
> aggregation for packets targeting the UDP tunnel.

Is this specific to UDP tunnels, or can this also occur with others,
such as GRE? (not implying that this patchset needs to address those
at the same time)

> That could inner protocol corruption, as no overaly network
> parameters is taken in account at aggregation time.

nit: overaly .. is taken -> overlay .. are taken

You mean the packets on the frag list may have mtu exceeding the mtu
of the tunnel? Please make the constraint more explicit.

> Just skip the fwd GRO if this packet could land in an UDP
> tunnel.

Could you make more clear that this does not skip UDP GRO, only
switches from fraglist-based to pure SKB_GSO_UDP_L4.

> The current check is broader than what is strictly
> needed, as the UDP tunnel could be e.g. on top of a different
> device, but is simple and the performance downside looks not
> relevant.
>
> Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
> Fixes: 36707061d6ba ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 3/8] udp: properly complete L4 GRO over UDP tunnel packet
  2021-03-21 17:01 ` [PATCH net-next 3/8] udp: properly complete L4 GRO over UDP tunnel packet Paolo Abeni
@ 2021-03-22 13:30   ` Willem de Bruijn
  2021-03-22 16:59     ` Paolo Abeni
  0 siblings, 1 reply; 35+ messages in thread
From: Willem de Bruijn @ 2021-03-22 13:30 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> After the previous patch the stack can do L4 UDP aggregation
> on top of an UDP tunnel.
>
> The current GRO complete code tries frag based aggregation first;
> in the above scenario will generate corrupted frames.
>
> We need to try first UDP tunnel based aggregation, if the GRO
> packet requires that. We can use time GRO 'encap_mark' field
> to track the need GRO complete action. If encap_mark is set,
> skip the frag_list aggregation.
>
> On tunnel encap GRO complete clear such field, so that an inner
> frag_list GRO complete could take action.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/ipv4/udp_offload.c | 8 +++++++-
>  net/ipv6/udp_offload.c | 3 ++-
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 25134a3548e99..54e06b88af69a 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -642,6 +642,11 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
>                 skb_shinfo(skb)->gso_type = uh->check ? SKB_GSO_UDP_TUNNEL_CSUM
>                                         : SKB_GSO_UDP_TUNNEL;
>
> +               /* clear the encap mark, so that inner frag_list gro_complete
> +                * can take place
> +                */
> +               NAPI_GRO_CB(skb)->encap_mark = 0;
> +
>                 /* Set encapsulation before calling into inner gro_complete()
>                  * functions to make them set up the inner offsets.
>                  */
> @@ -665,7 +670,8 @@ INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff)
>         const struct iphdr *iph = ip_hdr(skb);
>         struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
>
> -       if (NAPI_GRO_CB(skb)->is_flist) {
> +       /* do fraglist only if there is no outer UDP encap (or we already processed it) */
> +       if (NAPI_GRO_CB(skb)->is_flist && !NAPI_GRO_CB(skb)->encap_mark) {

Sorry, I don't follow. I thought the point was to avoid fraglist if an
outer udp tunnel header is present. But the above code clears the mark
and allows entering the fraglist branch exactly when such a header is
encountered?

>                 uh->len = htons(skb->len - nhoff);
>
>                 skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4);
> diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
> index faa823c242923..b3d9ed96e5ea5 100644
> --- a/net/ipv6/udp_offload.c
> +++ b/net/ipv6/udp_offload.c
> @@ -163,7 +163,8 @@ INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff)
>         const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
>         struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
>
> -       if (NAPI_GRO_CB(skb)->is_flist) {
> +       /* do fraglist only if there is no outer UDP encap (or we already processed it) */
> +       if (NAPI_GRO_CB(skb)->is_flist && !NAPI_GRO_CB(skb)->encap_mark) {
>                 uh->len = htons(skb->len - nhoff);
>
>                 skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4);
> --
> 2.26.2
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 4/8] udp: never accept GSO_FRAGLIST packets
  2021-03-21 17:01 ` [PATCH net-next 4/8] udp: never accept GSO_FRAGLIST packets Paolo Abeni
@ 2021-03-22 13:42   ` Willem de Bruijn
  2021-03-22 17:09     ` Paolo Abeni
  0 siblings, 1 reply; 35+ messages in thread
From: Willem de Bruijn @ 2021-03-22 13:42 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Currently the UDP protocol delivers GSO_FRAGLIST packets to
> the sockets without the expected segmentation.
>
> This change addresses the issue introducing and maintaining
> a per socket bitmask of GSO types requiring segmentation.
> Enabling GSO removes SKB_GSO_UDP_L4 from such mask, while
> GSO_FRAGLIST packets are never accepted
>
> Note: this also updates the 'unused' field size to really
> fit the otherwise existing hole. It's size become incorrect
> after commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT").
>
> Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  include/linux/udp.h | 10 ++++++----
>  net/ipv4/udp.c      | 12 +++++++++++-
>  2 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/udp.h b/include/linux/udp.h
> index aa84597bdc33c..6da342f15f351 100644
> --- a/include/linux/udp.h
> +++ b/include/linux/udp.h
> @@ -51,7 +51,7 @@ struct udp_sock {
>                                            * different encapsulation layer set
>                                            * this
>                                            */
> -                        gro_enabled:1; /* Can accept GRO packets */
> +                        gro_enabled:1; /* Request GRO aggregation */

unnecessary comment change?

>         /*
>          * Following member retains the information to create a UDP header
>          * when the socket is uncorked.
> @@ -68,7 +68,10 @@ struct udp_sock {
>  #define UDPLITE_SEND_CC  0x2           /* set via udplite setsockopt         */
>  #define UDPLITE_RECV_CC  0x4           /* set via udplite setsocktopt        */
>         __u8             pcflag;        /* marks socket as UDP-Lite if > 0    */
> -       __u8             unused[3];
> +       __u8             unused[1];
> +       unsigned int     unexpected_gso;/* GSO types this socket can't accept,
> +                                        * any of SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST
> +                                        */

An extra unsigned int for this seems overkill.

Current sockets that support SKB_GSO_UDP_L4 implicitly also support
SKB_GSO_FRAGLIST. This patch makes explicit that the second is not
supported..

>         /*
>          * For encapsulation sockets.
>          */
> @@ -131,8 +134,7 @@ static inline void udp_cmsg_recv(struct msghdr *msg, struct sock *sk,
>
>  static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff *skb)
>  {
> -       return !udp_sk(sk)->gro_enabled && skb_is_gso(skb) &&
> -              skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4;
> +       return skb_is_gso(skb) && skb_shinfo(skb)->gso_type & udp_sk(sk)->unexpected_gso;

.. just update this function as follows ?

 -       return !udp_sk(sk)->gro_enabled && skb_is_gso(skb) &&
 -              skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4;
+       return skb_is_gso(skb) &&
+                 (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST ||
!udp_sk(sk)->gro_enabled)

where the latter is shorthand for

  (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 && !udp_sk(sk)->gro_enabled)

but the are the only two GSO types that could arrive here.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 8/8] selftests: net: add UDP GRO forwarding self-tests
  2021-03-21 17:01 ` [PATCH net-next 8/8] selftests: net: add UDP GRO forwarding self-tests Paolo Abeni
@ 2021-03-22 13:44   ` Willem de Bruijn
  2021-03-22 17:18     ` Paolo Abeni
  2021-03-23 17:12     ` Paolo Abeni
  0 siblings, 2 replies; 35+ messages in thread
From: Willem de Bruijn @ 2021-03-22 13:44 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Sun, Mar 21, 2021 at 1:02 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> create a bunch of virtual topology and verify that
> GRO_FRAG_LIST and GRO_FWD aggregate the ingress

what are these constants? Aliases for SKB_GSO_FRAGLIST and ?

> packets as expected, and the aggregate packets
> are segmented correctly when landing on a socket
>
> Also test L4 aggregation on top of UDP tunnel (vxlan)
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Nice comprehensive test, thanks!

> ---
>  tools/testing/selftests/net/Makefile      |   1 +
>  tools/testing/selftests/net/udpgro_fwd.sh | 251 ++++++++++++++++++++++
>  2 files changed, 252 insertions(+)
>  create mode 100755 tools/testing/selftests/net/udpgro_fwd.sh
>
> diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
> index 25f198bec0b25..2d71b283dde36 100644
> --- a/tools/testing/selftests/net/Makefile
> +++ b/tools/testing/selftests/net/Makefile
> @@ -23,6 +23,7 @@ TEST_PROGS += drop_monitor_tests.sh
>  TEST_PROGS += vrf_route_leaking.sh
>  TEST_PROGS += bareudp.sh
>  TEST_PROGS += unicast_extensions.sh
> +TEST_PROGS += udpgro_fwd.sh
>  TEST_PROGS_EXTENDED := in_netns.sh
>  TEST_GEN_FILES =  socket nettest
>  TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any
> diff --git a/tools/testing/selftests/net/udpgro_fwd.sh b/tools/testing/selftests/net/udpgro_fwd.sh
> new file mode 100755
> index 0000000000000..ac7ac56a27524
> --- /dev/null
> +++ b/tools/testing/selftests/net/udpgro_fwd.sh
> @@ -0,0 +1,251 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0
> +
> +readonly BASE="ns-$(mktemp -u XXXXXX)"
> +readonly SRC=2
> +readonly DST=1
> +readonly DST_NAT=100
> +readonly NS_SRC=$BASE$SRC
> +readonly NS_DST=$BASE$DST
> +
> +# "baremetal" network used for raw UDP traffic
> +readonly BM_NET_V4=192.168.1.
> +readonly BM_NET_V6=2001:db8::
> +
> +# "overlay" network used for UDP over UDP tunnel traffic
> +readonly OL_NET_V4=172.16.1.
> +readonly OL_NET_V6=2002:db8::

is it okay to use a prod64 prefix for this? should this be another
subnet of 2001:db8:: instead? of fd..

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 1/8] udp: fixup csum for GSO receive slow path
  2021-03-22 13:18   ` Willem de Bruijn
@ 2021-03-22 16:34     ` Paolo Abeni
  2021-03-24  1:45       ` Willem de Bruijn
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Abeni @ 2021-03-22 16:34 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Mon, 2021-03-22 at 09:18 -0400, Willem de Bruijn wrote:
> On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > When looping back UDP GSO over UDP tunnel packets to an UDP socket,
> > the individual packet csum is currently set to CSUM_NONE. That causes
> > unexpected/wrong csum validation errors later in the UDP receive path.
> > 
> > We could possibly addressing the issue with some additional check and
> > csum mangling in the UDP tunnel code. Since the issue affects only
> > this UDP receive slow path, let's set a suitable csum status there.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  include/net/udp.h | 18 ++++++++++++++++++
> >  net/ipv4/udp.c    | 10 ++++++++++
> >  net/ipv6/udp.c    |  5 +++++
> >  3 files changed, 33 insertions(+)
> > 
> > diff --git a/include/net/udp.h b/include/net/udp.h
> > index d4d064c592328..007683eb3e113 100644
> > --- a/include/net/udp.h
> > +++ b/include/net/udp.h
> > @@ -515,6 +515,24 @@ static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
> >         return segs;
> >  }
> > 
> > +static inline void udp_post_segment_fix_csum(struct sk_buff *skb, int level)
> > +{
> > +       /* UDP-lite can't land here - no GRO */
> > +       WARN_ON_ONCE(UDP_SKB_CB(skb)->partial_cov);
> > +
> > +       /* GRO already validated the csum up to 'level', and we just
> > +        * consumed one header, update the skb accordingly
> > +        */
> > +       UDP_SKB_CB(skb)->cscov = skb->len;
> > +       if (level) {
> > +               skb->ip_summed = CHECKSUM_UNNECESSARY;
> > +               skb->csum_level = 0;
> > +       } else {
> > +               skb->ip_summed = CHECKSUM_NONE;
> > +               skb->csum_valid = 1;
> > +       }
> 
> why does this function also update these fields for non-tunneled
> packets? the commit only describes an issue with tunneled packets.
> 
> > +}
> > +
> >  #ifdef CONFIG_BPF_SYSCALL
> >  struct sk_psock;
> >  struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index 4a0478b17243a..ff54135c51ffa 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -2168,6 +2168,7 @@ static int udp_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
> >  static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> >  {
> >         struct sk_buff *next, *segs;
> > +       int csum_level;
> >         int ret;
> > 
> >         if (likely(!udp_unexpected_gso(sk, skb)))
> > @@ -2175,9 +2176,18 @@ static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> > 
> >         BUILD_BUG_ON(sizeof(struct udp_skb_cb) > SKB_GSO_CB_OFFSET);
> >         __skb_push(skb, -skb_mac_offset(skb));
> > +       csum_level = !!(skb_shinfo(skb)->gso_type &
> > +                       (SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM));
> >         segs = udp_rcv_segment(sk, skb, true);
> >         skb_list_walk_safe(segs, skb, next) {
> >                 __skb_pull(skb, skb_transport_offset(skb));
> > +
> > +               /* UDP GSO packets looped back after adding UDP encap land here with CHECKSUM none,
> > +                * instead of adding another check in the tunnel fastpath, we can force valid
> > +                * csums here (packets are locally generated).
> > +                * Additionally fixup the UDP CB
> > +                */
> > +               udp_post_segment_fix_csum(skb, csum_level);
> 
> How does this code differentiates locally generated packets with udp
> tunnel headers from packets arriving from the wire, for which the
> inner checksum may be incorrect?

First thing first, thank you for the detailed review. Digesting all the
comments will take time, so please excuse for some latency.

I'll try to reply to both your question here because the replies are
related.

My understanding is that UDP GRO, when processing UDP over UDP traffic
with the appropriate features bit set, will validate the checksum for
both the inner and the outer header - udp{4,6}_gro_receive will be
traversed twice, the fist one for the outer header, the 2nd for the
inner.

So when we reach here, the inner header csum could not be incorrect,
and I don't do anything to differentiate locally generated GSO packets
and GROed one to keep the code simpler.

The udp_post_segment_fix_csum() always set the csum info - even for non
tunneled packets to avoid additional branches/make the code more
complex. The csum should be valid in any scenario.

I guess I can mention the above either in a code comment and/or in the
commit message.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 2/8] udp: skip fwd/list GRO for tunnel packets
  2021-03-22 13:24   ` Willem de Bruijn
@ 2021-03-22 16:41     ` Paolo Abeni
  2021-03-24  1:54       ` Willem de Bruijn
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Abeni @ 2021-03-22 16:41 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Mon, 2021-03-22 at 09:24 -0400, Willem de Bruijn wrote:
> On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > If UDP GRO forwarding (or list) is enabled,
> 
> Please explicitly mention the gso type SKB_GSO_FRAGLIST. I, at least,
> didn't immediately grasp that gro forwarding is an alias for that.

I see the commit message was not clear at all, I'm sorry.

The above means:

gso_type & (NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST) 

:)

> > and there are
> > udp tunnel available in the system, we could end-up doing L4
> > aggregation for packets targeting the UDP tunnel.
> 
> Is this specific to UDP tunnels, or can this also occur with others,
> such as GRE? (not implying that this patchset needs to address those
> at the same time)

I did not look at that before your suggestion. Thanks for pointing out.

I think the problem is specific to UDP: when processing the outer UDP
header that is potentially eligible for both NETIF_F_GSO_UDP_L4 and
gro_receive aggregation and that is the root cause of the problem
addressed here.


> > Just skip the fwd GRO if this packet could land in an UDP
> > tunnel.
> 
> Could you make more clear that this does not skip UDP GRO, only
> switches from fraglist-based to pure SKB_GSO_UDP_L4.

Sure, I'll try to rewrite the commit message.

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 3/8] udp: properly complete L4 GRO over UDP tunnel packet
  2021-03-22 13:30   ` Willem de Bruijn
@ 2021-03-22 16:59     ` Paolo Abeni
  2021-03-24  2:13       ` Willem de Bruijn
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Abeni @ 2021-03-22 16:59 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Mon, 2021-03-22 at 09:30 -0400, Willem de Bruijn wrote:
> On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > After the previous patch the stack can do L4 UDP aggregation
> > on top of an UDP tunnel.
> > 
> > The current GRO complete code tries frag based aggregation first;
> > in the above scenario will generate corrupted frames.
> > 
> > We need to try first UDP tunnel based aggregation, if the GRO
> > packet requires that. We can use time GRO 'encap_mark' field
> > to track the need GRO complete action. If encap_mark is set,
> > skip the frag_list aggregation.
> > 
> > On tunnel encap GRO complete clear such field, so that an inner
> > frag_list GRO complete could take action.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  net/ipv4/udp_offload.c | 8 +++++++-
> >  net/ipv6/udp_offload.c | 3 ++-
> >  2 files changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> > index 25134a3548e99..54e06b88af69a 100644
> > --- a/net/ipv4/udp_offload.c
> > +++ b/net/ipv4/udp_offload.c
> > @@ -642,6 +642,11 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
> >                 skb_shinfo(skb)->gso_type = uh->check ? SKB_GSO_UDP_TUNNEL_CSUM
> >                                         : SKB_GSO_UDP_TUNNEL;
> > 
> > +               /* clear the encap mark, so that inner frag_list gro_complete
> > +                * can take place
> > +                */
> > +               NAPI_GRO_CB(skb)->encap_mark = 0;
> > +
> >                 /* Set encapsulation before calling into inner gro_complete()
> >                  * functions to make them set up the inner offsets.
> >                  */
> > @@ -665,7 +670,8 @@ INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff)
> >         const struct iphdr *iph = ip_hdr(skb);
> >         struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
> > 
> > -       if (NAPI_GRO_CB(skb)->is_flist) {
> > +       /* do fraglist only if there is no outer UDP encap (or we already processed it) */
> > +       if (NAPI_GRO_CB(skb)->is_flist && !NAPI_GRO_CB(skb)->encap_mark) {
> 
> Sorry, I don't follow. I thought the point was to avoid fraglist if an
> outer udp tunnel header is present. But the above code clears the mark
> and allows entering the fraglist branch exactly when such a header is
> encountered?

The relevant UDP packet has gone through:

[l2/l3 GRO] -> udp_gro_receive  -> udp_sk(sk)->gro_receive -> [some
more GRO layers] -> udp_gro_receive (again)

The first udp_gro_receive set NAPI_GRO_CB(skb)->encap_mark, the
latter udp_gro_receive set NAPI_GRO_CB(skb)->is_flist.

Then, at GRO complete time:

[l2/l3 GRO] -> udp{4,6}_gro_complete -> udp_sk(sk)->gro_complete ->
[more GRO layers] -> udp{4,6}_gro_complete (again).

In the first udp{4,6}_gro_complete invocation 'encap_mark' is 1, so
with this patch we do the 'udp_sk(sk)->gro_complete' path. In the
second udp{4,6}_gro_complete invocation 'encap_mark' has been cleared
(by udp_gro_complete), so we do the SKB_GSO_FRAGLIST completion.

In case SKB_GSO_FRAGLIST with no UDP tunnel, 'encap_mark' is 0 and we
do the SKB_GSO_FRAGLIST completion.

Another alternative, possibly more readable, would be avoid clearing
'encap_mark' in udp_gro_complete() and replacing the above check with:

	if (NAPI_GRO_CB(skb)->is_flist &&
            (!NAPI_GRO_CB(skb)->encap_mark ||
 	     (NAPI_GRO_CB(skb)->encap_mark && skb->encapsulation))) {

I opted otherwise to simplify the conditional expression.

Please let me know if the above is somewhat more clear and if you have
preferecens between the two options.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 4/8] udp: never accept GSO_FRAGLIST packets
  2021-03-22 13:42   ` Willem de Bruijn
@ 2021-03-22 17:09     ` Paolo Abeni
  2021-03-24  2:21       ` Willem de Bruijn
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Abeni @ 2021-03-22 17:09 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Mon, 2021-03-22 at 09:42 -0400, Willem de Bruijn wrote:
> On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > Currently the UDP protocol delivers GSO_FRAGLIST packets to
> > the sockets without the expected segmentation.
> > 
> > This change addresses the issue introducing and maintaining
> > a per socket bitmask of GSO types requiring segmentation.
> > Enabling GSO removes SKB_GSO_UDP_L4 from such mask, while
> > GSO_FRAGLIST packets are never accepted
> > 
> > Note: this also updates the 'unused' field size to really
> > fit the otherwise existing hole. It's size become incorrect
> > after commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT").
> > 
> > Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  include/linux/udp.h | 10 ++++++----
> >  net/ipv4/udp.c      | 12 +++++++++++-
> >  2 files changed, 17 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/udp.h b/include/linux/udp.h
> > index aa84597bdc33c..6da342f15f351 100644
> > --- a/include/linux/udp.h
> > +++ b/include/linux/udp.h
> > @@ -51,7 +51,7 @@ struct udp_sock {
> >                                            * different encapsulation layer set
> >                                            * this
> >                                            */
> > -                        gro_enabled:1; /* Can accept GRO packets */
> > +                        gro_enabled:1; /* Request GRO aggregation */
> 
> unnecessary comment change?

Before this patch 'gro_enabled' was used in udp_unexpected_gso() to
check for GSO packets acceptance, after this patch such field is not
used there anymore, so does not carry explicilty the 'accept GRO
packets' semantic.

Anyway I don't have strong feeling regarding changing or not this
comment

> >         /*
> >          * Following member retains the information to create a UDP header
> >          * when the socket is uncorked.
> > @@ -68,7 +68,10 @@ struct udp_sock {
> >  #define UDPLITE_SEND_CC  0x2           /* set via udplite setsockopt         */
> >  #define UDPLITE_RECV_CC  0x4           /* set via udplite setsocktopt        */
> >         __u8             pcflag;        /* marks socket as UDP-Lite if > 0    */
> > -       __u8             unused[3];
> > +       __u8             unused[1];
> > +       unsigned int     unexpected_gso;/* GSO types this socket can't accept,
> > +                                        * any of SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST
> > +                                        */
> 
> An extra unsigned int for this seems overkill.

Should be more clear after the next patch.

Using an explicit 'acceptable GSO types' field makes the patch 5/8
quite simple.

After this patch the 'udp_sock' struct size remains unchanged and even
the number of 'udp_sock' cachelines touched for every packet is
unchanged.

I opted for an 'unsigned int' so that I could simply copy a gso_type
there.

> Current sockets that support SKB_GSO_UDP_L4 implicitly also support
> SKB_GSO_FRAGLIST. This patch makes explicit that the second is not
> supported..
> 
> >         /*
> >          * For encapsulation sockets.
> >          */
> > @@ -131,8 +134,7 @@ static inline void udp_cmsg_recv(struct msghdr *msg, struct sock *sk,
> > 
> >  static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff *skb)
> >  {
> > -       return !udp_sk(sk)->gro_enabled && skb_is_gso(skb) &&
> > -              skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4;
> > +       return skb_is_gso(skb) && skb_shinfo(skb)->gso_type & udp_sk(sk)->unexpected_gso;
> 
> .. just update this function as follows ?
> 
>  -       return !udp_sk(sk)->gro_enabled && skb_is_gso(skb) &&
>  -              skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4;
> +       return skb_is_gso(skb) &&
> +                 (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST ||
> !udp_sk(sk)->gro_enabled)
> 
> where the latter is shorthand for
> 
>   (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 && !udp_sk(sk)->gro_enabled)
> 
> but the are the only two GSO types that could arrive here.

With the above patch 5/8 becomes messy ?!?

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 8/8] selftests: net: add UDP GRO forwarding self-tests
  2021-03-22 13:44   ` Willem de Bruijn
@ 2021-03-22 17:18     ` Paolo Abeni
  2021-03-23 17:12     ` Paolo Abeni
  1 sibling, 0 replies; 35+ messages in thread
From: Paolo Abeni @ 2021-03-22 17:18 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Mon, 2021-03-22 at 09:44 -0400, Willem de Bruijn wrote:
> On Sun, Mar 21, 2021 at 1:02 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > create a bunch of virtual topology and verify that
> > GRO_FRAG_LIST and GRO_FWD aggregate the ingress
> 
> what are these constants? Aliases for SKB_GSO_FRAGLIST and ?

Well, I was inaccurate in many ways :(

I was speaking about device features, so it's really:

NETIF_F_GSO_FRAGLIST and NETIF_F_GRO_UDP_FWD

I really would love a commit message "compiler" to clarify this sort of
thing before submission;)

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 8/8] selftests: net: add UDP GRO forwarding self-tests
  2021-03-22 13:44   ` Willem de Bruijn
  2021-03-22 17:18     ` Paolo Abeni
@ 2021-03-23 17:12     ` Paolo Abeni
  1 sibling, 0 replies; 35+ messages in thread
From: Paolo Abeni @ 2021-03-23 17:12 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

Hello,

On Mon, 2021-03-22 at 09:44 -0400, Willem de Bruijn wrote:
> > diff --git a/tools/testing/selftests/net/udpgro_fwd.sh b/tools/testing/selftests/net/udpgro_fwd.sh
> > new file mode 100755
> > index 0000000000000..ac7ac56a27524
> > --- /dev/null
> > +++ b/tools/testing/selftests/net/udpgro_fwd.sh
> > @@ -0,0 +1,251 @@
> > +#!/bin/sh
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +readonly BASE="ns-$(mktemp -u XXXXXX)"
> > +readonly SRC=2
> > +readonly DST=1
> > +readonly DST_NAT=100
> > +readonly NS_SRC=$BASE$SRC
> > +readonly NS_DST=$BASE$DST
> > +
> > +# "baremetal" network used for raw UDP traffic
> > +readonly BM_NET_V4=192.168.1.
> > +readonly BM_NET_V6=2001:db8::
> > +
> > +# "overlay" network used for UDP over UDP tunnel traffic
> > +readonly OL_NET_V4=172.16.1.
> > +readonly OL_NET_V6=2002:db8::
> 
> is it okay to use a prod64 prefix for this? should this be another
> subnet of 2001:db8:: instead? of fd..

It looks like this comment slipped out of my sight... yep, I'll use
2001:db8:1:: for the overlay network in the next iteration.

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 1/8] udp: fixup csum for GSO receive slow path
  2021-03-22 16:34     ` Paolo Abeni
@ 2021-03-24  1:45       ` Willem de Bruijn
  2021-03-24  1:49         ` Willem de Bruijn
  2021-03-24 14:37         ` Paolo Abeni
  0 siblings, 2 replies; 35+ messages in thread
From: Willem de Bruijn @ 2021-03-24  1:45 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Mon, Mar 22, 2021 at 12:36 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2021-03-22 at 09:18 -0400, Willem de Bruijn wrote:
> > On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > When looping back UDP GSO over UDP tunnel packets to an UDP socket,
> > > the individual packet csum is currently set to CSUM_NONE. That causes
> > > unexpected/wrong csum validation errors later in the UDP receive path.
> > >
> > > We could possibly addressing the issue with some additional check and
> > > csum mangling in the UDP tunnel code. Since the issue affects only
> > > this UDP receive slow path, let's set a suitable csum status there.
> > >
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > >  include/net/udp.h | 18 ++++++++++++++++++
> > >  net/ipv4/udp.c    | 10 ++++++++++
> > >  net/ipv6/udp.c    |  5 +++++
> > >  3 files changed, 33 insertions(+)
> > >
> > > diff --git a/include/net/udp.h b/include/net/udp.h
> > > index d4d064c592328..007683eb3e113 100644
> > > --- a/include/net/udp.h
> > > +++ b/include/net/udp.h
> > > @@ -515,6 +515,24 @@ static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
> > >         return segs;
> > >  }
> > >
> > > +static inline void udp_post_segment_fix_csum(struct sk_buff *skb, int level)
> > > +{
> > > +       /* UDP-lite can't land here - no GRO */
> > > +       WARN_ON_ONCE(UDP_SKB_CB(skb)->partial_cov);
> > > +
> > > +       /* GRO already validated the csum up to 'level', and we just
> > > +        * consumed one header, update the skb accordingly
> > > +        */
> > > +       UDP_SKB_CB(skb)->cscov = skb->len;
> > > +       if (level) {
> > > +               skb->ip_summed = CHECKSUM_UNNECESSARY;
> > > +               skb->csum_level = 0;
> > > +       } else {
> > > +               skb->ip_summed = CHECKSUM_NONE;
> > > +               skb->csum_valid = 1;
> > > +       }
> >
> > why does this function also update these fields for non-tunneled
> > packets? the commit only describes an issue with tunneled packets.
> >
> > > +}
> > > +
> > >  #ifdef CONFIG_BPF_SYSCALL
> > >  struct sk_psock;
> > >  struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
> > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > index 4a0478b17243a..ff54135c51ffa 100644
> > > --- a/net/ipv4/udp.c
> > > +++ b/net/ipv4/udp.c
> > > @@ -2168,6 +2168,7 @@ static int udp_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
> > >  static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> > >  {
> > >         struct sk_buff *next, *segs;
> > > +       int csum_level;
> > >         int ret;
> > >
> > >         if (likely(!udp_unexpected_gso(sk, skb)))
> > > @@ -2175,9 +2176,18 @@ static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> > >
> > >         BUILD_BUG_ON(sizeof(struct udp_skb_cb) > SKB_GSO_CB_OFFSET);
> > >         __skb_push(skb, -skb_mac_offset(skb));
> > > +       csum_level = !!(skb_shinfo(skb)->gso_type &
> > > +                       (SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM));
> > >         segs = udp_rcv_segment(sk, skb, true);
> > >         skb_list_walk_safe(segs, skb, next) {
> > >                 __skb_pull(skb, skb_transport_offset(skb));
> > > +
> > > +               /* UDP GSO packets looped back after adding UDP encap land here with CHECKSUM none,
> > > +                * instead of adding another check in the tunnel fastpath, we can force valid
> > > +                * csums here (packets are locally generated).
> > > +                * Additionally fixup the UDP CB
> > > +                */
> > > +               udp_post_segment_fix_csum(skb, csum_level);
> >
> > How does this code differentiates locally generated packets with udp
> > tunnel headers from packets arriving from the wire, for which the
> > inner checksum may be incorrect?
>
> First thing first, thank you for the detailed review. Digesting all the
> comments will take time, so please excuse for some latency.

Apologies for my own delayed response. I also need to take time to
understand the existing code and diffs :) And have a few questions.

> I'll try to reply to both your question here because the replies are
> related.
>
> My understanding is that UDP GRO, when processing UDP over UDP traffic

This is a UDP GSO packet egress packet that was further encapsulated
with a GSO_UDP_TUNNEL on egress, then looped to the ingress path?

Then in the ingress path it has traversed the GRO layer.

Is this veth with XDP? That seems unlikely for GSO packets. But there
aren't many paths that will loop a packet through napi_gro_receive or
napi_gro_frags.

> with the appropriate features bit set, will validate the checksum for
> both the inner and the outer header - udp{4,6}_gro_receive will be
> traversed twice, the fist one for the outer header, the 2nd for the
> inner.

GRO will validate multiple levels of checksums with CHECKSUM_COMPLETE.
It can only validate the outer checksum with CHECKSUM_UNNECESSARY, I
believe?

As for looped packets with CHECKSUM_PARTIAL: we definitely have found
bugs in that path before. I think it's fine to set csum_valid on any
packets that can unambiguously be identified as such. Hence the
detailed questions above on which exact packets this code is
targeting, so that there are not accidental false positives that look
the same but have a different ip_summed.

> So when we reach here, the inner header csum could not be incorrect,
> and I don't do anything to differentiate locally generated GSO packets
> and GROed one to keep the code simpler.

But the correctness of the patch depends on them being locally
generated, if I'm not mistaken. Then I would make it explicit.

> The udp_post_segment_fix_csum() always set the csum info - even for non
> tunneled packets to avoid additional branches/make the code more
> complex. The csum should be valid in any scenario.
>
> I guess I can mention the above either in a code comment and/or in the
> commit message.

I don't follow this. The patch adds an else clause for non-tunneled
packets. Why does it update these? It does not seem like avoiding
additional branches, but adding them. So I must be missing something
more involved.

> Cheers,
>
> Paolo
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 1/8] udp: fixup csum for GSO receive slow path
  2021-03-24  1:45       ` Willem de Bruijn
@ 2021-03-24  1:49         ` Willem de Bruijn
  2021-03-24 14:37         ` Paolo Abeni
  1 sibling, 0 replies; 35+ messages in thread
From: Willem de Bruijn @ 2021-03-24  1:49 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

> > > > @@ -2168,6 +2168,7 @@ static int udp_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
> > > >  static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> > > >  {
> > > >         struct sk_buff *next, *segs;
> > > > +       int csum_level;
> > > >         int ret;
> > > >
> > > >         if (likely(!udp_unexpected_gso(sk, skb)))
> > > > @@ -2175,9 +2176,18 @@ static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> > > >
> > > >         BUILD_BUG_ON(sizeof(struct udp_skb_cb) > SKB_GSO_CB_OFFSET);
> > > >         __skb_push(skb, -skb_mac_offset(skb));
> > > > +       csum_level = !!(skb_shinfo(skb)->gso_type &
> > > > +                       (SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM));
> > > >         segs = udp_rcv_segment(sk, skb, true);
> > > >         skb_list_walk_safe(segs, skb, next) {
> > > >                 __skb_pull(skb, skb_transport_offset(skb));
> > > > +
> > > > +               /* UDP GSO packets looped back after adding UDP encap land here with CHECKSUM none,
> > > > +                * instead of adding another check in the tunnel fastpath, we can force valid
> > > > +                * csums here (packets are locally generated).
> > > > +                * Additionally fixup the UDP CB
> > > > +                */

Btw, instead of this comment plus a comment in the ipv6 code that
points back to this ipv4 comment, I would move the explanation to
the udp_post_segment_fix_csum function itself.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 2/8] udp: skip fwd/list GRO for tunnel packets
  2021-03-22 16:41     ` Paolo Abeni
@ 2021-03-24  1:54       ` Willem de Bruijn
  2021-03-24 14:50         ` ! Paolo Abeni
  0 siblings, 1 reply; 35+ messages in thread
From: Willem de Bruijn @ 2021-03-24  1:54 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

> > > and there are
> > > udp tunnel available in the system, we could end-up doing L4
> > > aggregation for packets targeting the UDP tunnel.
> >
> > Is this specific to UDP tunnels, or can this also occur with others,
> > such as GRE? (not implying that this patchset needs to address those
> > at the same time)

I suppose GRE tunnels do not advertise GSO_UDP_L4 support, so GSO
packets would get segmented before entering the tunnel device.

Forwarded datagrams exceeding egress device MTU (whether tunnel or
not) is a wholly separate problem.

> I did not look at that before your suggestion. Thanks for pointing out.
>
> I think the problem is specific to UDP: when processing the outer UDP
> header that is potentially eligible for both NETIF_F_GSO_UDP_L4 and
> gro_receive aggregation and that is the root cause of the problem
> addressed here.

Can you elaborate on the exact problem? The commit mentions "inner
protocol corruption, as no overaly network parameters is taken in
account at aggregation time."

My understanding is that these are udp gro aggregated GSO_UDP_L4
packets forwarded to a udp tunnel device. They are not encapsulated
yet. Which overlay network parameters are not, but should have been,
taken account at aggregation time?

>
>
> > > Just skip the fwd GRO if this packet could land in an UDP
> > > tunnel.
> >
> > Could you make more clear that this does not skip UDP GRO, only
> > switches from fraglist-based to pure SKB_GSO_UDP_L4.
>
> Sure, I'll try to rewrite the commit message.
>
> Thanks!
>
> Paolo
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 3/8] udp: properly complete L4 GRO over UDP tunnel packet
  2021-03-22 16:59     ` Paolo Abeni
@ 2021-03-24  2:13       ` Willem de Bruijn
  0 siblings, 0 replies; 35+ messages in thread
From: Willem de Bruijn @ 2021-03-24  2:13 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Willem de Bruijn, Network Development, David S. Miller,
	Jakub Kicinski, Steffen Klassert, Alexander Lobakin

On Mon, Mar 22, 2021 at 1:00 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2021-03-22 at 09:30 -0400, Willem de Bruijn wrote:
> > On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > After the previous patch the stack can do L4 UDP aggregation
> > > on top of an UDP tunnel.
> > >
> > > The current GRO complete code tries frag based aggregation first;
> > > in the above scenario will generate corrupted frames.
> > >
> > > We need to try first UDP tunnel based aggregation, if the GRO
> > > packet requires that. We can use time GRO 'encap_mark' field
> > > to track the need GRO complete action. If encap_mark is set,
> > > skip the frag_list aggregation.
> > >
> > > On tunnel encap GRO complete clear such field, so that an inner
> > > frag_list GRO complete could take action.
> > >
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > >  net/ipv4/udp_offload.c | 8 +++++++-
> > >  net/ipv6/udp_offload.c | 3 ++-
> > >  2 files changed, 9 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> > > index 25134a3548e99..54e06b88af69a 100644
> > > --- a/net/ipv4/udp_offload.c
> > > +++ b/net/ipv4/udp_offload.c
> > > @@ -642,6 +642,11 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
> > >                 skb_shinfo(skb)->gso_type = uh->check ? SKB_GSO_UDP_TUNNEL_CSUM
> > >                                         : SKB_GSO_UDP_TUNNEL;
> > >
> > > +               /* clear the encap mark, so that inner frag_list gro_complete
> > > +                * can take place
> > > +                */
> > > +               NAPI_GRO_CB(skb)->encap_mark = 0;
> > > +
> > >                 /* Set encapsulation before calling into inner gro_complete()
> > >                  * functions to make them set up the inner offsets.
> > >                  */
> > > @@ -665,7 +670,8 @@ INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff)
> > >         const struct iphdr *iph = ip_hdr(skb);
> > >         struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
> > >
> > > -       if (NAPI_GRO_CB(skb)->is_flist) {
> > > +       /* do fraglist only if there is no outer UDP encap (or we already processed it) */
> > > +       if (NAPI_GRO_CB(skb)->is_flist && !NAPI_GRO_CB(skb)->encap_mark) {
> >
> > Sorry, I don't follow. I thought the point was to avoid fraglist if an
> > outer udp tunnel header is present. But the above code clears the mark
> > and allows entering the fraglist branch exactly when such a header is
> > encountered?
>
> The relevant UDP packet has gone through:
>
> [l2/l3 GRO] -> udp_gro_receive  -> udp_sk(sk)->gro_receive -> [some
> more GRO layers] -> udp_gro_receive (again)
>
> The first udp_gro_receive set NAPI_GRO_CB(skb)->encap_mark, the
> latter udp_gro_receive set NAPI_GRO_CB(skb)->is_flist.
>
> Then, at GRO complete time:
>
> [l2/l3 GRO] -> udp{4,6}_gro_complete -> udp_sk(sk)->gro_complete ->
> [more GRO layers] -> udp{4,6}_gro_complete (again).
>
> In the first udp{4,6}_gro_complete invocation 'encap_mark' is 1, so
> with this patch we do the 'udp_sk(sk)->gro_complete' path. In the
> second udp{4,6}_gro_complete invocation 'encap_mark' has been cleared
> (by udp_gro_complete), so we do the SKB_GSO_FRAGLIST completion.
>
> In case SKB_GSO_FRAGLIST with no UDP tunnel, 'encap_mark' is 0 and we
> do the SKB_GSO_FRAGLIST completion.
>
> Another alternative, possibly more readable, would be avoid clearing
> 'encap_mark' in udp_gro_complete() and replacing the above check with:
>
>         if (NAPI_GRO_CB(skb)->is_flist &&
>             (!NAPI_GRO_CB(skb)->encap_mark ||
>              (NAPI_GRO_CB(skb)->encap_mark && skb->encapsulation))) {
>
> I opted otherwise to simplify the conditional expression.
>
> Please let me know if the above is somewhat more clear and if you have
> preferecens between the two options.

Got it now, thanks. Yes, this patch makes sense.

When the GRO layer has built a GSO packet with both inner UDP
(GSO_UDP_L4) and UDP tunnel header (GSO_UDP_TUNNEL), then on
completion udp{4,6}_gro_complete will be called twice. This function
will enter its is_flist branch immediately, even though that is only
correct on the second call, as GSO_FRAGLIST is only relevant for the
inner packet. Skip this while encap_mark == 1, identifying processing
of the outer tunnel header. Clear the field to enter the path on the next
round, for the inner header.

This is subject to only supporting a single layer of encap, but that
is indeed the current state afaik.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 4/8] udp: never accept GSO_FRAGLIST packets
  2021-03-22 17:09     ` Paolo Abeni
@ 2021-03-24  2:21       ` Willem de Bruijn
  2021-03-24 18:59         ` Paolo Abeni
  0 siblings, 1 reply; 35+ messages in thread
From: Willem de Bruijn @ 2021-03-24  2:21 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Mon, Mar 22, 2021 at 1:12 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2021-03-22 at 09:42 -0400, Willem de Bruijn wrote:
> > On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > Currently the UDP protocol delivers GSO_FRAGLIST packets to
> > > the sockets without the expected segmentation.
> > >
> > > This change addresses the issue introducing and maintaining
> > > a per socket bitmask of GSO types requiring segmentation.
> > > Enabling GSO removes SKB_GSO_UDP_L4 from such mask, while
> > > GSO_FRAGLIST packets are never accepted
> > >
> > > Note: this also updates the 'unused' field size to really
> > > fit the otherwise existing hole. It's size become incorrect
> > > after commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT").
> > >
> > > Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > >  include/linux/udp.h | 10 ++++++----
> > >  net/ipv4/udp.c      | 12 +++++++++++-
> > >  2 files changed, 17 insertions(+), 5 deletions(-)
> > >

> > >         /*
> > >          * Following member retains the information to create a UDP header
> > >          * when the socket is uncorked.
> > > @@ -68,7 +68,10 @@ struct udp_sock {
> > >  #define UDPLITE_SEND_CC  0x2           /* set via udplite setsockopt         */
> > >  #define UDPLITE_RECV_CC  0x4           /* set via udplite setsocktopt        */
> > >         __u8             pcflag;        /* marks socket as UDP-Lite if > 0    */
> > > -       __u8             unused[3];
> > > +       __u8             unused[1];
> > > +       unsigned int     unexpected_gso;/* GSO types this socket can't accept,
> > > +                                        * any of SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST
> > > +                                        */
> >
> > An extra unsigned int for this seems overkill.
>
> Should be more clear after the next patch.
>
> Using an explicit 'acceptable GSO types' field makes the patch 5/8
> quite simple.
>
> After this patch the 'udp_sock' struct size remains unchanged and even
> the number of 'udp_sock' cachelines touched for every packet is
> unchanged.

But there is opportunity cost, of course. Next time we need to add
something to the struct, we will add a new cacheline.

A 32-bit field for just 2 bits, where 1 already exists does seem like overkill.

More importantly, I just think it's less obvious code than a pair of fields

  accepts_udp_l4:1,
  accepts_udp_fraglist:1,

Local sockets can only accept the first, as there does not exist an
interface to pass along the multiple frag sizes that a frag_list based
approach might have.

Sockets with encap_rcv != NULL may opt-in to being able to handle either.

I think explicit code will be more maintainable. At the cost of
perhaps two branches instead of one, admittedly. But that seems
premature optimization.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 1/8] udp: fixup csum for GSO receive slow path
  2021-03-24  1:45       ` Willem de Bruijn
  2021-03-24  1:49         ` Willem de Bruijn
@ 2021-03-24 14:37         ` Paolo Abeni
  2021-03-24 22:36           ` Willem de Bruijn
  1 sibling, 1 reply; 35+ messages in thread
From: Paolo Abeni @ 2021-03-24 14:37 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Tue, 2021-03-23 at 21:45 -0400, Willem de Bruijn wrote:
> On Mon, Mar 22, 2021 at 12:36 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > On Mon, 2021-03-22 at 09:18 -0400, Willem de Bruijn wrote:
> > > On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > When looping back UDP GSO over UDP tunnel packets to an UDP socket,
> > > > the individual packet csum is currently set to CSUM_NONE. That causes
> > > > unexpected/wrong csum validation errors later in the UDP receive path.
> > > > 
> > > > We could possibly addressing the issue with some additional check and
> > > > csum mangling in the UDP tunnel code. Since the issue affects only
> > > > this UDP receive slow path, let's set a suitable csum status there.
> > > > 
> > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > ---
> > > >  include/net/udp.h | 18 ++++++++++++++++++
> > > >  net/ipv4/udp.c    | 10 ++++++++++
> > > >  net/ipv6/udp.c    |  5 +++++
> > > >  3 files changed, 33 insertions(+)
> > > > 
> > > > diff --git a/include/net/udp.h b/include/net/udp.h
> > > > index d4d064c592328..007683eb3e113 100644
> > > > --- a/include/net/udp.h
> > > > +++ b/include/net/udp.h
> > > > @@ -515,6 +515,24 @@ static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
> > > >         return segs;
> > > >  }
> > > > 
> > > > +static inline void udp_post_segment_fix_csum(struct sk_buff *skb, int level)
> > > > +{
> > > > +       /* UDP-lite can't land here - no GRO */
> > > > +       WARN_ON_ONCE(UDP_SKB_CB(skb)->partial_cov);
> > > > +
> > > > +       /* GRO already validated the csum up to 'level', and we just
> > > > +        * consumed one header, update the skb accordingly
> > > > +        */
> > > > +       UDP_SKB_CB(skb)->cscov = skb->len;
> > > > +       if (level) {
> > > > +               skb->ip_summed = CHECKSUM_UNNECESSARY;
> > > > +               skb->csum_level = 0;
> > > > +       } else {
> > > > +               skb->ip_summed = CHECKSUM_NONE;
> > > > +               skb->csum_valid = 1;
> > > > +       }
> > > 
> > > why does this function also update these fields for non-tunneled
> > > packets? the commit only describes an issue with tunneled packets.
> > > 
> > > > +}
> > > > +
> > > >  #ifdef CONFIG_BPF_SYSCALL
> > > >  struct sk_psock;
> > > >  struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
> > > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > > index 4a0478b17243a..ff54135c51ffa 100644
> > > > --- a/net/ipv4/udp.c
> > > > +++ b/net/ipv4/udp.c
> > > > @@ -2168,6 +2168,7 @@ static int udp_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
> > > >  static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> > > >  {
> > > >         struct sk_buff *next, *segs;
> > > > +       int csum_level;
> > > >         int ret;
> > > > 
> > > >         if (likely(!udp_unexpected_gso(sk, skb)))
> > > > @@ -2175,9 +2176,18 @@ static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> > > > 
> > > >         BUILD_BUG_ON(sizeof(struct udp_skb_cb) > SKB_GSO_CB_OFFSET);
> > > >         __skb_push(skb, -skb_mac_offset(skb));
> > > > +       csum_level = !!(skb_shinfo(skb)->gso_type &
> > > > +                       (SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM));
> > > >         segs = udp_rcv_segment(sk, skb, true);
> > > >         skb_list_walk_safe(segs, skb, next) {
> > > >                 __skb_pull(skb, skb_transport_offset(skb));
> > > > +
> > > > +               /* UDP GSO packets looped back after adding UDP encap land here with CHECKSUM none,
> > > > +                * instead of adding another check in the tunnel fastpath, we can force valid
> > > > +                * csums here (packets are locally generated).
> > > > +                * Additionally fixup the UDP CB
> > > > +                */
> > > > +               udp_post_segment_fix_csum(skb, csum_level);
> > > 
> > > How does this code differentiates locally generated packets with udp
> > > tunnel headers from packets arriving from the wire, for which the
> > > inner checksum may be incorrect?
> > 
> > First thing first, thank you for the detailed review. Digesting all the
> > comments will take time, so please excuse for some latency.
> 
> Apologies for my own delayed response. I also need to take time to
> understand the existing code and diffs :) And have a few questions.
> 
> > I'll try to reply to both your question here because the replies are
> > related.
> > 
> > My understanding is that UDP GRO, when processing UDP over UDP traffic
> 
> This is a UDP GSO packet egress packet that was further encapsulated
> with a GSO_UDP_TUNNEL on egress, then looped to the ingress path?
> 
> Then in the ingress path it has traversed the GRO layer.
> 
> Is this veth with XDP? That seems unlikely for GSO packets. But there
> aren't many paths that will loop a packet through napi_gro_receive or
> napi_gro_frags.

This patch addresses the following scenario:

sk ->vxlan -> veth -> (xdp in use, TSO disabled, GRO on) -> veth -> vxlan -> sk

What I meant here is that the issue is not visible with:

(baremetal, NETIF_F_GRO_UDP_FWD | NETIF_F_GRO_FRAGLIST enabled -> vxlan -> sk

> > with the appropriate features bit set, will validate the checksum for
> > both the inner and the outer header - udp{4,6}_gro_receive will be
> > traversed twice, the fist one for the outer header, the 2nd for the
> > inner.
> 
> GRO will validate multiple levels of checksums with CHECKSUM_COMPLETE.
> It can only validate the outer checksum with CHECKSUM_UNNECESSARY, I
> believe?

I possibly miss some bits here ?!?

AFAICS:

udp4_gro_receive() -> skb_gro_checksum_validate_zero_check() ->
__skb_gro_checksum_validate -> (if  not already valid)
__skb_gro_checksum_validate_complete() -> (if not CHECKSUM_COMPLETE)
__skb_gro_checksum_complete()

the latter will validate the UDP checksum at the current nesting level
(and set the csum-related bits in the GRO skb cb to the same status
as CHECKSUM_COMPLETE)

When processing UDP over UDP tunnel packet, the gro call chain will be:

[l2/l3 GRO] -> udp4_gro_receive (validate outher header csum) -> 
udp_sk(sk)->gro_receive -> [other gro layers] -> 
udp4_gro_receive (validate inner header csum) -> ...

> As for looped packets with CHECKSUM_PARTIAL: we definitely have found
> bugs in that path before. I think it's fine to set csum_valid on any
> packets that can unambiguously be identified as such. Hence the
> detailed questions above on which exact packets this code is
> targeting, so that there are not accidental false positives that look
> the same but have a different ip_summed.

I see this change is controversial. Since the addressed scenario is
really a corner case, a simpler alternative would be
replacing udp_post_segment_fix_csum with:

static inline void udp_post_segment_fix_cb(struct sk_buff *skb, int level)
{
	/* UDP-lite can't land here - no GRO */
	WARN_ON_ONCE(UDP_SKB_CB(skb)->partial_cov);

	/* UDP CB mirrors the GSO packet, we must re-init it */
	UDP_SKB_CB(skb)->cscov = skb->len;
}

the downside will be an additional, later, unneeded csum validation for the 

sk ->vxlan -> veth -> (xdp in use, TSO disabled, GRO on) -> veth -> vxlan -> sk

scenario. WDYT?

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* !
  2021-03-24  1:54       ` Willem de Bruijn
@ 2021-03-24 14:50         ` Paolo Abeni
  2021-03-24 22:45           ` ! Willem de Bruijn
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Abeni @ 2021-03-24 14:50 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Tue, 2021-03-23 at 21:54 -0400, Willem de Bruijn wrote:
> > I did not look at that before your suggestion. Thanks for pointing out.
> > 
> > I think the problem is specific to UDP: when processing the outer UDP
> > header that is potentially eligible for both NETIF_F_GSO_UDP_L4 and
> > gro_receive aggregation and that is the root cause of the problem
> > addressed here.
> 
> Can you elaborate on the exact problem? The commit mentions "inner
> protocol corruption, as no overaly network parameters is taken in
> account at aggregation time."
> 
> My understanding is that these are udp gro aggregated GSO_UDP_L4
> packets forwarded to a udp tunnel device. They are not encapsulated
> yet. Which overlay network parameters are not, but should have been,
> taken account at aggregation time?

The scenario is as follow: 

* a NIC has NETIF_F_GRO_UDP_FWD or NETIF_F_GRO_FRAGLIST enabled
* an UDP tunnel is configured/enabled in the system
* the above NIC receives some UDP-tunneled packets, targeting the
mentioned tunnel
* the packets go through gro_receive and they reache
'udp_gro_receive()' while processing the outer UDP header.

without this patch, udp_gro_receive_segment() will kick in and the
outer UDP header will be aggregated according to SKB_GSO_FRAGLIST
or SKB_GSO_UDP_L4, even if this is really e.g. a vxlan packet.

Different vxlan ids will be ignored/aggregated to the same GSO packet.
Inner headers will be ignored, too, so that e.g. TCP over vxlan push
packets will be held in the GRO engine till the next flush, etc.

Please let me know if the above is more clear.

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 4/8] udp: never accept GSO_FRAGLIST packets
  2021-03-24  2:21       ` Willem de Bruijn
@ 2021-03-24 18:59         ` Paolo Abeni
  2021-03-24 22:12           ` Willem de Bruijn
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Abeni @ 2021-03-24 18:59 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Tue, 2021-03-23 at 22:21 -0400, Willem de Bruijn wrote:
> On Mon, Mar 22, 2021 at 1:12 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > On Mon, 2021-03-22 at 09:42 -0400, Willem de Bruijn wrote:
> > > On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > Currently the UDP protocol delivers GSO_FRAGLIST packets to
> > > > the sockets without the expected segmentation.
> > > > 
> > > > This change addresses the issue introducing and maintaining
> > > > a per socket bitmask of GSO types requiring segmentation.
> > > > Enabling GSO removes SKB_GSO_UDP_L4 from such mask, while
> > > > GSO_FRAGLIST packets are never accepted
> > > > 
> > > > Note: this also updates the 'unused' field size to really
> > > > fit the otherwise existing hole. It's size become incorrect
> > > > after commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT").
> > > > 
> > > > Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
> > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > ---
> > > >  include/linux/udp.h | 10 ++++++----
> > > >  net/ipv4/udp.c      | 12 +++++++++++-
> > > >  2 files changed, 17 insertions(+), 5 deletions(-)
> > > > 
> > > >         /*
> > > >          * Following member retains the information to create a UDP header
> > > >          * when the socket is uncorked.
> > > > @@ -68,7 +68,10 @@ struct udp_sock {
> > > >  #define UDPLITE_SEND_CC  0x2           /* set via udplite setsockopt         */
> > > >  #define UDPLITE_RECV_CC  0x4           /* set via udplite setsocktopt        */
> > > >         __u8             pcflag;        /* marks socket as UDP-Lite if > 0    */
> > > > -       __u8             unused[3];
> > > > +       __u8             unused[1];
> > > > +       unsigned int     unexpected_gso;/* GSO types this socket can't accept,
> > > > +                                        * any of SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST
> > > > +                                        */
> > > 
> > > An extra unsigned int for this seems overkill.
> > 
> > Should be more clear after the next patch.
> > 
> > Using an explicit 'acceptable GSO types' field makes the patch 5/8
> > quite simple.
> > 
> > After this patch the 'udp_sock' struct size remains unchanged and even
> > the number of 'udp_sock' cachelines touched for every packet is
> > unchanged.
> 
> But there is opportunity cost, of course. Next time we need to add
> something to the struct, we will add a new cacheline.
> 
> A 32-bit field for just 2 bits, where 1 already exists does seem like overkill.
> 
> More importantly, I just think it's less obvious code than a pair of fields
> 
>   accepts_udp_l4:1,
>   accepts_udp_fraglist:1,
> 
> Local sockets can only accept the first, as there does not exist an
> interface to pass along the multiple frag sizes that a frag_list based
> approach might have.
> 
> Sockets with encap_rcv != NULL may opt-in to being able to handle either.
> 
> I think explicit code will be more maintainable. 

ok

> At the cost of
> perhaps two branches instead of one, admittedly. But that seems
> premature optimization.

well, if it don't hurt too much your eyes, something along the
following could save udp_sock space and code branches:

    rejects_udp_l4_fraglist:2;

#define SKB_GSO_UDP_L4_SHIFT (NETIF_F_GSO_UDP_L4_BIT - NETIF_F_GSO_SHIFT)
 static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff *skb)
 {
	BUILD_BUG_ON(1 << SKB_GSO_UDP_L4_SHIFT != SKB_GSO_UDP_L4);
	BUILD_BUG_ON(1 << (SKB_GSO_UDP_L4_SHIFT + 1) != SKB_GSO_FRAGLIST);
 	return skb_is_gso(skb) && skb_shinfo(skb)->gso_type &
		(udp_sk(sk)->rejects_udp_l4_fraglist << SKB_GSO_UDP_L4_SHIFT);
 }

(not sure if /me runs/hides ;)

/P




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 4/8] udp: never accept GSO_FRAGLIST packets
  2021-03-24 18:59         ` Paolo Abeni
@ 2021-03-24 22:12           ` Willem de Bruijn
  2021-03-25 11:50             ` Paolo Abeni
  0 siblings, 1 reply; 35+ messages in thread
From: Willem de Bruijn @ 2021-03-24 22:12 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Wed, Mar 24, 2021 at 3:00 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2021-03-23 at 22:21 -0400, Willem de Bruijn wrote:
> > On Mon, Mar 22, 2021 at 1:12 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > On Mon, 2021-03-22 at 09:42 -0400, Willem de Bruijn wrote:
> > > > On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > Currently the UDP protocol delivers GSO_FRAGLIST packets to
> > > > > the sockets without the expected segmentation.
> > > > >
> > > > > This change addresses the issue introducing and maintaining
> > > > > a per socket bitmask of GSO types requiring segmentation.
> > > > > Enabling GSO removes SKB_GSO_UDP_L4 from such mask, while
> > > > > GSO_FRAGLIST packets are never accepted
> > > > >
> > > > > Note: this also updates the 'unused' field size to really
> > > > > fit the otherwise existing hole. It's size become incorrect
> > > > > after commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT").
> > > > >
> > > > > Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
> > > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > > ---
> > > > >  include/linux/udp.h | 10 ++++++----
> > > > >  net/ipv4/udp.c      | 12 +++++++++++-
> > > > >  2 files changed, 17 insertions(+), 5 deletions(-)
> > > > >
> > > > >         /*
> > > > >          * Following member retains the information to create a UDP header
> > > > >          * when the socket is uncorked.
> > > > > @@ -68,7 +68,10 @@ struct udp_sock {
> > > > >  #define UDPLITE_SEND_CC  0x2           /* set via udplite setsockopt         */
> > > > >  #define UDPLITE_RECV_CC  0x4           /* set via udplite setsocktopt        */
> > > > >         __u8             pcflag;        /* marks socket as UDP-Lite if > 0    */
> > > > > -       __u8             unused[3];
> > > > > +       __u8             unused[1];
> > > > > +       unsigned int     unexpected_gso;/* GSO types this socket can't accept,
> > > > > +                                        * any of SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST
> > > > > +                                        */
> > > >
> > > > An extra unsigned int for this seems overkill.
> > >
> > > Should be more clear after the next patch.
> > >
> > > Using an explicit 'acceptable GSO types' field makes the patch 5/8
> > > quite simple.
> > >
> > > After this patch the 'udp_sock' struct size remains unchanged and even
> > > the number of 'udp_sock' cachelines touched for every packet is
> > > unchanged.
> >
> > But there is opportunity cost, of course. Next time we need to add
> > something to the struct, we will add a new cacheline.
> >
> > A 32-bit field for just 2 bits, where 1 already exists does seem like overkill.
> >
> > More importantly, I just think it's less obvious code than a pair of fields
> >
> >   accepts_udp_l4:1,
> >   accepts_udp_fraglist:1,
> >
> > Local sockets can only accept the first, as there does not exist an
> > interface to pass along the multiple frag sizes that a frag_list based
> > approach might have.
> >
> > Sockets with encap_rcv != NULL may opt-in to being able to handle either.
> >
> > I think explicit code will be more maintainable.
>
> ok
>
> > At the cost of
> > perhaps two branches instead of one, admittedly. But that seems
> > premature optimization.
>
> well, if it don't hurt too much your eyes, something along the
> following could save udp_sock space and code branches:
>
>     rejects_udp_l4_fraglist:2;
>
> #define SKB_GSO_UDP_L4_SHIFT (NETIF_F_GSO_UDP_L4_BIT - NETIF_F_GSO_SHIFT)
>  static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff *skb)
>  {
>         BUILD_BUG_ON(1 << SKB_GSO_UDP_L4_SHIFT != SKB_GSO_UDP_L4);
>         BUILD_BUG_ON(1 << (SKB_GSO_UDP_L4_SHIFT + 1) != SKB_GSO_FRAGLIST);
>         return skb_is_gso(skb) && skb_shinfo(skb)->gso_type &
>                 (udp_sk(sk)->rejects_udp_l4_fraglist << SKB_GSO_UDP_L4_SHIFT);
>  }
>
> (not sure if /me runs/hides ;)

:)

My opinion is just one, but I do find this a lot less readable and
hence maintainable than

  if (likely(!skb_is_gso(skb)))
     return true;

  if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 && !udp_sk(sk)->accept_udp_l4)
    return false;

  if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST &&
!udp_sk(sk)->accept_udp_fraglist)
    return false;

  return true;

at no obvious benefit. The tunnel gso code is hard enough to fathom as it is.

> /P
>
>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 1/8] udp: fixup csum for GSO receive slow path
  2021-03-24 14:37         ` Paolo Abeni
@ 2021-03-24 22:36           ` Willem de Bruijn
  2021-03-25 10:56             ` Paolo Abeni
  0 siblings, 1 reply; 35+ messages in thread
From: Willem de Bruijn @ 2021-03-24 22:36 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

> > This is a UDP GSO packet egress packet that was further encapsulated
> > with a GSO_UDP_TUNNEL on egress, then looped to the ingress path?
> >
> > Then in the ingress path it has traversed the GRO layer.
> >
> > Is this veth with XDP? That seems unlikely for GSO packets. But there
> > aren't many paths that will loop a packet through napi_gro_receive or
> > napi_gro_frags.
>
> This patch addresses the following scenario:
>
> sk ->vxlan -> veth -> (xdp in use, TSO disabled, GRO on) -> veth -> vxlan -> sk
>
> What I meant here is that the issue is not visible with:
>
> (baremetal, NETIF_F_GRO_UDP_FWD | NETIF_F_GRO_FRAGLIST enabled -> vxlan -> sk
>
> > > with the appropriate features bit set, will validate the checksum for
> > > both the inner and the outer header - udp{4,6}_gro_receive will be
> > > traversed twice, the fist one for the outer header, the 2nd for the
> > > inner.
> >
> > GRO will validate multiple levels of checksums with CHECKSUM_COMPLETE.
> > It can only validate the outer checksum with CHECKSUM_UNNECESSARY, I
> > believe?
>
> I possibly miss some bits here ?!?
>
> AFAICS:
>
> udp4_gro_receive() -> skb_gro_checksum_validate_zero_check() ->
> __skb_gro_checksum_validate -> (if  not already valid)
> __skb_gro_checksum_validate_complete() -> (if not CHECKSUM_COMPLETE)
> __skb_gro_checksum_complete()
>
> the latter will validate the UDP checksum at the current nesting level
> (and set the csum-related bits in the GRO skb cb to the same status
> as CHECKSUM_COMPLETE)
>
> When processing UDP over UDP tunnel packet, the gro call chain will be:
>
> [l2/l3 GRO] -> udp4_gro_receive (validate outher header csum) ->
> udp_sk(sk)->gro_receive -> [other gro layers] ->
> udp4_gro_receive (validate inner header csum) -> ...

Agreed. But __skb_gro_checksum_validate on the first invocation will
call skb_gro_incr_csum_unnecessary, so that on the second invocation
csum_cnt == 0 and triggers a real checksum validation?

At least, that is my understanding. Intuitively, CHECKSUM_UNNECESSARY
only validates the first checksum, so says nothing about the validity
of any subsequent ones.

But it seems I'm mistaken?

__skb_gro_checksum_validate has an obvious exception for locally
generated packets by excluding CHECKSUM_PARTIAL. Looped packets
usually have CHECKSUM_PARTIAL set. Unfortunately, this is similar to
the issue that I looked at earlier this year with looped UDP packets
with MSG_MORE: those are also changed to CHECKSUM_NONE (and exposed a
checksum bug: 52cbd23a119c).

Is there perhaps some other way that we can identify that these are
local packets? They should trivially avoid all checksum checks.

> > As for looped packets with CHECKSUM_PARTIAL: we definitely have found
> > bugs in that path before. I think it's fine to set csum_valid on any
> > packets that can unambiguously be identified as such. Hence the
> > detailed questions above on which exact packets this code is
> > targeting, so that there are not accidental false positives that look
> > the same but have a different ip_summed.
>
> I see this change is controversial.

I have no concerns with the fix. It is just a very narrow path (veth +
xdp - tso + gro ..), and to minimize risk I would try to avoid
updating state of unrelated packets. That was my initial comment: I
don't understand the need for the else clause.

> Since the addressed scenario is
> really a corner case, a simpler alternative would be
> replacing udp_post_segment_fix_csum with:
>
> static inline void udp_post_segment_fix_cb(struct sk_buff *skb, int level)
> {
>         /* UDP-lite can't land here - no GRO */
>         WARN_ON_ONCE(UDP_SKB_CB(skb)->partial_cov);
>
>         /* UDP CB mirrors the GSO packet, we must re-init it */
>         UDP_SKB_CB(skb)->cscov = skb->len;
> }
>
> the downside will be an additional, later, unneeded csum validation for the
>
> sk ->vxlan -> veth -> (xdp in use, TSO disabled, GRO on) -> veth -> vxlan -> sk
>
> scenario. WDYT?

No, let's definitely avoid an unneeded checksum verification.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: !
  2021-03-24 14:50         ` ! Paolo Abeni
@ 2021-03-24 22:45           ` Willem de Bruijn
  0 siblings, 0 replies; 35+ messages in thread
From: Willem de Bruijn @ 2021-03-24 22:45 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Willem de Bruijn, Network Development, David S. Miller,
	Jakub Kicinski, Steffen Klassert, Alexander Lobakin

On Wed, Mar 24, 2021 at 10:51 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2021-03-23 at 21:54 -0400, Willem de Bruijn wrote:
> > > I did not look at that before your suggestion. Thanks for pointing out.
> > >
> > > I think the problem is specific to UDP: when processing the outer UDP
> > > header that is potentially eligible for both NETIF_F_GSO_UDP_L4 and
> > > gro_receive aggregation and that is the root cause of the problem
> > > addressed here.
> >
> > Can you elaborate on the exact problem? The commit mentions "inner
> > protocol corruption, as no overaly network parameters is taken in
> > account at aggregation time."
> >
> > My understanding is that these are udp gro aggregated GSO_UDP_L4
> > packets forwarded to a udp tunnel device. They are not encapsulated
> > yet. Which overlay network parameters are not, but should have been,
> > taken account at aggregation time?
>
> The scenario is as follow:
>
> * a NIC has NETIF_F_GRO_UDP_FWD or NETIF_F_GRO_FRAGLIST enabled
> * an UDP tunnel is configured/enabled in the system
> * the above NIC receives some UDP-tunneled packets, targeting the
> mentioned tunnel
> * the packets go through gro_receive and they reache
> 'udp_gro_receive()' while processing the outer UDP header.
>
> without this patch, udp_gro_receive_segment() will kick in and the
> outer UDP header will be aggregated according to SKB_GSO_FRAGLIST
> or SKB_GSO_UDP_L4, even if this is really e.g. a vxlan packet.
>
> Different vxlan ids will be ignored/aggregated to the same GSO packet.
> Inner headers will be ignored, too, so that e.g. TCP over vxlan push
> packets will be held in the GRO engine till the next flush, etc.
>
> Please let me know if the above is more clear.

Yes, thanks a lot! That's very concrete.

When processing the outer UDP tunnel header in the gro completion
path, it is incorrectly identified as an inner UDP transport layer due
to NAPI_GRO_CB(skb) identifying that such a layer is present
(is_flist).

The issue is that the UDP GRO layer distinguishes between tunnel and
transport layer too late, in udp_gro_complete, while an offending
assumption of that UDP == transport layer was already made in the
callers udp4_gro_complete and udp6_gro_complete.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 1/8] udp: fixup csum for GSO receive slow path
  2021-03-24 22:36           ` Willem de Bruijn
@ 2021-03-25 10:56             ` Paolo Abeni
  2021-03-25 13:53               ` Willem de Bruijn
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Abeni @ 2021-03-25 10:56 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

Hello,

On Wed, 2021-03-24 at 18:36 -0400, Willem de Bruijn wrote:
> > > This is a UDP GSO packet egress packet that was further encapsulated
> > > with a GSO_UDP_TUNNEL on egress, then looped to the ingress path?
> > > 
> > > Then in the ingress path it has traversed the GRO layer.
> > > 
> > > Is this veth with XDP? That seems unlikely for GSO packets. But there
> > > aren't many paths that will loop a packet through napi_gro_receive or
> > > napi_gro_frags.
> > 
> > This patch addresses the following scenario:
> > 
> > sk ->vxlan -> veth -> (xdp in use, TSO disabled, GRO on) -> veth -> vxlan -> sk
> > 
> > What I meant here is that the issue is not visible with:
> > 
> > (baremetal, NETIF_F_GRO_UDP_FWD | NETIF_F_GRO_FRAGLIST enabled -> vxlan -> sk
> > 
> > > > with the appropriate features bit set, will validate the checksum for
> > > > both the inner and the outer header - udp{4,6}_gro_receive will be
> > > > traversed twice, the fist one for the outer header, the 2nd for the
> > > > inner.
> > > 
> > > GRO will validate multiple levels of checksums with CHECKSUM_COMPLETE.
> > > It can only validate the outer checksum with CHECKSUM_UNNECESSARY, I
> > > believe?
> > 
> > I possibly miss some bits here ?!?
> > 
> > AFAICS:
> > 
> > udp4_gro_receive() -> skb_gro_checksum_validate_zero_check() ->
> > __skb_gro_checksum_validate -> (if  not already valid)
> > __skb_gro_checksum_validate_complete() -> (if not CHECKSUM_COMPLETE)
> > __skb_gro_checksum_complete()
> > 
> > the latter will validate the UDP checksum at the current nesting level
> > (and set the csum-related bits in the GRO skb cb to the same status
> > as CHECKSUM_COMPLETE)
> > 
> > When processing UDP over UDP tunnel packet, the gro call chain will be:
> > 
> > [l2/l3 GRO] -> udp4_gro_receive (validate outher header csum) ->
> > udp_sk(sk)->gro_receive -> [other gro layers] ->
> > udp4_gro_receive (validate inner header csum) -> ...
> 
> Agreed. But __skb_gro_checksum_validate on the first invocation will
> call skb_gro_incr_csum_unnecessary, so that on the second invocation
> csum_cnt == 0 and triggers a real checksum validation?
> 
> At least, that is my understanding. Intuitively, CHECKSUM_UNNECESSARY
> only validates the first checksum, so says nothing about the validity
> of any subsequent ones.
> 
> But it seems I'm mistaken?

AFAICS, it depends ;) From skbuff.h:

 *   skb->csum_level indicates the number of consecutive checksums found in
 *   the packet minus one that have been verified as CHECKSUM_UNNECESSARY.

if skb->csum_level > 0, the NIC validate additional headers. The intel
ixgbe driver use that for vxlan RX csum offload. Such field translates
into:

	NAPI_GRO_CB(skb)->csum_cnt

inside the GRO engine, and skb_gro_incr_csum_unnecessary takes care of
the updating it after validation.

> __skb_gro_checksum_validate has an obvious exception for locally
> generated packets by excluding CHECKSUM_PARTIAL. Looped packets
> usually have CHECKSUM_PARTIAL set. Unfortunately, this is similar to
> the issue that I looked at earlier this year with looped UDP packets
> with MSG_MORE: those are also changed to CHECKSUM_NONE (and exposed a
> checksum bug: 52cbd23a119c).
> 
> Is there perhaps some other way that we can identify that these are
> local packets? They should trivially avoid all checksum checks.
> 
> > > As for looped packets with CHECKSUM_PARTIAL: we definitely have found
> > > bugs in that path before. I think it's fine to set csum_valid on any
> > > packets that can unambiguously be identified as such. Hence the
> > > detailed questions above on which exact packets this code is
> > > targeting, so that there are not accidental false positives that look
> > > the same but have a different ip_summed.
> > 
> > I see this change is controversial.
> 
> I have no concerns with the fix. It is just a very narrow path (veth +
> xdp - tso + gro ..), and to minimize risk I would try to avoid
> updating state of unrelated packets. That was my initial comment: I
> don't understand the need for the else clause.

The branch is there because I wrote this patch before the patches 5,6,7
later in this series. GSO UDP L4 over UDP tunnel packets were segmented
at the UDP tunnel level, and that 'else' branch preserves the
appropriate 'csum_level' value to avoid later (if/when the packet lands
in a plain UDP socket) csum validation.

> > Since the addressed scenario is
> > really a corner case, a simpler alternative would be
> > replacing udp_post_segment_fix_csum with:
> > 
> > static inline void udp_post_segment_fix_cb(struct sk_buff *skb, int level)
> > {
> >         /* UDP-lite can't land here - no GRO */
> >         WARN_ON_ONCE(UDP_SKB_CB(skb)->partial_cov);
> > 
> >         /* UDP CB mirrors the GSO packet, we must re-init it */
> >         UDP_SKB_CB(skb)->cscov = skb->len;
> > }
> > 
> > the downside will be an additional, later, unneeded csum validation for the
> > 
> > sk ->vxlan -> veth -> (xdp in use, TSO disabled, GRO on) -> veth -> vxlan -> sk
> > 
> > scenario. WDYT?
> 
> No, let's definitely avoid an unneeded checksum verification.

Ok.

My understanding is that the following should be better:

static inline void udp_post_segment_fix_csum(struct sk_buff *skb)
{
	/* UDP-lite can't land here - no GRO */
	WARN_ON_ONCE(UDP_SKB_CB(skb)->partial_cov);

	/* UDP packets generated with UDP_SEGMENT and traversing:
	 * UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) -> UDP tunnel (rx)
	 * land here with CHECKSUM_NONE. Instead of adding another check
	 * in the tunnel fastpath, we can force valid csums here:
	 * packets are locally generated and the GRO engine already validated
	 * the csum.
	 * Additionally fixup the UDP CB
	 */
	UDP_SKB_CB(skb)->cscov = skb->len;
	if (skb->ip_summed == CHECKSUM_NONE && !skb->csum_valid)
		skb->csum_valid = 1;
}

I'll use the above in v2.

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 4/8] udp: never accept GSO_FRAGLIST packets
  2021-03-24 22:12           ` Willem de Bruijn
@ 2021-03-25 11:50             ` Paolo Abeni
  0 siblings, 0 replies; 35+ messages in thread
From: Paolo Abeni @ 2021-03-25 11:50 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Wed, 2021-03-24 at 18:12 -0400, Willem de Bruijn wrote:
> On Wed, Mar 24, 2021 at 3:00 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > On Tue, 2021-03-23 at 22:21 -0400, Willem de Bruijn wrote:
> > > On Mon, Mar 22, 2021 at 1:12 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > On Mon, 2021-03-22 at 09:42 -0400, Willem de Bruijn wrote:
> > > > > On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > > Currently the UDP protocol delivers GSO_FRAGLIST packets to
> > > > > > the sockets without the expected segmentation.
> > > > > > 
> > > > > > This change addresses the issue introducing and maintaining
> > > > > > a per socket bitmask of GSO types requiring segmentation.
> > > > > > Enabling GSO removes SKB_GSO_UDP_L4 from such mask, while
> > > > > > GSO_FRAGLIST packets are never accepted
> > > > > > 
> > > > > > Note: this also updates the 'unused' field size to really
> > > > > > fit the otherwise existing hole. It's size become incorrect
> > > > > > after commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT").
> > > > > > 
> > > > > > Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
> > > > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > > > ---
> > > > > >  include/linux/udp.h | 10 ++++++----
> > > > > >  net/ipv4/udp.c      | 12 +++++++++++-
> > > > > >  2 files changed, 17 insertions(+), 5 deletions(-)
> > > > > > 
> > > > > >         /*
> > > > > >          * Following member retains the information to create a UDP header
> > > > > >          * when the socket is uncorked.
> > > > > > @@ -68,7 +68,10 @@ struct udp_sock {
> > > > > >  #define UDPLITE_SEND_CC  0x2           /* set via udplite setsockopt         */
> > > > > >  #define UDPLITE_RECV_CC  0x4           /* set via udplite setsocktopt        */
> > > > > >         __u8             pcflag;        /* marks socket as UDP-Lite if > 0    */
> > > > > > -       __u8             unused[3];
> > > > > > +       __u8             unused[1];
> > > > > > +       unsigned int     unexpected_gso;/* GSO types this socket can't accept,
> > > > > > +                                        * any of SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST
> > > > > > +                                        */
> > > > > 
> > > > > An extra unsigned int for this seems overkill.
> > > > 
> > > > Should be more clear after the next patch.
> > > > 
> > > > Using an explicit 'acceptable GSO types' field makes the patch 5/8
> > > > quite simple.
> > > > 
> > > > After this patch the 'udp_sock' struct size remains unchanged and even
> > > > the number of 'udp_sock' cachelines touched for every packet is
> > > > unchanged.
> > > 
> > > But there is opportunity cost, of course. Next time we need to add
> > > something to the struct, we will add a new cacheline.
> > > 
> > > A 32-bit field for just 2 bits, where 1 already exists does seem like overkill.
> > > 
> > > More importantly, I just think it's less obvious code than a pair of fields
> > > 
> > >   accepts_udp_l4:1,
> > >   accepts_udp_fraglist:1,
> > > 
> > > Local sockets can only accept the first, as there does not exist an
> > > interface to pass along the multiple frag sizes that a frag_list based
> > > approach might have.
> > > 
> > > Sockets with encap_rcv != NULL may opt-in to being able to handle either.
> > > 
> > > I think explicit code will be more maintainable.
> > 
> > ok
> > 
> > > At the cost of
> > > perhaps two branches instead of one, admittedly. But that seems
> > > premature optimization.
> > 
> > well, if it don't hurt too much your eyes, something along the
> > following could save udp_sock space and code branches:
> > 
> >     rejects_udp_l4_fraglist:2;
> > 
> > #define SKB_GSO_UDP_L4_SHIFT (NETIF_F_GSO_UDP_L4_BIT - NETIF_F_GSO_SHIFT)
> >  static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff *skb)
> >  {
> >         BUILD_BUG_ON(1 << SKB_GSO_UDP_L4_SHIFT != SKB_GSO_UDP_L4);
> >         BUILD_BUG_ON(1 << (SKB_GSO_UDP_L4_SHIFT + 1) != SKB_GSO_FRAGLIST);
> >         return skb_is_gso(skb) && skb_shinfo(skb)->gso_type &
> >                 (udp_sk(sk)->rejects_udp_l4_fraglist << SKB_GSO_UDP_L4_SHIFT);
> >  }
> > 
> > (not sure if /me runs/hides ;)
> 
> :)
> 
> My opinion is just one, but I do find this a lot less readable and
> hence maintainable than
> 
>   if (likely(!skb_is_gso(skb)))
>      return true;
> 
>   if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 && !udp_sk(sk)->accept_udp_l4)
>     return false;
> 
>   if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST &&
> !udp_sk(sk)->accept_udp_fraglist)
>     return false;
> 
>   return true;
> 
> at no obvious benefit. The tunnel gso code is hard enough to fathom as it is.

ok.

I'm only doubtful about the likely() annotation: systems with UDP
tunnels likely expect receiving a majority of UDP-encaped traffic,
which in turn will likely be GRO (e.g. TCP over UDP-tunnel).

In my next iteration I'll use the above, dropping the annotation.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 1/8] udp: fixup csum for GSO receive slow path
  2021-03-25 10:56             ` Paolo Abeni
@ 2021-03-25 13:53               ` Willem de Bruijn
  2021-03-25 16:47                 ` Paolo Abeni
  0 siblings, 1 reply; 35+ messages in thread
From: Willem de Bruijn @ 2021-03-25 13:53 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Willem de Bruijn, Network Development, David S. Miller,
	Jakub Kicinski, Steffen Klassert, Alexander Lobakin

On Thu, Mar 25, 2021 at 6:57 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hello,
>
> On Wed, 2021-03-24 at 18:36 -0400, Willem de Bruijn wrote:
> > > > This is a UDP GSO packet egress packet that was further encapsulated
> > > > with a GSO_UDP_TUNNEL on egress, then looped to the ingress path?
> > > >
> > > > Then in the ingress path it has traversed the GRO layer.
> > > >
> > > > Is this veth with XDP? That seems unlikely for GSO packets. But there
> > > > aren't many paths that will loop a packet through napi_gro_receive or
> > > > napi_gro_frags.
> > >
> > > This patch addresses the following scenario:
> > >
> > > sk ->vxlan -> veth -> (xdp in use, TSO disabled, GRO on) -> veth -> vxlan -> sk
> > >
> > > What I meant here is that the issue is not visible with:
> > >
> > > (baremetal, NETIF_F_GRO_UDP_FWD | NETIF_F_GRO_FRAGLIST enabled -> vxlan -> sk
> > >
> > > > > with the appropriate features bit set, will validate the checksum for
> > > > > both the inner and the outer header - udp{4,6}_gro_receive will be
> > > > > traversed twice, the fist one for the outer header, the 2nd for the
> > > > > inner.
> > > >
> > > > GRO will validate multiple levels of checksums with CHECKSUM_COMPLETE.
> > > > It can only validate the outer checksum with CHECKSUM_UNNECESSARY, I
> > > > believe?
> > >
> > > I possibly miss some bits here ?!?
> > >
> > > AFAICS:
> > >
> > > udp4_gro_receive() -> skb_gro_checksum_validate_zero_check() ->
> > > __skb_gro_checksum_validate -> (if  not already valid)
> > > __skb_gro_checksum_validate_complete() -> (if not CHECKSUM_COMPLETE)
> > > __skb_gro_checksum_complete()
> > >
> > > the latter will validate the UDP checksum at the current nesting level
> > > (and set the csum-related bits in the GRO skb cb to the same status
> > > as CHECKSUM_COMPLETE)
> > >
> > > When processing UDP over UDP tunnel packet, the gro call chain will be:
> > >
> > > [l2/l3 GRO] -> udp4_gro_receive (validate outher header csum) ->
> > > udp_sk(sk)->gro_receive -> [other gro layers] ->
> > > udp4_gro_receive (validate inner header csum) -> ...
> >
> > Agreed. But __skb_gro_checksum_validate on the first invocation will
> > call skb_gro_incr_csum_unnecessary, so that on the second invocation
> > csum_cnt == 0 and triggers a real checksum validation?
> >
> > At least, that is my understanding. Intuitively, CHECKSUM_UNNECESSARY
> > only validates the first checksum, so says nothing about the validity
> > of any subsequent ones.
> >
> > But it seems I'm mistaken?
>
> AFAICS, it depends ;) From skbuff.h:
>
>  *   skb->csum_level indicates the number of consecutive checksums found in
>  *   the packet minus one that have been verified as CHECKSUM_UNNECESSARY.
>
> if skb->csum_level > 0, the NIC validate additional headers. The intel
> ixgbe driver use that for vxlan RX csum offload. Such field translates
> into:
>
>         NAPI_GRO_CB(skb)->csum_cnt
>
> inside the GRO engine, and skb_gro_incr_csum_unnecessary takes care of
> the updating it after validation.

True. I glanced over those cases.

More importantly, where exactly do these looped packets get converted
from CHECKSUM_PARTIAL to CHECKSUM_NONE before this patch?

> > __skb_gro_checksum_validate has an obvious exception for locally
> > generated packets by excluding CHECKSUM_PARTIAL. Looped packets
> > usually have CHECKSUM_PARTIAL set. Unfortunately, this is similar to
> > the issue that I looked at earlier this year with looped UDP packets
> > with MSG_MORE: those are also changed to CHECKSUM_NONE (and exposed a
> > checksum bug: 52cbd23a119c).
> >
> > Is there perhaps some other way that we can identify that these are
> > local packets? They should trivially avoid all checksum checks.
> >
> > > > As for looped packets with CHECKSUM_PARTIAL: we definitely have found
> > > > bugs in that path before. I think it's fine to set csum_valid on any
> > > > packets that can unambiguously be identified as such. Hence the
> > > > detailed questions above on which exact packets this code is
> > > > targeting, so that there are not accidental false positives that look
> > > > the same but have a different ip_summed.
> > >
> > > I see this change is controversial.
> >
> > I have no concerns with the fix. It is just a very narrow path (veth +
> > xdp - tso + gro ..), and to minimize risk I would try to avoid
> > updating state of unrelated packets. That was my initial comment: I
> > don't understand the need for the else clause.
>
> The branch is there because I wrote this patch before the patches 5,6,7
> later in this series. GSO UDP L4 over UDP tunnel packets were segmented
> at the UDP tunnel level, and that 'else' branch preserves the
> appropriate 'csum_level' value to avoid later (if/when the packet lands
> in a plain UDP socket) csum validation.
>
> > > Since the addressed scenario is
> > > really a corner case, a simpler alternative would be
> > > replacing udp_post_segment_fix_csum with:
> > >
> > > static inline void udp_post_segment_fix_cb(struct sk_buff *skb, int level)
> > > {
> > >         /* UDP-lite can't land here - no GRO */
> > >         WARN_ON_ONCE(UDP_SKB_CB(skb)->partial_cov);
> > >
> > >         /* UDP CB mirrors the GSO packet, we must re-init it */
> > >         UDP_SKB_CB(skb)->cscov = skb->len;
> > > }
> > >
> > > the downside will be an additional, later, unneeded csum validation for the
> > >
> > > sk ->vxlan -> veth -> (xdp in use, TSO disabled, GRO on) -> veth -> vxlan -> sk
> > >
> > > scenario. WDYT?
> >
> > No, let's definitely avoid an unneeded checksum verification.
>
> Ok.
>
> My understanding is that the following should be better:
>
> static inline void udp_post_segment_fix_csum(struct sk_buff *skb)
> {
>         /* UDP-lite can't land here - no GRO */
>         WARN_ON_ONCE(UDP_SKB_CB(skb)->partial_cov);
>
>         /* UDP packets generated with UDP_SEGMENT and traversing:
>          * UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) -> UDP tunnel (rx)
>          * land here with CHECKSUM_NONE. Instead of adding another check
>          * in the tunnel fastpath, we can force valid csums here:
>          * packets are locally generated and the GRO engine already validated
>          * the csum.
>          * Additionally fixup the UDP CB
>          */
>         UDP_SKB_CB(skb)->cscov = skb->len;
>         if (skb->ip_summed == CHECKSUM_NONE && !skb->csum_valid)
>                 skb->csum_valid = 1;
> }
>
> I'll use the above in v2.

Do I understand correctly that this avoids matching tunneled packets
that arrive from the network with rx checksumming disabled, because
__skb_gro_checksum_complete will have been called on the outer packet
and have set skb->csum_valid?

Yes, this just (1) identifying the packet as being of local source and
then (2) setting csum_valid sounds great to me, thanks.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 1/8] udp: fixup csum for GSO receive slow path
  2021-03-25 13:53               ` Willem de Bruijn
@ 2021-03-25 16:47                 ` Paolo Abeni
  0 siblings, 0 replies; 35+ messages in thread
From: Paolo Abeni @ 2021-03-25 16:47 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Alexander Lobakin

On Thu, 2021-03-25 at 09:53 -0400, Willem de Bruijn wrote:
> On Thu, Mar 25, 2021 at 6:57 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > AFAICS, it depends ;) From skbuff.h:
> > 
> >  *   skb->csum_level indicates the number of consecutive checksums found in
> >  *   the packet minus one that have been verified as CHECKSUM_UNNECESSARY.
> > 
> > if skb->csum_level > 0, the NIC validate additional headers. The intel
> > ixgbe driver use that for vxlan RX csum offload. Such field translates
> > into:
> > 
> >         NAPI_GRO_CB(skb)->csum_cnt
> > 
> > inside the GRO engine, and skb_gro_incr_csum_unnecessary takes care of
> > the updating it after validation.
> 
> True. I glanced over those cases.
> 
> More importantly, where exactly do these looped packets get converted
> from CHECKSUM_PARTIAL to CHECKSUM_NONE before this patch?

Very good question! It took a bit finding the exact place.

int __iptunnel_pull_header(struct sk_buff *skb, int hdr_len,
			   __be16 inner_proto, bool raw_proto, bool xnet)
{
	if (unlikely(!pskb_may_pull(skb, hdr_len)))
		return -ENOMEM;

	skb_pull_rcsum(skb, hdr_len);
        // here ^^^ via skb_pull_rcsum -> skb_postpull_rcsum() -> __skb_postpull_rcsum()

well, this is actually with _this_ patch applied: it does not change
the place where the ip_summed is set.

> > My understanding is that the following should be better:
> > 
> > static inline void udp_post_segment_fix_csum(struct sk_buff *skb)
> > {
> >         /* UDP-lite can't land here - no GRO */
> >         WARN_ON_ONCE(UDP_SKB_CB(skb)->partial_cov);
> > 
> >         /* UDP packets generated with UDP_SEGMENT and traversing:
> >          * UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) -> UDP tunnel (rx)
> >          * land here with CHECKSUM_NONE. Instead of adding another check
> >          * in the tunnel fastpath, we can force valid csums here:
> >          * packets are locally generated and the GRO engine already validated
> >          * the csum.
> >          * Additionally fixup the UDP CB
> >          */
> >         UDP_SKB_CB(skb)->cscov = skb->len;
> >         if (skb->ip_summed == CHECKSUM_NONE && !skb->csum_valid)
> >                 skb->csum_valid = 1;
> > }
> > 
> > I'll use the above in v2.
> 
> Do I understand correctly that this avoids matching tunneled packets
> that arrive from the network with rx checksumming disabled, because
> __skb_gro_checksum_complete will have been called on the outer packet
> and have set skb->csum_valid?

Exactly. I did the test, and perf probes showed that.

> Yes, this just (1) identifying the packet as being of local source and
> then (2) setting csum_valid sounds great to me, thanks.

Will try to submit v2 soon, after some more testing.

Thanks for all the feedback!

Paolo


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2021-03-25 16:48 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-21 17:01 [PATCH net-next 0/8] udp: GRO L4 improvements Paolo Abeni
2021-03-21 17:01 ` [PATCH net-next 1/8] udp: fixup csum for GSO receive slow path Paolo Abeni
2021-03-22 13:18   ` Willem de Bruijn
2021-03-22 16:34     ` Paolo Abeni
2021-03-24  1:45       ` Willem de Bruijn
2021-03-24  1:49         ` Willem de Bruijn
2021-03-24 14:37         ` Paolo Abeni
2021-03-24 22:36           ` Willem de Bruijn
2021-03-25 10:56             ` Paolo Abeni
2021-03-25 13:53               ` Willem de Bruijn
2021-03-25 16:47                 ` Paolo Abeni
2021-03-21 17:01 ` [PATCH net-next 2/8] udp: skip fwd/list GRO for tunnel packets Paolo Abeni
2021-03-22 13:24   ` Willem de Bruijn
2021-03-22 16:41     ` Paolo Abeni
2021-03-24  1:54       ` Willem de Bruijn
2021-03-24 14:50         ` ! Paolo Abeni
2021-03-24 22:45           ` ! Willem de Bruijn
2021-03-21 17:01 ` [PATCH net-next 3/8] udp: properly complete L4 GRO over UDP tunnel packet Paolo Abeni
2021-03-22 13:30   ` Willem de Bruijn
2021-03-22 16:59     ` Paolo Abeni
2021-03-24  2:13       ` Willem de Bruijn
2021-03-21 17:01 ` [PATCH net-next 4/8] udp: never accept GSO_FRAGLIST packets Paolo Abeni
2021-03-22 13:42   ` Willem de Bruijn
2021-03-22 17:09     ` Paolo Abeni
2021-03-24  2:21       ` Willem de Bruijn
2021-03-24 18:59         ` Paolo Abeni
2021-03-24 22:12           ` Willem de Bruijn
2021-03-25 11:50             ` Paolo Abeni
2021-03-21 17:01 ` [PATCH net-next 5/8] vxlan: allow L4 GRO passthrou Paolo Abeni
2021-03-21 17:01 ` [PATCH net-next 6/8] geneve: allow UDP " Paolo Abeni
2021-03-21 17:01 ` [PATCH net-next 7/8] bareudp: " Paolo Abeni
2021-03-21 17:01 ` [PATCH net-next 8/8] selftests: net: add UDP GRO forwarding self-tests Paolo Abeni
2021-03-22 13:44   ` Willem de Bruijn
2021-03-22 17:18     ` Paolo Abeni
2021-03-23 17:12     ` Paolo Abeni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.