All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 0/4] udp: more FRAGLIST fixes
@ 2021-05-05 15:35 Paolo Abeni
  2021-05-05 15:35 ` [PATCH net 1/4] net: fix double-free on fraglist GSO skbs Paolo Abeni
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Paolo Abeni @ 2021-05-05 15:35 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Jakub Kicinski, Steffen Klassert,
	Willem de Bruijn, Miaohe Lin

This series includes a bunch of fixes for SKB_GSO_FRAGLIST
packets, popped out while enabling the mentioned GRO type
in non trivial forwarding scenarios.

The last patch extends the existing UDP GRO self-tests to
cover the currently bugged cases.

Paolo Abeni (4):
  net: fix double-free on fraglist GSO skbs
  udp: fix out-of-bound at segmentation time
  udp: fix outer header csum for SKB_GSO_FRAGLIST over UDP tunnel
  selftests: more UDP GRO tests

 net/core/skbuff.c                             |  6 +++
 net/ipv4/udp.c                                |  2 +-
 net/ipv4/udp_offload.c                        | 45 +++++++++++-----
 tools/testing/selftests/net/set_sysfs_attr.sh | 15 ++++++
 tools/testing/selftests/net/udpgro_fwd.sh     | 54 +++++++++++++++++++
 5 files changed, 107 insertions(+), 15 deletions(-)
 create mode 100755 tools/testing/selftests/net/set_sysfs_attr.sh

-- 
2.26.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net 1/4] net: fix double-free on fraglist GSO skbs
  2021-05-05 15:35 [PATCH net 0/4] udp: more FRAGLIST fixes Paolo Abeni
@ 2021-05-05 15:35 ` Paolo Abeni
  2021-05-05 16:13   ` Willem de Bruijn
  2021-05-05 15:35 ` [PATCH net 2/4] udp: fix out-of-bound at segmentation time Paolo Abeni
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2021-05-05 15:35 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Jakub Kicinski, Steffen Klassert,
	Willem de Bruijn, Miaohe Lin

While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor
callback is available, the skb destructor is invoked on each
aggregated packet via skb_release_head_state().

Such field (and the pairer skb->sk) is left untouched, so the same
destructor is invoked again when the segmented skbs are freed, leading
to double-free/UaF of the relevant socket.

The problem is observable since commit c75fb320d482 ("veth: use
skb_orphan_partial instead of skb_orphan"), which allowed non orphaned
skbs to enter the GRO engine, but the root cause is there since the
introduction of GSO fraglist

This change addresses the above issue explixitly clearing the segmented
skb destructor after the skb_release_head_state() call.

Fixes: c75fb320d482 ("veth: use skb_orphan_partial instead of skb_orphan")
Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/core/skbuff.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3ad22870298c..75f3e70f8cd5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3818,6 +3818,12 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
 		skb_release_head_state(nskb);
 		 __copy_skb_header(nskb, skb);
 
+		/* __copy_skb_header() does not initialize the sk-related fields,
+		 * and skb_release_head_state() already orphaned nskb
+		 */
+		nskb->sk = NULL;
+		nskb->destructor = NULL;
+
 		skb_headers_offset_update(nskb, skb_headroom(nskb) - skb_headroom(skb));
 		skb_copy_from_linear_data_offset(skb, -tnl_hlen,
 						 nskb->data - tnl_hlen,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net 2/4] udp: fix out-of-bound at segmentation time
  2021-05-05 15:35 [PATCH net 0/4] udp: more FRAGLIST fixes Paolo Abeni
  2021-05-05 15:35 ` [PATCH net 1/4] net: fix double-free on fraglist GSO skbs Paolo Abeni
@ 2021-05-05 15:35 ` Paolo Abeni
  2021-05-05 15:35 ` [PATCH net 3/4] udp: fix outer header csum for SKB_GSO_FRAGLIST over UDP tunnel Paolo Abeni
  2021-05-05 15:35 ` [PATCH net 4/4] selftests: more UDP GRO tests Paolo Abeni
  3 siblings, 0 replies; 15+ messages in thread
From: Paolo Abeni @ 2021-05-05 15:35 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Jakub Kicinski, Steffen Klassert,
	Willem de Bruijn, Miaohe Lin

In the following scenario:

GRO -> SKB_GSO_FRAGLIST aggregation -> forward ->
  xmit over UDP tunnel -> segmentation

__udp_gso_segment_list() will take place and later
skb_udp_tunnel_segment() will try to make the segmented
packets outer UDP header checksum via gso_make_checksum().

The latter expect valids SKB_GSO_CB(skb)->csum and
SKB_GSO_CB(skb)->csum_start, but such fields are not
initialized by __udp_gso_segment_list().

gso_make_checksum() will end-up using a negative offset and
that will trigger the following splat:

 ==================================================================
 BUG: KASAN: slab-out-of-bounds in do_csum+0x3d8/0x400
 Read of size 1 at addr ffff888113ab5880 by task napi/br_port-81/1105

 CPU: 1 PID: 1105 Comm: napi/br_port-81 Not tainted 5.12.0-rc2.mptcp_autotune_ce84e1323bebe+ #268
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
 Call Trace:
  dump_stack+0xfa/0x151
  print_address_description.constprop.0+0x16/0xa0
  __kasan_report.cold+0x37/0x80
  kasan_report+0x3a/0x50
  do_csum+0x3d8/0x400
  csum_partial+0x21/0x30
  __skb_udp_tunnel_segment+0xd79/0x1ae0
  skb_udp_tunnel_segment+0x233/0x460
  udp4_ufo_fragment+0x50d/0x720
  inet_gso_segment+0x525/0x1120
  skb_mac_gso_segment+0x278/0x570

__udp_gso_segment_list() already has all the relevant data handy,
fix the issue traversing the segments list and updating the
GSO CB, if this is a tunnel GSO packet.

The issue is present since SKB_GSO_FRAGLIST introduction, but is
observable only since commit 18f25dc39990 ("udp: skip L4 aggregation
for UDP tunnel packets")

Fixes: 18f25dc39990 ("udp: skip L4 aggregation for UDP tunnel packets")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/udp_offload.c | 45 +++++++++++++++++++++++++++++-------------
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 54e06b88af69..9f6022ba6bcd 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -214,8 +214,10 @@ static void __udpv4_gso_segment_csum(struct sk_buff *seg,
 	*oldip = *newip;
 }
 
-static struct sk_buff *__udpv4_gso_segment_list_csum(struct sk_buff *segs)
+static void __udp_gso_segment_list_csum(struct sk_buff *segs, bool is_ipv6,
+					bool is_tunnel)
 {
+	bool update_csum = !is_ipv6;
 	struct sk_buff *seg;
 	struct udphdr *uh, *uh2;
 	struct iphdr *iph, *iph2;
@@ -224,31 +226,45 @@ static struct sk_buff *__udpv4_gso_segment_list_csum(struct sk_buff *segs)
 	uh = udp_hdr(seg);
 	iph = ip_hdr(seg);
 
-	if ((udp_hdr(seg)->dest == udp_hdr(seg->next)->dest) &&
+	if (update_csum && (udp_hdr(seg)->dest == udp_hdr(seg->next)->dest) &&
 	    (udp_hdr(seg)->source == udp_hdr(seg->next)->source) &&
 	    (ip_hdr(seg)->daddr == ip_hdr(seg->next)->daddr) &&
 	    (ip_hdr(seg)->saddr == ip_hdr(seg->next)->saddr))
-		return segs;
+		update_csum = false;
+
+	if (!update_csum && !is_tunnel)
+		return;
 
+	/* this skb is CHECKSUM_NONE, if it has also a tunnel header
+	 * later __skb_udp_tunnel_segment() will try to make the
+	 * outer header csum and will need valid csum* fields
+	 */
+	SKB_GSO_CB(seg)->csum = ~uh->check;
+	SKB_GSO_CB(seg)->csum_start = seg->transport_header;
 	while ((seg = seg->next)) {
 		uh2 = udp_hdr(seg);
-		iph2 = ip_hdr(seg);
-
-		__udpv4_gso_segment_csum(seg,
-					 &iph2->saddr, &iph->saddr,
-					 &uh2->source, &uh->source);
-		__udpv4_gso_segment_csum(seg,
-					 &iph2->daddr, &iph->daddr,
-					 &uh2->dest, &uh->dest);
-	}
+		if (update_csum) {
+			iph2 = ip_hdr(seg);
+
+			__udpv4_gso_segment_csum(seg,
+						 &iph2->saddr, &iph->saddr,
+						 &uh2->source, &uh->source);
+			__udpv4_gso_segment_csum(seg,
+						 &iph2->daddr, &iph->daddr,
+						 &uh2->dest, &uh->dest);
+		}
 
-	return segs;
+		SKB_GSO_CB(seg)->csum = ~uh2->check;
+		SKB_GSO_CB(seg)->csum_start = seg->transport_header;
+	}
 }
 
 static struct sk_buff *__udp_gso_segment_list(struct sk_buff *skb,
 					      netdev_features_t features,
 					      bool is_ipv6)
 {
+	bool is_tunnel = !!(skb_shinfo(skb)->gso_type &
+			    (SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM));
 	unsigned int mss = skb_shinfo(skb)->gso_size;
 
 	skb = skb_segment_list(skb, features, skb_mac_header_len(skb));
@@ -257,7 +273,8 @@ static struct sk_buff *__udp_gso_segment_list(struct sk_buff *skb,
 
 	udp_hdr(skb)->len = htons(sizeof(struct udphdr) + mss);
 
-	return is_ipv6 ? skb : __udpv4_gso_segment_list_csum(skb);
+	__udp_gso_segment_list_csum(skb, is_ipv6, is_tunnel);
+	return skb;
 }
 
 struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net 3/4] udp: fix outer header csum for SKB_GSO_FRAGLIST over UDP tunnel
  2021-05-05 15:35 [PATCH net 0/4] udp: more FRAGLIST fixes Paolo Abeni
  2021-05-05 15:35 ` [PATCH net 1/4] net: fix double-free on fraglist GSO skbs Paolo Abeni
  2021-05-05 15:35 ` [PATCH net 2/4] udp: fix out-of-bound at segmentation time Paolo Abeni
@ 2021-05-05 15:35 ` Paolo Abeni
  2021-05-05 15:35 ` [PATCH net 4/4] selftests: more UDP GRO tests Paolo Abeni
  3 siblings, 0 replies; 15+ messages in thread
From: Paolo Abeni @ 2021-05-05 15:35 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Jakub Kicinski, Steffen Klassert,
	Willem de Bruijn, Miaohe Lin

In the following scenario:

    GRO -> SKB_GSO_FRAGLIST aggregation -> forward ->
      xmit over UDP tunnel

SKB_GSO_FRAGLIST packet traverse an UDP tunnel in the xmit path.
The tunnel code sets up the outer header csum via udp_set_csum()
and the latter assumes that each GSO packet is CHECKSUM_PARTIAL.

Since the introduction of SKB_GSO_FRAGLIST and veth GRO, the above
assumption is not true anymore as SKB_GSO_FRAGLIST are
CHECKSUM_UNNECESSARY, and the csum for the outer header will be
left uninitialized.

All the above will cause wrong outer UDP header checksum when the
mentioned packet will be xmitted on some real NIC.

This change addresses the issue explicitly checking for both gso and
CHECKSUM_PARTIAL in udp_set_csum(), so that the mentioned packet will
be processed correctly.

Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/udp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 15f5504adf5b..055fceb18bea 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -857,7 +857,7 @@ void udp_set_csum(bool nocheck, struct sk_buff *skb,
 
 	if (nocheck) {
 		uh->check = 0;
-	} else if (skb_is_gso(skb)) {
+	} else if (skb_is_gso(skb) && skb->ip_summed == CHECKSUM_PARTIAL) {
 		uh->check = ~udp_v4_check(len, saddr, daddr, 0);
 	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		uh->check = 0;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net 4/4] selftests: more UDP GRO tests
  2021-05-05 15:35 [PATCH net 0/4] udp: more FRAGLIST fixes Paolo Abeni
                   ` (2 preceding siblings ...)
  2021-05-05 15:35 ` [PATCH net 3/4] udp: fix outer header csum for SKB_GSO_FRAGLIST over UDP tunnel Paolo Abeni
@ 2021-05-05 15:35 ` Paolo Abeni
  3 siblings, 0 replies; 15+ messages in thread
From: Paolo Abeni @ 2021-05-05 15:35 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Jakub Kicinski, Steffen Klassert,
	Willem de Bruijn, Miaohe Lin

Introduces explicit scenarios for the issues addressed
on the previous patches, and functionally checks the
expected aggregation and segmentation.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/net/set_sysfs_attr.sh | 15 ++++++
 tools/testing/selftests/net/udpgro_fwd.sh     | 54 +++++++++++++++++++
 2 files changed, 69 insertions(+)
 create mode 100755 tools/testing/selftests/net/set_sysfs_attr.sh

diff --git a/tools/testing/selftests/net/set_sysfs_attr.sh b/tools/testing/selftests/net/set_sysfs_attr.sh
new file mode 100755
index 000000000000..fd042a162326
--- /dev/null
+++ b/tools/testing/selftests/net/set_sysfs_attr.sh
@@ -0,0 +1,15 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+if [ $# -lt 2 ]; then
+	echo -e "syntax:\n$0 <device> <attr> [<value>]"
+	exit 0
+fi
+
+mount -t sysfs /sys 2>/dev/null
+if [ $# -lt 3 ]; then
+	cat /sys/class/net/$1/$2
+	exit $?
+fi
+
+echo $3 > /sys/class/net/$1/$2
diff --git a/tools/testing/selftests/net/udpgro_fwd.sh b/tools/testing/selftests/net/udpgro_fwd.sh
index a8fa64136282..8569b3f81fd7 100755
--- a/tools/testing/selftests/net/udpgro_fwd.sh
+++ b/tools/testing/selftests/net/udpgro_fwd.sh
@@ -79,6 +79,16 @@ create_vxlan_pair() {
 	done
 }
 
+move_address() {
+	local -r ns=$1
+	local -r src_dev=$2
+	local -r dst_dev=$3
+	local -r addr=$4
+
+	ip -n $ns addr del dev $src_dev $addr
+	ip -n $ns addr add dev $dst_dev $addr nodad 2>/dev/null
+}
+
 is_ipv6() {
 	if [[ $1 =~ .*:.* ]]; then
 		return 0
@@ -86,6 +96,35 @@ is_ipv6() {
 	return 1
 }
 
+create_bridge() {
+	local vxdev=vxlan$SRC
+	local src=$1
+	local i
+
+	is_ipv6 $src && vxdev=vxlan6$SRC
+
+	create_vxlan_pair
+	ip -n $NS_SRC link add name br_port type veth peer name br_port_peer
+	ip -n $NS_SRC link add name br0 type bridge
+	ip -n $NS_SRC link set dev br0 up
+	ip -n $NS_SRC link set dev br_port_peer up
+	ip -n $NS_SRC link set dev br_port up master br0
+	ip -n $NS_SRC link set dev $vxdev master br0
+	move_address $NS_SRC $vxdev br_port_peer $src/24
+
+	ip -n $NS_SRC link set br_port xdp object ../bpf/xdp_dummy.o section xdp_dummy 2>/dev/null
+	ip netns exec $NS_SRC ./set_sysfs_attr.sh br_port threaded 1
+
+	# slowing down the input path, we will help the napi thread to
+	# aggregate as much packets as possible via GRO
+	modprobe br_netfilter 2>/dev/null
+	ip netns exec $NS_SRC sysctl -qw net.bridge.bridge-nf-call-iptables=1
+	ip netns exec $NS_SRC sysctl -qw net.bridge.bridge-nf-call-ip6tables=1
+	for i in `seq 1 100`; do
+		ip netns exec $NS_SRC $IPT -A FORWARD
+	done
+}
+
 run_test() {
 	local -r msg=$1
 	local -r dst=$2
@@ -228,6 +267,21 @@ for family in 4 6; do
 	run_test "GRO frag list over UDP tunnel" $OL_NET$DST 1 1
 	cleanup
 
+	create_bridge $OL_NET$SRC
+	ip netns exec $NS_SRC ethtool -K br_port rx-gro-list on
+	ip netns exec $NS_SRC ethtool -K br_port_peer tx-udp-segmentation off
+	ip netns exec $NS_DST ethtool -K veth$DST rx-gro-list on
+	run_test "GRO frag list over UDP tunnel segmentation" $OL_NET$DST 1 1
+	cleanup
+
+	create_bridge $OL_NET$SRC
+	ip netns exec $NS_SRC ethtool -K br_port rx-gro-list on
+	ip netns exec $NS_SRC ethtool -K br_port_peer tx-udp-segmentation off
+	ip netns exec $NS_DST ethtool -K veth$DST rx-gro-list on
+	ip netns exec $NS_SRC ethtool -K veth$SRC tx off
+	run_test "GRO frag list over UDP tunnel segmentation (tx csum off)" $OL_NET$DST 1 1
+	cleanup
+
 	# use NAT to circumvent GRO FWD check
 	create_vxlan_pair
 	ip -n $NS_DST addr add dev $VXDEV$DST $OL_NET$DST_NAT/$SUFFIX
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net 1/4] net: fix double-free on fraglist GSO skbs
  2021-05-05 15:35 ` [PATCH net 1/4] net: fix double-free on fraglist GSO skbs Paolo Abeni
@ 2021-05-05 16:13   ` Willem de Bruijn
  2021-05-05 17:28     ` Paolo Abeni
  0 siblings, 1 reply; 15+ messages in thread
From: Willem de Bruijn @ 2021-05-05 16:13 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Willem de Bruijn, Miaohe Lin

On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor
> callback is available, the skb destructor is invoked on each
> aggregated packet via skb_release_head_state().
>
> Such field (and the pairer skb->sk) is left untouched, so the same
> destructor is invoked again when the segmented skbs are freed, leading
> to double-free/UaF of the relevant socket.

Similar to skb_segment, should the destructor be swapped with the last
segment and callback delayed, instead of called immediately as part of
segmentation?

        /* Following permits correct backpressure, for protocols
         * using skb_set_owner_w().
         * Idea is to tranfert ownership from head_skb to last segment.
         */
        if (head_skb->destructor == sock_wfree) {
                swap(tail->truesize, head_skb->truesize);
                swap(tail->destructor, head_skb->destructor);
                swap(tail->sk, head_skb->sk);
        }

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net 1/4] net: fix double-free on fraglist GSO skbs
  2021-05-05 16:13   ` Willem de Bruijn
@ 2021-05-05 17:28     ` Paolo Abeni
  2021-05-05 17:30       ` Willem de Bruijn
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2021-05-05 17:28 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Willem de Bruijn, Miaohe Lin

On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote:
> On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor
> > callback is available, the skb destructor is invoked on each
> > aggregated packet via skb_release_head_state().
> > 
> > Such field (and the pairer skb->sk) is left untouched, so the same
> > destructor is invoked again when the segmented skbs are freed, leading
> > to double-free/UaF of the relevant socket.
> 
> Similar to skb_segment, should the destructor be swapped with the last
> segment and callback delayed, instead of called immediately as part of
> segmentation?
> 
>         /* Following permits correct backpressure, for protocols
>          * using skb_set_owner_w().
>          * Idea is to tranfert ownership from head_skb to last segment.
>          */
>         if (head_skb->destructor == sock_wfree) {
>                 swap(tail->truesize, head_skb->truesize);
>                 swap(tail->destructor, head_skb->destructor);
>                 swap(tail->sk, head_skb->sk);
>         }

My understanding is that one assumption in the original
SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not
owned by any socket. 

AFAICS the above assumption was true until:

commit c75fb320d482a5ce6e522378d137fd2c3bf79225
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Apr 9 13:04:37 2021 +0200

    veth: use skb_orphan_partial instead of skb_orphan

after that, if the skb is owned, skb->destructor is sock_efree(), so
the above code should not trigger.

More importantly SKB_GSO_FRAGLIST can only be applied if the inner-
most protocol is UDP, so
commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475
and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant. 

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net 1/4] net: fix double-free on fraglist GSO skbs
  2021-05-05 17:28     ` Paolo Abeni
@ 2021-05-05 17:30       ` Willem de Bruijn
  2021-05-06 11:06         ` Paolo Abeni
  0 siblings, 1 reply; 15+ messages in thread
From: Willem de Bruijn @ 2021-05-05 17:30 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Willem de Bruijn, Miaohe Lin

On Wed, May 5, 2021 at 1:28 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote:
> > On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor
> > > callback is available, the skb destructor is invoked on each
> > > aggregated packet via skb_release_head_state().
> > >
> > > Such field (and the pairer skb->sk) is left untouched, so the same
> > > destructor is invoked again when the segmented skbs are freed, leading
> > > to double-free/UaF of the relevant socket.
> >
> > Similar to skb_segment, should the destructor be swapped with the last
> > segment and callback delayed, instead of called immediately as part of
> > segmentation?
> >
> >         /* Following permits correct backpressure, for protocols
> >          * using skb_set_owner_w().
> >          * Idea is to tranfert ownership from head_skb to last segment.
> >          */
> >         if (head_skb->destructor == sock_wfree) {
> >                 swap(tail->truesize, head_skb->truesize);
> >                 swap(tail->destructor, head_skb->destructor);
> >                 swap(tail->sk, head_skb->sk);
> >         }
>
> My understanding is that one assumption in the original
> SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not
> owned by any socket.
>
> AFAICS the above assumption was true until:
>
> commit c75fb320d482a5ce6e522378d137fd2c3bf79225
> Author: Paolo Abeni <pabeni@redhat.com>
> Date:   Fri Apr 9 13:04:37 2021 +0200
>
>     veth: use skb_orphan_partial instead of skb_orphan
>
> after that, if the skb is owned, skb->destructor is sock_efree(), so
> the above code should not trigger.

Okay, great.

> More importantly SKB_GSO_FRAGLIST can only be applied if the inner-
> most protocol is UDP, so
> commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475
> and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant.

I think the first does apply, as it applies to any protocol that uses
sock_wfree, not just tcp_wfree? Anyway, the point is moot indeed.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net 1/4] net: fix double-free on fraglist GSO skbs
  2021-05-05 17:30       ` Willem de Bruijn
@ 2021-05-06 11:06         ` Paolo Abeni
  2021-05-06 14:32           ` Willem de Bruijn
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2021-05-06 11:06 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Willem de Bruijn, Miaohe Lin

On Wed, 2021-05-05 at 13:30 -0400, Willem de Bruijn wrote:
> On Wed, May 5, 2021 at 1:28 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote:
> > > On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor
> > > > callback is available, the skb destructor is invoked on each
> > > > aggregated packet via skb_release_head_state().
> > > > 
> > > > Such field (and the pairer skb->sk) is left untouched, so the same
> > > > destructor is invoked again when the segmented skbs are freed, leading
> > > > to double-free/UaF of the relevant socket.
> > > 
> > > Similar to skb_segment, should the destructor be swapped with the last
> > > segment and callback delayed, instead of called immediately as part of
> > > segmentation?
> > > 
> > >         /* Following permits correct backpressure, for protocols
> > >          * using skb_set_owner_w().
> > >          * Idea is to tranfert ownership from head_skb to last segment.
> > >          */
> > >         if (head_skb->destructor == sock_wfree) {
> > >                 swap(tail->truesize, head_skb->truesize);
> > >                 swap(tail->destructor, head_skb->destructor);
> > >                 swap(tail->sk, head_skb->sk);
> > >         }
> > 
> > My understanding is that one assumption in the original
> > SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not
> > owned by any socket.
> > 
> > AFAICS the above assumption was true until:
> > 
> > commit c75fb320d482a5ce6e522378d137fd2c3bf79225
> > Author: Paolo Abeni <pabeni@redhat.com>
> > Date:   Fri Apr 9 13:04:37 2021 +0200
> > 
> >     veth: use skb_orphan_partial instead of skb_orphan
> > 
> > after that, if the skb is owned, skb->destructor is sock_efree(), so
> > the above code should not trigger.
> 
> Okay, great.
> 
> > More importantly SKB_GSO_FRAGLIST can only be applied if the inner-
> > most protocol is UDP, so
> > commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475
> > and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant.
> 
> I think the first does apply, as it applies to any protocol that uses
> sock_wfree, not just tcp_wfree? Anyway, the point is moot indeed.

If we want to be safe about future possible sock_wfree users, I think
the approach here should be different: in skb_segment(), tail-
>destructor is expected to be NULL, while skb_segment_list(), all the
list skbs can be owned by the same socket. Possibly we could open-
code skb_release_head_state(), omitting the skb orphaning part
for sock_wfree() destructor.

Note that the this is not currently needed - sock_wfree destructor
can't reach there.

Given all the above, I'm unsure if you are fine with (or at least do
not oppose to) the code proposed in this patch?

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net 1/4] net: fix double-free on fraglist GSO skbs
  2021-05-06 11:06         ` Paolo Abeni
@ 2021-05-06 14:32           ` Willem de Bruijn
  2021-05-06 15:55             ` Paolo Abeni
  0 siblings, 1 reply; 15+ messages in thread
From: Willem de Bruijn @ 2021-05-06 14:32 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Network Development, David S. Miller, Jakub Kicinski,
	Steffen Klassert, Willem de Bruijn, Miaohe Lin

On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Wed, 2021-05-05 at 13:30 -0400, Willem de Bruijn wrote:
> > On Wed, May 5, 2021 at 1:28 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote:
> > > > On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor
> > > > > callback is available, the skb destructor is invoked on each
> > > > > aggregated packet via skb_release_head_state().
> > > > >
> > > > > Such field (and the pairer skb->sk) is left untouched, so the same
> > > > > destructor is invoked again when the segmented skbs are freed, leading
> > > > > to double-free/UaF of the relevant socket.
> > > >
> > > > Similar to skb_segment, should the destructor be swapped with the last
> > > > segment and callback delayed, instead of called immediately as part of
> > > > segmentation?
> > > >
> > > >         /* Following permits correct backpressure, for protocols
> > > >          * using skb_set_owner_w().
> > > >          * Idea is to tranfert ownership from head_skb to last segment.
> > > >          */
> > > >         if (head_skb->destructor == sock_wfree) {
> > > >                 swap(tail->truesize, head_skb->truesize);
> > > >                 swap(tail->destructor, head_skb->destructor);
> > > >                 swap(tail->sk, head_skb->sk);
> > > >         }
> > >
> > > My understanding is that one assumption in the original
> > > SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not
> > > owned by any socket.
> > >
> > > AFAICS the above assumption was true until:
> > >
> > > commit c75fb320d482a5ce6e522378d137fd2c3bf79225
> > > Author: Paolo Abeni <pabeni@redhat.com>
> > > Date:   Fri Apr 9 13:04:37 2021 +0200
> > >
> > >     veth: use skb_orphan_partial instead of skb_orphan
> > >
> > > after that, if the skb is owned, skb->destructor is sock_efree(), so
> > > the above code should not trigger.
> >
> > Okay, great.
> >
> > > More importantly SKB_GSO_FRAGLIST can only be applied if the inner-
> > > most protocol is UDP, so
> > > commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475
> > > and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant.
> >
> > I think the first does apply, as it applies to any protocol that uses
> > sock_wfree, not just tcp_wfree? Anyway, the point is moot indeed.
>
> If we want to be safe about future possible sock_wfree users, I think
> the approach here should be different: in skb_segment(), tail-
> >destructor is expected to be NULL, while skb_segment_list(), all the
> list skbs can be owned by the same socket. Possibly we could open-
> code skb_release_head_state(), omitting the skb orphaning part
> for sock_wfree() destructor.
>
> Note that the this is not currently needed - sock_wfree destructor
> can't reach there.
>
> Given all the above, I'm unsure if you are fine with (or at least do
> not oppose to) the code proposed in this patch?

Yes. Thanks for clarifying, Paolo.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net 1/4] net: fix double-free on fraglist GSO skbs
  2021-05-06 14:32           ` Willem de Bruijn
@ 2021-05-06 15:55             ` Paolo Abeni
  2021-05-06 21:17               ` Jakub Kicinski
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2021-05-06 15:55 UTC (permalink / raw)
  To: Willem de Bruijn, David S. Miller, Jakub Kicinski
  Cc: Network Development, Steffen Klassert, Willem de Bruijn, Miaohe Lin

On Thu, 2021-05-06 at 10:32 -0400, Willem de Bruijn wrote:
> On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > On Wed, 2021-05-05 at 13:30 -0400, Willem de Bruijn wrote:
> > > On Wed, May 5, 2021 at 1:28 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote:
> > > > > On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor
> > > > > > callback is available, the skb destructor is invoked on each
> > > > > > aggregated packet via skb_release_head_state().
> > > > > > 
> > > > > > Such field (and the pairer skb->sk) is left untouched, so the same
> > > > > > destructor is invoked again when the segmented skbs are freed, leading
> > > > > > to double-free/UaF of the relevant socket.
> > > > > 
> > > > > Similar to skb_segment, should the destructor be swapped with the last
> > > > > segment and callback delayed, instead of called immediately as part of
> > > > > segmentation?
> > > > > 
> > > > >         /* Following permits correct backpressure, for protocols
> > > > >          * using skb_set_owner_w().
> > > > >          * Idea is to tranfert ownership from head_skb to last segment.
> > > > >          */
> > > > >         if (head_skb->destructor == sock_wfree) {
> > > > >                 swap(tail->truesize, head_skb->truesize);
> > > > >                 swap(tail->destructor, head_skb->destructor);
> > > > >                 swap(tail->sk, head_skb->sk);
> > > > >         }
> > > > 
> > > > My understanding is that one assumption in the original
> > > > SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not
> > > > owned by any socket.
> > > > 
> > > > AFAICS the above assumption was true until:
> > > > 
> > > > commit c75fb320d482a5ce6e522378d137fd2c3bf79225
> > > > Author: Paolo Abeni <pabeni@redhat.com>
> > > > Date:   Fri Apr 9 13:04:37 2021 +0200
> > > > 
> > > >     veth: use skb_orphan_partial instead of skb_orphan
> > > > 
> > > > after that, if the skb is owned, skb->destructor is sock_efree(), so
> > > > the above code should not trigger.
> > > 
> > > Okay, great.
> > > 
> > > > More importantly SKB_GSO_FRAGLIST can only be applied if the inner-
> > > > most protocol is UDP, so
> > > > commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475
> > > > and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant.
> > > 
> > > I think the first does apply, as it applies to any protocol that uses
> > > sock_wfree, not just tcp_wfree? Anyway, the point is moot indeed.
> > 
> > If we want to be safe about future possible sock_wfree users, I think
> > the approach here should be different: in skb_segment(), tail-
> > > destructor is expected to be NULL, while skb_segment_list(), all the
> > list skbs can be owned by the same socket. Possibly we could open-
> > code skb_release_head_state(), omitting the skb orphaning part
> > for sock_wfree() destructor.
> > 
> > Note that the this is not currently needed - sock_wfree destructor
> > can't reach there.
> > 
> > Given all the above, I'm unsure if you are fine with (or at least do
> > not oppose to) the code proposed in this patch?
> 
> Yes. Thanks for clarifying, Paolo.

Thank you for reviewing!

@David, @Jakub: I see this series is already archived as "change
requested", should I repost?

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net 1/4] net: fix double-free on fraglist GSO skbs
  2021-05-06 15:55             ` Paolo Abeni
@ 2021-05-06 21:17               ` Jakub Kicinski
  2021-05-07  8:46                 ` Paolo Abeni
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2021-05-06 21:17 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Willem de Bruijn, David S. Miller, Network Development,
	Steffen Klassert, Willem de Bruijn, Miaohe Lin

On Thu, 06 May 2021 17:55:36 +0200 Paolo Abeni wrote:
> On Thu, 2021-05-06 at 10:32 -0400, Willem de Bruijn wrote:
> > On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote:  
> > > If we want to be safe about future possible sock_wfree users, I think
> > > the approach here should be different: in skb_segment(), tail-  
> > > > destructor is expected to be NULL, while skb_segment_list(), all the  
> > > list skbs can be owned by the same socket. Possibly we could open-
> > > code skb_release_head_state(), omitting the skb orphaning part
> > > for sock_wfree() destructor.
> > > 
> > > Note that the this is not currently needed - sock_wfree destructor
> > > can't reach there.
> > > 
> > > Given all the above, I'm unsure if you are fine with (or at least do
> > > not oppose to) the code proposed in this patch?  
> > 
> > Yes. Thanks for clarifying, Paolo.  
> 
> Thank you for reviewing!
> 
> @David, @Jakub: I see this series is already archived as "change
> requested", should I repost?

Yes, please. Patch 2 adds two new sparse warnings. 

I think you need csum_unfold() to go from __sum16 to __wsum.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net 1/4] net: fix double-free on fraglist GSO skbs
  2021-05-06 21:17               ` Jakub Kicinski
@ 2021-05-07  8:46                 ` Paolo Abeni
  2021-05-10 15:37                   ` Paolo Abeni
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2021-05-07  8:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Willem de Bruijn, David S. Miller, Network Development,
	Steffen Klassert, Willem de Bruijn, Miaohe Lin

On Thu, 2021-05-06 at 14:17 -0700, Jakub Kicinski wrote:
> On Thu, 06 May 2021 17:55:36 +0200 Paolo Abeni wrote:
> > On Thu, 2021-05-06 at 10:32 -0400, Willem de Bruijn wrote:
> > > On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote:  
> > > > If we want to be safe about future possible sock_wfree users, I think
> > > > the approach here should be different: in skb_segment(), tail-  
> > > > > destructor is expected to be NULL, while skb_segment_list(), all the  
> > > > list skbs can be owned by the same socket. Possibly we could open-
> > > > code skb_release_head_state(), omitting the skb orphaning part
> > > > for sock_wfree() destructor.
> > > > 
> > > > Note that the this is not currently needed - sock_wfree destructor
> > > > can't reach there.
> > > > 
> > > > Given all the above, I'm unsure if you are fine with (or at least do
> > > > not oppose to) the code proposed in this patch?  
> > > 
> > > Yes. Thanks for clarifying, Paolo.  
> > 
> > Thank you for reviewing!
> > 
> > @David, @Jakub: I see this series is already archived as "change
> > requested", should I repost?
> 
> Yes, please. Patch 2 adds two new sparse warnings. 
> 
> I think you need csum_unfold() to go from __sum16 to __wsum.

Yes, indeed. I'll send a v2 with such change, thanks!

Paolo
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net 1/4] net: fix double-free on fraglist GSO skbs
  2021-05-07  8:46                 ` Paolo Abeni
@ 2021-05-10 15:37                   ` Paolo Abeni
  2021-05-11  9:39                     ` Steffen Klassert
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2021-05-10 15:37 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Willem de Bruijn, David S. Miller, Network Development,
	Willem de Bruijn, Miaohe Lin, Jakub Kicinski

On Fri, 2021-05-07 at 10:46 +0200, Paolo Abeni wrote:
> On Thu, 2021-05-06 at 14:17 -0700, Jakub Kicinski wrote:
> > On Thu, 06 May 2021 17:55:36 +0200 Paolo Abeni wrote:
> > > On Thu, 2021-05-06 at 10:32 -0400, Willem de Bruijn wrote:
> > > > On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote:  
> > > > > If we want to be safe about future possible sock_wfree users, I think
> > > > > the approach here should be different: in skb_segment(), tail-  
> > > > > > destructor is expected to be NULL, while skb_segment_list(), all the  
> > > > > list skbs can be owned by the same socket. Possibly we could open-
> > > > > code skb_release_head_state(), omitting the skb orphaning part
> > > > > for sock_wfree() destructor.
> > > > > 
> > > > > Note that the this is not currently needed - sock_wfree destructor
> > > > > can't reach there.
> > > > > 
> > > > > Given all the above, I'm unsure if you are fine with (or at least do
> > > > > not oppose to) the code proposed in this patch?  
> > > > 
> > > > Yes. Thanks for clarifying, Paolo.  
> > > 
> > > Thank you for reviewing!
> > > 
> > > @David, @Jakub: I see this series is already archived as "change
> > > requested", should I repost?
> > 
> > Yes, please. Patch 2 adds two new sparse warnings. 
> > 
> > I think you need csum_unfold() to go from __sum16 to __wsum.
> 
> Yes, indeed. I'll send a v2 with such change, thanks!

It's taking [much] more than expected, as it turned out that thare are
still a number of case where the tx csum is uncorrect.

If the traffic comes from a veth we don't have a valid th->csum value
at GRO time, setting ip_summed to CHECKSUM_UNNECESSARY - as the current
code does - looks wrong.
@Steffen: I see in the original discussion about GRO_FRAGLIST
introduction that you wanted the GRO packets to be CHECKSUM_UNNECESSARY
to avoid csum modification in fwd path. I guess that choice was mostily
due performance reasons, to avoid touching the aggregated pkts header
at gso_segment_list time, but it looks like it's quite bug prone. If
so, I'm unsure the performance gain is worty. I propose to switch to
CHECKSUM_PARTIAL. Would you be ok with that?

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net 1/4] net: fix double-free on fraglist GSO skbs
  2021-05-10 15:37                   ` Paolo Abeni
@ 2021-05-11  9:39                     ` Steffen Klassert
  0 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2021-05-11  9:39 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Willem de Bruijn, David S. Miller, Network Development,
	Willem de Bruijn, Miaohe Lin, Jakub Kicinski

On Mon, May 10, 2021 at 05:37:58PM +0200, Paolo Abeni wrote:
> 
> It's taking [much] more than expected, as it turned out that thare are
> still a number of case where the tx csum is uncorrect.
> 
> If the traffic comes from a veth we don't have a valid th->csum value
> at GRO time, setting ip_summed to CHECKSUM_UNNECESSARY - as the current
> code does - looks wrong.
> @Steffen: I see in the original discussion about GRO_FRAGLIST
> introduction that you wanted the GRO packets to be CHECKSUM_UNNECESSARY
> to avoid csum modification in fwd path. I guess that choice was mostily
> due performance reasons, to avoid touching the aggregated pkts header
> at gso_segment_list time, but it looks like it's quite bug prone. If
> so, I'm unsure the performance gain is worty.

Yes, that was for performance reasons. We don't mangle the packets
with fraglist GRO, so the checksum should be still correct when
doing GSO.

> I propose to switch to
> CHECKSUM_PARTIAL. Would you be ok with that?

If there are cases where CHECKSUM_UNNECESSARY is problematic,
then yes, let's switch to CHECKSUM_PARTIAL.

Thanks for doing this Paolo!

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-05-11  9:39 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-05 15:35 [PATCH net 0/4] udp: more FRAGLIST fixes Paolo Abeni
2021-05-05 15:35 ` [PATCH net 1/4] net: fix double-free on fraglist GSO skbs Paolo Abeni
2021-05-05 16:13   ` Willem de Bruijn
2021-05-05 17:28     ` Paolo Abeni
2021-05-05 17:30       ` Willem de Bruijn
2021-05-06 11:06         ` Paolo Abeni
2021-05-06 14:32           ` Willem de Bruijn
2021-05-06 15:55             ` Paolo Abeni
2021-05-06 21:17               ` Jakub Kicinski
2021-05-07  8:46                 ` Paolo Abeni
2021-05-10 15:37                   ` Paolo Abeni
2021-05-11  9:39                     ` Steffen Klassert
2021-05-05 15:35 ` [PATCH net 2/4] udp: fix out-of-bound at segmentation time Paolo Abeni
2021-05-05 15:35 ` [PATCH net 3/4] udp: fix outer header csum for SKB_GSO_FRAGLIST over UDP tunnel Paolo Abeni
2021-05-05 15:35 ` [PATCH net 4/4] selftests: more UDP GRO tests Paolo Abeni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.