* [PATCH] skbuff: skb_segment, Update nfrags after calling zero copy functions
@ 2023-08-28 23:32 Mohamed Khalfella
  2023-08-29  4:18 ` willemjdebruijn
  0 siblings, 1 reply; 15+ messages in thread
From: Mohamed Khalfella @ 2023-08-28 23:32 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn
  Cc: Mohamed Khalfella, Alexander Duyck, David Howells,
	Jesper Dangaard Brouer, Kees Cook, open list:NETWORKING [GENERAL],
	open list, open list:BPF [MISC]

We have seen a kernel panic with the stack trace below. This is a 5.15.123
LTS kernel running a qemu VM with a virtio network interface and
vhost=on. When packet corruption is enabled with the command below, we see
the kernel panic.

tc qdisc add dev eth2 root netem corrupt 0.7065335155074846%

[  193.894380] BUG: kernel NULL pointer dereference, address: 00000000000000bc
[  193.894635] #PF: supervisor read access in kernel mode
[  193.894828] #PF: error_code(0x0000) - not-present page
[  193.895027] PGD 0 P4D 0
[  193.895140] Oops: 0000 [#1] SMP
[  193.895273] CPU: 13 PID: 18164 Comm: vh-net-17428 Kdump: loaded Tainted: G           O      5.15.123+ #26
[  193.895602] Hardware name:
[  193.903919] RIP: 0010:skb_segment+0xb0e/0x12f0
[  193.908176] Code: 45 a8 50 e8 54 46 be ff 44 8b 5d 80 41 01 c7 48 83 c4 18 44 89 7d b4 44 3b 5d b0 0f 8c f0 00 00 00 4d 85 e4 0f 84 65 07 00 00 <45> 8b b4 24 bc 00 00 00 48 8b b5 70 ff ff ff 4d 03 b4 24 c0 00 00
[  193.921099] RSP: 0018:ffffc9002bf2f770 EFLAGS: 00010282
[  193.925552] RAX: 00000000000000ee RBX: ffff88baab5092c8 RCX: 0000000000000003
[  193.934308] RDX: 0000000000000000 RSI: 00000000fff7ffff RDI: 0000000000000001
[  193.943281] RBP: ffffc9002bf2f850 R08: 0000000000000000 R09: c0000000fff7ffff
[  193.952658] R10: 0000000000000029 R11: ffffc9002bf2f310 R12: 0000000000000000
[  193.962423] R13: ffff88abc2291e00 R14: ffff88abc2291d00 R15: ffff88abc2290600
[  193.972593] FS:  0000000000000000(0000) GS:ffff88c07f840000(0000) knlGS:0000000000000000
[  193.983302] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  193.988891] CR2: 00000000000000bc CR3: 0000001dccb4b006 CR4: 00000000001726e0
[  193.999925] MSR 198h IA32 perf status 0x00001ba000001a00
[  194.005488] MSR 19Ch IA32 thermal status 0x0000000088370000
[  194.010983] MSR 1B1h IA32 package thermal status 0x0000000088330000
[  194.021892] Call Trace:
[  194.027422]  <TASK>
[  194.032838]  ? __die_body+0x1a/0x60
[  194.038172]  ? page_fault_oops+0x12d/0x4d0
[  194.043395]  ? skb_segment+0xb0e/0x12f0
[  194.048501]  ? search_bpf_extables+0x59/0x60
[  194.053547]  ? fixup_exception+0x1d/0x250
[  194.058537]  ? exc_page_fault+0x67/0x140
[  194.063382]  ? asm_exc_page_fault+0x1f/0x30
[  194.068171]  ? skb_segment+0xb0e/0x12f0
[  194.072861]  tcp_gso_segment+0x107/0x540
[  194.077507]  ? sk_common_release+0xe0/0xe0
[  194.082031]  inet_gso_segment+0x15c/0x3d0
[  194.086441]  ? __skb_get_hash_symmetric+0x190/0x190
[  194.090783]  skb_mac_gso_segment+0x9f/0x110
[  194.095016]  __skb_gso_segment+0xc1/0x190
[  194.099124]  ? netif_skb_features+0xb5/0x280
[  194.103131]  netem_enqueue+0x290/0xb10 [sch_netem]
[  194.107071]  dev_qdisc_enqueue+0x16/0x70
[  194.110884]  __dev_queue_xmit+0x63b/0xb30
[  194.114569]  ? inet_gso_segment+0x15c/0x3d0
[  194.118160]  ? bond_start_xmit+0x159/0x380 [bonding]
[  194.121670]  bond_start_xmit+0x159/0x380 [bonding]
[  194.125101]  ? skb_mac_gso_segment+0xa7/0x110
[  194.128506]  dev_hard_start_xmit+0xc3/0x1e0
[  194.131787]  __dev_queue_xmit+0x8a0/0xb30
[  194.134977]  ? macvlan_start_xmit+0x4f/0x100 [macvlan]
[  194.138225]  macvlan_start_xmit+0x4f/0x100 [macvlan]
[  194.141477]  dev_hard_start_xmit+0xc3/0x1e0
[  194.144622]  sch_direct_xmit+0xe3/0x280
[  194.147748]  __dev_queue_xmit+0x54a/0xb30
[  194.150924]  ? tap_get_user+0x2a8/0x9c0 [tap]
[  194.154131]  tap_get_user+0x2a8/0x9c0 [tap]
[  194.157358]  tap_sendmsg+0x52/0x8e0 [tap]
[  194.160565]  ? get_tx_bufs+0x42/0x1d0 [vhost_net]
[  194.163815]  ? get_tx_bufs+0x16a/0x1d0 [vhost_net]
[  194.167049]  handle_tx_zerocopy+0x14e/0x4c0 [vhost_net]
[  194.170351]  ? add_range+0x11/0x30
[  194.173631]  handle_tx+0xcd/0xe0 [vhost_net]
[  194.176959]  vhost_worker+0x76/0xb0 [vhost]
[  194.180299]  ? vhost_flush_work+0x10/0x10 [vhost]
[  194.183667]  kthread+0x118/0x140
[  194.187007]  ? set_kthread_struct+0x40/0x40
[  194.190358]  ret_from_fork+0x1f/0x30
[  194.193670]  </TASK>

I have narrowed the issue down to these lines in skb_segment()

4247                 if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
4248                     skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
4249                         goto err;

It is possible for skb_orphan_frags() or skb_zerocopy_clone() to update
`nr_frags`, as both functions may call skb_copy_ubufs(). If `nr_frags`
gets updated, the local copy in `nfrags` can end up stale and cause
this panic. In particular, it is possible that the while loop below hits
`i >= nfrags` prematurely and tries to move to the next `frag_skb` by
using `list_skb`. If `list_skb` is NULL, we hit the panic above.

The naive way to fix this is to update `nfrags` as shown in this patch.
This way we get the correct number of fragments, and the while loop can
find all fragments without needing to move to the next skbuff.
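
To make the failure mode concrete, below is a small userspace toy model of
the stale cached count (plain C, not kernel code; sizes are made up) that
ends in the same kind of NULL dereference:

/*
 * Toy model: a helper repacks the frags behind the loop's back, the loop
 * keeps comparing against a stale cached count, and then dereferences a
 * NULL "next skb", like skb_segment() walking onto a NULL list_skb.
 */
#include <stdio.h>

struct shinfo {
	int nr_frags;
	int frag_size[8];
};

/* Stand-in for skb_copy_ubufs(): same total data, different layout. */
static void copy_ubufs(struct shinfo *sh)
{
	sh->nr_frags = 4;
	for (int i = 0; i < 4; i++)
		sh->frag_size[i] = 4096;
}

int main(void)
{
	struct shinfo head = { .nr_frags = 2, .frag_size = { 8192, 8192 } };
	struct shinfo *list_skb = NULL;	/* no frag_list, as in the crash */
	int nfrags = head.nr_frags;	/* cached, like the local in skb_segment() */
	int i = 0, pos = 0, needed = 16384;

	copy_ubufs(&head);		/* frags changed, the cache is now stale */

	while (pos < needed) {
		if (i >= nfrags) {	/* hit prematurely: data still remains */
			nfrags = list_skb->nr_frags;	/* NULL dereference */
			i = 0;
		}
		pos += head.frag_size[i++];
	}
	printf("consumed %d bytes\n", pos);
	return 0;
}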

I wanted to share this with the list and see what people think before
submitting a patch, especially since I have not tested this on the
master branch.

Fixes: bf5c25d60861 ("skbuff: in skb_segment, call zerocopy functions once per nskb")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
---
 net/core/skbuff.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a298992060e6..864cc8ad1969 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4567,6 +4567,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 		if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
 		    skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
 			goto err;
+		/* Update nfrags in case skb_copy_ubufs() updates nr_frags */
+		nfrags = skb_shinfo(frag_skb)->nr_frags;
 
 		while (pos < offset + len) {
 			if (i >= nfrags) {
@@ -4587,6 +4589,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 				    skb_zerocopy_clone(nskb, frag_skb,
 						       GFP_ATOMIC))
 					goto err;
+				/* Update nfrags in case skb_copy_ubufs() updates nr_frags */
+				nfrags = skb_shinfo(frag_skb)->nr_frags;
 
 				list_skb = list_skb->next;
 			}
-- 
2.17.1



* RE: [PATCH] skbuff: skb_segment, Update nfrags after calling zero copy functions
  2023-08-28 23:32 [PATCH] skbuff: skb_segment, Update nfrags after calling zero copy functions Mohamed Khalfella
@ 2023-08-29  4:18 ` willemjdebruijn
  2023-08-29  6:50   ` Mohamed Khalfella
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: willemjdebruijn @ 2023-08-29  4:18 UTC (permalink / raw)
  To: Mohamed Khalfella, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn
  Cc: Mohamed Khalfella, Alexander Duyck, David Howells,
	Jesper Dangaard Brouer, Kees Cook, open list:NETWORKING [GENERAL],
	open list, open list:BPF [MISC]

Mohamed Khalfella wrote:
> We have seen kernel panic with stacktrace below. This is 5.15.123
> LTS kernel running a qemu VM with virtio network interface and
> vhost=on. When enabling packet corruption, with command below, we see
> the kernel panic.
> 
> tc qdisc add dev eth2 root netem corrupt 0.7065335155074846%
> 
> [  193.894380] BUG: kernel NULL pointer dereference, address: 00000000000000bc
> [  193.894635] #PF: supervisor read access in kernel mode
> [  193.894828] #PF: error_code(0x0000) - not-present page
> [  193.895027] PGD 0 P4D 0
> [  193.895140] Oops: 0000 [#1] SMP
> [  193.895273] CPU: 13 PID: 18164 Comm: vh-net-17428 Kdump: loaded Tainted: G           O      5.15.123+ #26
> [  193.895602] Hardware name:
> [  193.903919] RIP: 0010:skb_segment+0xb0e/0x12f0
> [  193.908176] Code: 45 a8 50 e8 54 46 be ff 44 8b 5d 80 41 01 c7 48 83 c4 18 44 89 7d b4 44 3b 5d b0 0f 8c f0 00 00 00 4d 85 e4 0f 84 65 07 00 00 <45> 8b b4 24 bc 00 00 00 48 8b b5 70 ff ff ff 4d 03 b4 24 c0 00 00
> [  193.921099] RSP: 0018:ffffc9002bf2f770 EFLAGS: 00010282
> [  193.925552] RAX: 00000000000000ee RBX: ffff88baab5092c8 RCX: 0000000000000003
> [  193.934308] RDX: 0000000000000000 RSI: 00000000fff7ffff RDI: 0000000000000001
> [  193.943281] RBP: ffffc9002bf2f850 R08: 0000000000000000 R09: c0000000fff7ffff
> [  193.952658] R10: 0000000000000029 R11: ffffc9002bf2f310 R12: 0000000000000000
> [  193.962423] R13: ffff88abc2291e00 R14: ffff88abc2291d00 R15: ffff88abc2290600
> [  193.972593] FS:  0000000000000000(0000) GS:ffff88c07f840000(0000) knlGS:0000000000000000
> [  193.983302] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  193.988891] CR2: 00000000000000bc CR3: 0000001dccb4b006 CR4: 00000000001726e0
> [  193.999925] MSR 198h IA32 perf status 0x00001ba000001a00
> [  194.005488] MSR 19Ch IA32 thermal status 0x0000000088370000
> [  194.010983] MSR 1B1h IA32 package thermal status 0x0000000088330000
> [  194.021892] Call Trace:
> [  194.027422]  <TASK>
> [  194.032838]  ? __die_body+0x1a/0x60
> [  194.038172]  ? page_fault_oops+0x12d/0x4d0
> [  194.043395]  ? skb_segment+0xb0e/0x12f0
> [  194.048501]  ? search_bpf_extables+0x59/0x60
> [  194.053547]  ? fixup_exception+0x1d/0x250
> [  194.058537]  ? exc_page_fault+0x67/0x140
> [  194.063382]  ? asm_exc_page_fault+0x1f/0x30
> [  194.068171]  ? skb_segment+0xb0e/0x12f0
> [  194.072861]  tcp_gso_segment+0x107/0x540
> [  194.077507]  ? sk_common_release+0xe0/0xe0
> [  194.082031]  inet_gso_segment+0x15c/0x3d0
> [  194.086441]  ? __skb_get_hash_symmetric+0x190/0x190
> [  194.090783]  skb_mac_gso_segment+0x9f/0x110
> [  194.095016]  __skb_gso_segment+0xc1/0x190
> [  194.099124]  ? netif_skb_features+0xb5/0x280
> [  194.103131]  netem_enqueue+0x290/0xb10 [sch_netem]
> [  194.107071]  dev_qdisc_enqueue+0x16/0x70
> [  194.110884]  __dev_queue_xmit+0x63b/0xb30
> [  194.114569]  ? inet_gso_segment+0x15c/0x3d0
> [  194.118160]  ? bond_start_xmit+0x159/0x380 [bonding]
> [  194.121670]  bond_start_xmit+0x159/0x380 [bonding]
> [  194.125101]  ? skb_mac_gso_segment+0xa7/0x110
> [  194.128506]  dev_hard_start_xmit+0xc3/0x1e0
> [  194.131787]  __dev_queue_xmit+0x8a0/0xb30
> [  194.134977]  ? macvlan_start_xmit+0x4f/0x100 [macvlan]
> [  194.138225]  macvlan_start_xmit+0x4f/0x100 [macvlan]
> [  194.141477]  dev_hard_start_xmit+0xc3/0x1e0
> [  194.144622]  sch_direct_xmit+0xe3/0x280
> [  194.147748]  __dev_queue_xmit+0x54a/0xb30
> [  194.150924]  ? tap_get_user+0x2a8/0x9c0 [tap]
> [  194.154131]  tap_get_user+0x2a8/0x9c0 [tap]
> [  194.157358]  tap_sendmsg+0x52/0x8e0 [tap]
> [  194.160565]  ? get_tx_bufs+0x42/0x1d0 [vhost_net]
> [  194.163815]  ? get_tx_bufs+0x16a/0x1d0 [vhost_net]
> [  194.167049]  handle_tx_zerocopy+0x14e/0x4c0 [vhost_net]
> [  194.170351]  ? add_range+0x11/0x30
> [  194.173631]  handle_tx+0xcd/0xe0 [vhost_net]
> [  194.176959]  vhost_worker+0x76/0xb0 [vhost]
> [  194.180299]  ? vhost_flush_work+0x10/0x10 [vhost]
> [  194.183667]  kthread+0x118/0x140
> [  194.187007]  ? set_kthread_struct+0x40/0x40
> [  194.190358]  ret_from_fork+0x1f/0x30
> [  194.193670]  </TASK>
> 
> I have narrowed the issue down to these lines in skb_segment()
> 
> 4247                 if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> 4248                     skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
> 4249                         goto err;
> 
> It is possible for skb_orphan_frags() or skb_zerocopy_clone() to update
> `nr_frags` as both functions may call skb_copy_ubufs(). If `nr_frags`
> gets updated, the local copy in `nfrags` might end up stale and cause
> this panic. In particular it is possible the while loop below hits
> `i >= nfrags` prematurely and tries to move to next `frag_skb` by using
> `list_skb`. If `list_skb` is NULL then we hit the panic above.
> 
> The naive way to fix this is to update `nfrags` as shown in this patch.
> This way we get the correct number of fragments and while loop can
> find all fragments without needing to move to next skbuff.
> 
> I wanted to share this with the list and see what people think before
> submitting a patch, especially since I have not tested this on the master
> branch.
> 
> Fixes: bf5c25d60861 ("skbuff: in skb_segment, call zerocopy functions once per nskb")
> Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
> ---
>  net/core/skbuff.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a298992060e6..864cc8ad1969 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4567,6 +4567,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>  		if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>  		    skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>  			goto err;
> +		/* Update nfrags in case skb_copy_ubufs() updates nr_frags */
> +		nfrags = skb_shinfo(frag_skb)->nr_frags;
>  
>  		while (pos < offset + len) {
>  			if (i >= nfrags) {
> @@ -4587,6 +4589,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>  				    skb_zerocopy_clone(nskb, frag_skb,
>  						       GFP_ATOMIC))
>  					goto err;
> +				/* Update nfrags in case skb_copy_ubufs() updates nr_frags */
> +				nfrags = skb_shinfo(frag_skb)->nr_frags;
>  
>  				list_skb = list_skb->next;
>  			}

In principle this makes sense to me. skb_copy_ubufs will create new
frags, which do not map 1:1 onto the original frags. It originally
allocated the number of order-0 pages required to store
__skb_pagelen(skb). Recently it switched to using high-order pages.

The two calls to skb_orphan_frags and skb_zerocopy_clone are made for
each new segment skb. But only skb_orphan_frags will possibly call
skb_copy_ubufs, and then always on the first call for the given input
skb (head_skb or a list_skb). From commit "skbuff: in
skb_segment, call zerocopy functions once per nskb":

   "When segmenting skbs with user frags, either the user frags must be
    replaced with private copies and uarg released, or the uarg must have
    its refcount increased for each new skb.

    skb_orphan_frags does the first, except for cases that can handle
    reference counting. skb_zerocopy_clone then does the second."

That skb_copy_ubufs is only called on the first check for an input skb
simplifies things. A possible realloc in the middle of processing a
series of frags would be much harder to handle.
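
A rough model of that repacking (userspace C with made-up frag sizes,
order-0 pages as in the pre-high-order code) shows why the counts need
not match:

/* Toy repack: copy variable-sized old "frags" densely into PAGE_SIZE
 * buckets and count the resulting frags. Not kernel code. */
#include <stdio.h>

#define PAGE_SIZE 4096

int main(void)
{
	int old_frag_len[] = { 1500, 1500, 1500, 9000, 700 };
	int old_n = 5, new_n = 0, d_off = PAGE_SIZE;	/* force the first page */

	for (int i = 0; i < old_n; i++) {
		int left = old_frag_len[i];
		while (left > 0) {
			if (d_off == PAGE_SIZE) {	/* current page is full */
				new_n++;		/* "alloc_page()" */
				d_off = 0;
			}
			int copy = left < PAGE_SIZE - d_off ? left : PAGE_SIZE - d_off;
			d_off += copy;			/* the memcpy itself is elided */
			left -= copy;
		}
	}
	printf("%d old frags repacked into %d page-sized frags\n", old_n, new_n);
	return 0;
}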

Small point: nfrags is not the only state that needs to be refreshed
after a frags realloc; frag does too.

Thanks for the report. I'm traveling likely without internet until the
weekend. Apologies if it takes a while for me to follow up.


* Re: [PATCH] skbuff: skb_segment, Update nfrags after calling zero copy functions
  2023-08-29  4:18 ` willemjdebruijn
@ 2023-08-29  6:50   ` Mohamed Khalfella
  2023-08-29  8:07     ` Eric Dumazet
  2023-08-30 23:28   ` [PATCH v2] skbuff: skb_segment, Call zero copy functions before using skbuff frags Mohamed Khalfella
  2023-08-31  8:17   ` [PATCH v3] " Mohamed Khalfella
  2 siblings, 1 reply; 15+ messages in thread
From: Mohamed Khalfella @ 2023-08-29  6:50 UTC (permalink / raw)
  To: willemjdebruijn
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn, Alexander Duyck, David Howells,
	Jesper Dangaard Brouer, Kees Cook, open list:NETWORKING [GENERAL],
	open list, open list:BPF [MISC]

On 2023-08-28 21:18:16 -0700, willemjdebruijn wrote:
> Small point: nfrags is not the only state that needs to be refreshed
> after a frags realloc, also frag.

I am new to this code. Can you help me understand why frag needs to be
updated too? My reading of this code is that frag points to the frags
array in shared info. As long as the shared info pointer remains the
same, the frag pointer should remain valid.

Am I missing something?
> 
> Thanks for the report. I'm traveling likely without internet until the
> weekend. Apologies if it takes a while for me to follow up.
No problem. Thanks for the quick response!


* Re: [PATCH] skbuff: skb_segment, Update nfrags after calling zero copy functions
  2023-08-29  6:50   ` Mohamed Khalfella
@ 2023-08-29  8:07     ` Eric Dumazet
  2023-08-29  9:31       ` Mohamed Khalfella
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2023-08-29  8:07 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: willemjdebruijn, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn, Alexander Duyck, David Howells,
	Jesper Dangaard Brouer, Kees Cook, open list:NETWORKING [GENERAL],
	open list, open list:BPF [MISC]

On Tue, Aug 29, 2023 at 8:50 AM Mohamed Khalfella
<mkhalfella@purestorage.com> wrote:
>
> On 2023-08-28 21:18:16 -0700, willemjdebruijn wrote:
> > Small point: nfrags is not the only state that needs to be refreshed
> > after a frags realloc, also frag.
>
> I am new to this code. Can you help me understand why frag needs to be
> updated too? My reading of this code is that frag points to frags array
> in shared info. As long as shared info pointer remain the same frag
> pointer should remain valid.
>

skb_copy_ubufs() could actually call skb_unclone() and thus skb->head
could be re-allocated.

I guess that if you run your patch (and a repro of the bug?) with a
KASAN-enabled kernel, you should see a possible use-after-free?

To force the skb_unclone() path, having tcpdump capture all packets
would be enough, I think.
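
To illustrate why a cached frag pointer goes stale (toy userspace C, not
kernel code): shinfo, and with it the frags array, lives at the end of
skb->head, so reallocating the head moves it:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct frag { int size; };

struct toy_skb {
	unsigned char *head;	/* data area with "shinfo" stored at its tail */
	size_t shinfo_off;
};

static struct frag *frags_of(struct toy_skb *skb)
{
	return (struct frag *)(skb->head + skb->shinfo_off);
}

/* Stand-in for pskb_expand_head(): new head, old contents copied over. */
static void expand_head(struct toy_skb *skb)
{
	size_t total = skb->shinfo_off + 4 * sizeof(struct frag);
	unsigned char *nhead = malloc(total);

	memcpy(nhead, skb->head, total);
	free(skb->head);	/* any cached frag pointer now dangles */
	skb->head = nhead;
}

int main(void)
{
	struct toy_skb skb = {
		.head = calloc(1, 256 + 4 * sizeof(struct frag)),
		.shinfo_off = 256,
	};
	struct frag *frag = frags_of(&skb);	/* cached, like in skb_segment() */

	expand_head(&skb);			/* skb_unclone() path: head moves */

	printf("cached frags at %p, current frags at %p\n",
	       (void *)frag, (void *)frags_of(&skb));
	free(skb.head);
	return 0;
}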

> Am I missing something?
> >
> > Thanks for the report. I'm traveling likely without internet until the
> > weekend. Apologies if it takes a while for me to follow up.
> No problem. Thanks for the quick response!


* Re: [PATCH] skbuff: skb_segment, Update nfrags after calling zero copy functions
  2023-08-29  8:07     ` Eric Dumazet
@ 2023-08-29  9:31       ` Mohamed Khalfella
  2023-08-29 10:09         ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Mohamed Khalfella @ 2023-08-29  9:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: willemjdebruijn, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn, Alexander Duyck, David Howells,
	Jesper Dangaard Brouer, Kees Cook, open list:NETWORKING [GENERAL],
	open list, open list:BPF [MISC]

On 2023-08-29 10:07:59 +0200, Eric Dumazet wrote:
> On Tue, Aug 29, 2023 at 8:50 AM Mohamed Khalfella
> <mkhalfella@purestorage.com> wrote:
> >
> > On 2023-08-28 21:18:16 -0700, willemjdebruijn wrote:
> > > Small point: nfrags is not the only state that needs to be refreshed
> > > after a frags realloc, also frag.
> >
> > I am new to this code. Can you help me understand why frag needs to be
> > updated too? My reading of this code is that frag points to frags array
> > in shared info. As long as shared info pointer remain the same frag
> > pointer should remain valid.
> >
> 
> skb_copy_ubufs() could actually call skb_unclone() and thus skb->head
> could be re-allocated.
> 
> I guess that if you run your patch (and a repro of the bug ?) with
> KASAN enabled kernel, you should see a possible use-after-free ?
> 
> To force the skb_unclone() path, having a tcpdump catching all packets
> would be enough I think.
> 

Okay, I see it now. I have not tested this patch with tcpdump capturing
packets at the same time. Also, during my testing I have not seen the
value of skb->head changing. Now that you mention it, I will make sure
to test with tcpdump running and watch for skb->head changing. Thank you
for pointing that out.

For frag, I guess something like frag = &skb_shinfo(list_skb)->frags[i];
should do the job. I have not tested it though. I will need to do more
testing before posting an updated patch.


* Re: [PATCH] skbuff: skb_segment, Update nfrags after calling zero copy functions
  2023-08-29  9:31       ` Mohamed Khalfella
@ 2023-08-29 10:09         ` Eric Dumazet
  2023-08-29 22:24           ` Mohamed Khalfella
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2023-08-29 10:09 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: willemjdebruijn, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn, Alexander Duyck, David Howells,
	Jesper Dangaard Brouer, Kees Cook, open list:NETWORKING [GENERAL],
	open list, open list:BPF [MISC]

On Tue, Aug 29, 2023 at 11:31 AM Mohamed Khalfella
<mkhalfella@purestorage.com> wrote:
>
> On 2023-08-29 10:07:59 +0200, Eric Dumazet wrote:
> > On Tue, Aug 29, 2023 at 8:50 AM Mohamed Khalfella
> > <mkhalfella@purestorage.com> wrote:
> > >
> > > On 2023-08-28 21:18:16 -0700, willemjdebruijn wrote:
> > > > Small point: nfrags is not the only state that needs to be refreshed
> > > > after a frags realloc, also frag.
> > >
> > > I am new to this code. Can you help me understand why frag needs to be
> > > updated too? My reading of this code is that frag points to frags array
> > > in shared info. As long as shared info pointer remain the same frag
> > > pointer should remain valid.
> > >
> >
> > skb_copy_ubufs() could actually call skb_unclone() and thus skb->head
> > could be re-allocated.
> >
> > I guess that if you run your patch (and a repro of the bug ?) with
> > KASAN enabled kernel, you should see a possible use-after-free ?
> >
> > To force the skb_unclone() path, having a tcpdump catching all packets
> > would be enough I think.
> >
>
> Okay, I see it now. I have not tested this patch with tcpdump capturing
> packets at the same time. Also, during my testing I have not seen the
> value of skb->head changing. Now that you mention it, I will make
> sure to test with tcpdump running and see skb->head changing. Thank you
> for pointing that out.
>
> For frag, I guess something like frag = &skb_shinfo(list_skb)->frags[i];
> should do the job. I have not tested it though. I will need to do more
> testing before posting updated patch.

Another way to test this path for certain (without tcpdump having to race)
is to add a temporary/debug patch like this one:

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a298992060e6efdecb87c7ffc8290eafe330583f..20cc42be5e81cdca567515f2a886af4ada0fbe0a
100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1749,7 +1749,8 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
        int i, order, psize, new_frags;
        u32 d_off;

-       if (skb_shared(skb) || skb_unclone(skb, gfp_mask))
+       if (skb_shared(skb) ||
+           pskb_expand_head(skb, 0, 0, gfp_mask))
                return -EINVAL;

        if (!num_frags)

Note that this might catch other bugs :/


* Re: [PATCH] skbuff: skb_segment, Update nfrags after calling zero copy functions
  2023-08-29 10:09         ` Eric Dumazet
@ 2023-08-29 22:24           ` Mohamed Khalfella
  2023-08-30  3:44             ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Mohamed Khalfella @ 2023-08-29 22:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: willemjdebruijn, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn, Alexander Duyck, David Howells,
	Jesper Dangaard Brouer, Kees Cook, open list:NETWORKING [GENERAL],
	open list, open list:BPF [MISC]

On 2023-08-29 12:09:15 +0200, Eric Dumazet wrote:
> Another way to test this path for certain (without tcpdump having to race)
> is to add a temporary/debug patch like this one:
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a298992060e6efdecb87c7ffc8290eafe330583f..20cc42be5e81cdca567515f2a886af4ada0fbe0a
> 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1749,7 +1749,8 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
>         int i, order, psize, new_frags;
>         u32 d_off;
> 
> -       if (skb_shared(skb) || skb_unclone(skb, gfp_mask))
> +       if (skb_shared(skb) ||
> +           pskb_expand_head(skb, 0, 0, gfp_mask))
>                 return -EINVAL;
> 
>         if (!num_frags)
> 
> Note that this might catch other bugs :/

I was not able to make it allocate new frags by running tcpdump while
reproducing the problem. However, I was able to do it with your patch.


* Re: [PATCH] skbuff: skb_segment, Update nfrags after calling zero copy functions
  2023-08-29 22:24           ` Mohamed Khalfella
@ 2023-08-30  3:44             ` Eric Dumazet
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2023-08-30  3:44 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: willemjdebruijn, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn, Alexander Duyck, David Howells,
	Jesper Dangaard Brouer, Kees Cook, open list:NETWORKING [GENERAL],
	open list, open list:BPF [MISC]

On Wed, Aug 30, 2023 at 12:24 AM Mohamed Khalfella
<mkhalfella@purestorage.com> wrote:
>
> On 2023-08-29 12:09:15 +0200, Eric Dumazet wrote:
> > Another way to test this path for certain (without tcpdump having to race)
> > is to add a temporary/debug patch like this one:
> >
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index a298992060e6efdecb87c7ffc8290eafe330583f..20cc42be5e81cdca567515f2a886af4ada0fbe0a
> > 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -1749,7 +1749,8 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
> >         int i, order, psize, new_frags;
> >         u32 d_off;
> >
> > -       if (skb_shared(skb) || skb_unclone(skb, gfp_mask))
> > +       if (skb_shared(skb) ||
> > +           pskb_expand_head(skb, 0, 0, gfp_mask))
> >                 return -EINVAL;
> >
> >         if (!num_frags)
> >
> > Note that this might catch other bugs :/
>
> I was not able to make it allocate new frags by running tcpdump while
> reproducing the problem. However, I was able to do it with your patch.

I am glad this worked, and looking forward to a v2 of your patch, thanks !


* [PATCH v2] skbuff: skb_segment, Call zero copy functions before using skbuff frags
  2023-08-29  4:18 ` willemjdebruijn
  2023-08-29  6:50   ` Mohamed Khalfella
@ 2023-08-30 23:28   ` Mohamed Khalfella
  2023-08-31  6:58     ` Eric Dumazet
  2023-08-31  8:17   ` [PATCH v3] " Mohamed Khalfella
  2 siblings, 1 reply; 15+ messages in thread
From: Mohamed Khalfella @ 2023-08-30 23:28 UTC (permalink / raw)
  To: willemdebruijn.kernel
  Cc: alexanderduyck, bpf, brouer, davem, dhowells, edumazet, keescook,
	kuba, linux-kernel, mkhalfella, netdev, pabeni, willemb, stable

Commit bf5c25d60861 ("skbuff: in skb_segment, call zerocopy functions
once per nskb") added the call to zero copy functions in skb_segment().
The change introduced a bug in skb_segment() because skb_orphan_frags()
may possibly change the number of fragments or allocate new fragments
altogether, leaving nfrags and frag pointing to the old values. This can
cause a panic with a stack trace like the one below.

[  193.894380] BUG: kernel NULL pointer dereference, address: 00000000000000bc
[  193.895273] CPU: 13 PID: 18164 Comm: vh-net-17428 Kdump: loaded Tainted: G           O      5.15.123+ #26
[  193.903919] RIP: 0010:skb_segment+0xb0e/0x12f0
[  194.021892] Call Trace:
[  194.027422]  <TASK>
[  194.072861]  tcp_gso_segment+0x107/0x540
[  194.082031]  inet_gso_segment+0x15c/0x3d0
[  194.090783]  skb_mac_gso_segment+0x9f/0x110
[  194.095016]  __skb_gso_segment+0xc1/0x190
[  194.103131]  netem_enqueue+0x290/0xb10 [sch_netem]
[  194.107071]  dev_qdisc_enqueue+0x16/0x70
[  194.110884]  __dev_queue_xmit+0x63b/0xb30
[  194.121670]  bond_start_xmit+0x159/0x380 [bonding]
[  194.128506]  dev_hard_start_xmit+0xc3/0x1e0
[  194.131787]  __dev_queue_xmit+0x8a0/0xb30
[  194.138225]  macvlan_start_xmit+0x4f/0x100 [macvlan]
[  194.141477]  dev_hard_start_xmit+0xc3/0x1e0
[  194.144622]  sch_direct_xmit+0xe3/0x280
[  194.147748]  __dev_queue_xmit+0x54a/0xb30
[  194.154131]  tap_get_user+0x2a8/0x9c0 [tap]
[  194.157358]  tap_sendmsg+0x52/0x8e0 [tap]
[  194.167049]  handle_tx_zerocopy+0x14e/0x4c0 [vhost_net]
[  194.173631]  handle_tx+0xcd/0xe0 [vhost_net]
[  194.176959]  vhost_worker+0x76/0xb0 [vhost]
[  194.183667]  kthread+0x118/0x140
[  194.190358]  ret_from_fork+0x1f/0x30
[  194.193670]  </TASK>

In this case calling skb_orphan_frags() updated nr_frags, leaving the
nfrags local variable in skb_segment() stale. This resulted in the code
hitting i >= nfrags prematurely and trying to move to the next frag_skb
using the list_skb pointer, which was NULL, and caused a kernel panic.
Move the call to the zero copy functions before using frags and nr_frags.

Fixes: bf5c25d60861 ("skbuff: in skb_segment, call zerocopy functions once per nskb")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reported-by: Amit Goyal <agoyal@purestorage.com>
Cc: stable@vger.kernel.org
---
 net/core/skbuff.c | 34 ++++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a298992060e6..18a33dc2d6af 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4354,21 +4354,20 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 	struct sk_buff *segs = NULL;
 	struct sk_buff *tail = NULL;
 	struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list;
-	skb_frag_t *frag = skb_shinfo(head_skb)->frags;
 	unsigned int mss = skb_shinfo(head_skb)->gso_size;
 	unsigned int doffset = head_skb->data - skb_mac_header(head_skb);
-	struct sk_buff *frag_skb = head_skb;
 	unsigned int offset = doffset;
 	unsigned int tnl_hlen = skb_tnl_header_len(head_skb);
 	unsigned int partial_segs = 0;
 	unsigned int headroom;
 	unsigned int len = head_skb->len;
+	struct sk_buff *frag_skb;
+	skb_frag_t *frag;
 	__be16 proto;
 	bool csum, sg;
-	int nfrags = skb_shinfo(head_skb)->nr_frags;
 	int err = -ENOMEM;
 	int i = 0;
-	int pos;
+	int nfrags, pos;
 
 	if ((skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY) &&
 	    mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) {
@@ -4445,6 +4444,13 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 	headroom = skb_headroom(head_skb);
 	pos = skb_headlen(head_skb);
 
+	if (skb_orphan_frags(head_skb, GFP_ATOMIC))
+		return ERR_PTR(-ENOMEM);
+
+	nfrags = skb_shinfo(head_skb)->nr_frags;
+	frag = skb_shinfo(head_skb)->frags;
+	frag_skb = head_skb;
+
 	do {
 		struct sk_buff *nskb;
 		skb_frag_t *nskb_frag;
@@ -4465,6 +4471,10 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 		    (skb_headlen(list_skb) == len || sg)) {
 			BUG_ON(skb_headlen(list_skb) > len);
 
+			nskb = skb_clone(list_skb, GFP_ATOMIC);
+			if (unlikely(!nskb))
+				goto err;
+
 			i = 0;
 			nfrags = skb_shinfo(list_skb)->nr_frags;
 			frag = skb_shinfo(list_skb)->frags;
@@ -4483,12 +4493,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 				frag++;
 			}
 
-			nskb = skb_clone(list_skb, GFP_ATOMIC);
 			list_skb = list_skb->next;
 
-			if (unlikely(!nskb))
-				goto err;
-
 			if (unlikely(pskb_trim(nskb, len))) {
 				kfree_skb(nskb);
 				goto err;
@@ -4564,12 +4570,16 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 		skb_shinfo(nskb)->flags |= skb_shinfo(head_skb)->flags &
 					   SKBFL_SHARED_FRAG;
 
-		if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
-		    skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
+		if (skb_zerocopy_clone(nskb, list_skb, GFP_ATOMIC))
 			goto err;
 
 		while (pos < offset + len) {
 			if (i >= nfrags) {
+				if (skb_orphan_frags(list_skb, GFP_ATOMIC) ||
+				    skb_zerocopy_clone(nskb, list_skb,
+						       GFP_ATOMIC))
+					goto err;
+
 				i = 0;
 				nfrags = skb_shinfo(list_skb)->nr_frags;
 				frag = skb_shinfo(list_skb)->frags;
@@ -4583,10 +4593,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 					i--;
 					frag--;
 				}
-				if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
-				    skb_zerocopy_clone(nskb, frag_skb,
-						       GFP_ATOMIC))
-					goto err;
 
 				list_skb = list_skb->next;
 			}
-- 
2.17.1



* Re: [PATCH v2] skbuff: skb_segment, Call zero copy functions before using skbuff frags
  2023-08-30 23:28   ` [PATCH v2] skbuff: skb_segment, Call zero copy functions before using skbuff frags Mohamed Khalfella
@ 2023-08-31  6:58     ` Eric Dumazet
  2023-08-31  7:29       ` Mohamed Khalfella
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2023-08-31  6:58 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: willemdebruijn.kernel, alexanderduyck, bpf, brouer, davem,
	dhowells, keescook, kuba, linux-kernel, netdev, pabeni, willemb,
	stable

On Thu, Aug 31, 2023 at 1:28 AM Mohamed Khalfella
<mkhalfella@purestorage.com> wrote:
>
> Commit bf5c25d60861 ("skbuff: in skb_segment, call zerocopy functions
> once per nskb") added the call to zero copy functions in skb_segment().
> The change introduced a bug in skb_segment() because skb_orphan_frags()
> may possibly change the number of fragments or allocate new fragments
> altogether leaving nfrags and frag to point to the old values. This can
> cause a panic with stacktrace like the one below.
>
> [  193.894380] BUG: kernel NULL pointer dereference, address: 00000000000000bc
> [  193.895273] CPU: 13 PID: 18164 Comm: vh-net-17428 Kdump: loaded Tainted: G           O      5.15.123+ #26
> [  193.903919] RIP: 0010:skb_segment+0xb0e/0x12f0
> [  194.021892] Call Trace:
> [  194.027422]  <TASK>
> [  194.072861]  tcp_gso_segment+0x107/0x540
> [  194.082031]  inet_gso_segment+0x15c/0x3d0
> [  194.090783]  skb_mac_gso_segment+0x9f/0x110
> [  194.095016]  __skb_gso_segment+0xc1/0x190
> [  194.103131]  netem_enqueue+0x290/0xb10 [sch_netem]
> [  194.107071]  dev_qdisc_enqueue+0x16/0x70
> [  194.110884]  __dev_queue_xmit+0x63b/0xb30
> [  194.121670]  bond_start_xmit+0x159/0x380 [bonding]
> [  194.128506]  dev_hard_start_xmit+0xc3/0x1e0
> [  194.131787]  __dev_queue_xmit+0x8a0/0xb30
> [  194.138225]  macvlan_start_xmit+0x4f/0x100 [macvlan]
> [  194.141477]  dev_hard_start_xmit+0xc3/0x1e0
> [  194.144622]  sch_direct_xmit+0xe3/0x280
> [  194.147748]  __dev_queue_xmit+0x54a/0xb30
> [  194.154131]  tap_get_user+0x2a8/0x9c0 [tap]
> [  194.157358]  tap_sendmsg+0x52/0x8e0 [tap]
> [  194.167049]  handle_tx_zerocopy+0x14e/0x4c0 [vhost_net]
> [  194.173631]  handle_tx+0xcd/0xe0 [vhost_net]
> [  194.176959]  vhost_worker+0x76/0xb0 [vhost]
> [  194.183667]  kthread+0x118/0x140
> [  194.190358]  ret_from_fork+0x1f/0x30
> [  194.193670]  </TASK>
>
> In this case calling skb_orphan_frags() updated nr_frags leaving nfrags
> local variable in skb_segment() stale. This resulted in the code hitting
> i >= nfrags prematurely and trying to move to next frag_skb using
> list_skb pointer, which was NULL, and caused kernel panic. Move the call
> to zero copy functions before using frags and nr_frags.
>
> Fixes: bf5c25d60861 ("skbuff: in skb_segment, call zerocopy functions once per nskb")
> Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
> Reported-by: Amit Goyal <agoyal@purestorage.com>
> Cc: stable@vger.kernel.org
> ---
>  net/core/skbuff.c | 34 ++++++++++++++++++++--------------
>  1 file changed, 20 insertions(+), 14 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a298992060e6..18a33dc2d6af 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4354,21 +4354,20 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>         struct sk_buff *segs = NULL;
>         struct sk_buff *tail = NULL;
>         struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list;
> -       skb_frag_t *frag = skb_shinfo(head_skb)->frags;
>         unsigned int mss = skb_shinfo(head_skb)->gso_size;
>         unsigned int doffset = head_skb->data - skb_mac_header(head_skb);
> -       struct sk_buff *frag_skb = head_skb;
>         unsigned int offset = doffset;
>         unsigned int tnl_hlen = skb_tnl_header_len(head_skb);
>         unsigned int partial_segs = 0;
>         unsigned int headroom;
>         unsigned int len = head_skb->len;
> +       struct sk_buff *frag_skb;
> +       skb_frag_t *frag;
>         __be16 proto;
>         bool csum, sg;
> -       int nfrags = skb_shinfo(head_skb)->nr_frags;
>         int err = -ENOMEM;
>         int i = 0;
> -       int pos;
> +       int nfrags, pos;
>
>         if ((skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY) &&
>             mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) {
> @@ -4445,6 +4444,13 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>         headroom = skb_headroom(head_skb);
>         pos = skb_headlen(head_skb);
>
> +       if (skb_orphan_frags(head_skb, GFP_ATOMIC))
> +               return ERR_PTR(-ENOMEM);
> +
> +       nfrags = skb_shinfo(head_skb)->nr_frags;
> +       frag = skb_shinfo(head_skb)->frags;
> +       frag_skb = head_skb;
> +
>         do {
>                 struct sk_buff *nskb;
>                 skb_frag_t *nskb_frag;
> @@ -4465,6 +4471,10 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>                     (skb_headlen(list_skb) == len || sg)) {
>                         BUG_ON(skb_headlen(list_skb) > len);
>
> +                       nskb = skb_clone(list_skb, GFP_ATOMIC);
> +                       if (unlikely(!nskb))
> +                               goto err;
> +

This patch is quite complex to review, so I am asking if this part was
really needed ?
<1>: You moved <2> and <3> here.

If this is not strictly needed, please keep the code as is to ease
code review...

>                         i = 0;
>                         nfrags = skb_shinfo(list_skb)->nr_frags;
>                         frag = skb_shinfo(list_skb)->frags;
> @@ -4483,12 +4493,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>                                 frag++;
>                         }
>
> -                       nskb = skb_clone(list_skb, GFP_ATOMIC);

<2>

>                         list_skb = list_skb->next;
>
> -                       if (unlikely(!nskb))
> -                               goto err;
> -

<3>

>                         if (unlikely(pskb_trim(nskb, len))) {
>                                 kfree_skb(nskb);
>                                 goto err;
> @@ -4564,12 +4570,16 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>                 skb_shinfo(nskb)->flags |= skb_shinfo(head_skb)->flags &
>                                            SKBFL_SHARED_FRAG;
>
> -               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> -                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
> +               if (skb_zerocopy_clone(nskb, list_skb, GFP_ATOMIC))

Why use list_skb here instead of frag_skb?
Again, I have to look at the whole thing to understand why you did this.

>                         goto err;
>
>                 while (pos < offset + len) {
>                         if (i >= nfrags) {
> +                               if (skb_orphan_frags(list_skb, GFP_ATOMIC) ||
> +                                   skb_zerocopy_clone(nskb, list_skb,
> +                                                      GFP_ATOMIC))
> +                                       goto err;
> +

This part is fine.

>                                 i = 0;
>                                 nfrags = skb_shinfo(list_skb)->nr_frags;
>                                 frag = skb_shinfo(list_skb)->frags;
> @@ -4583,10 +4593,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>                                         i--;
>                                         frag--;
>                                 }
> -                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> -                                   skb_zerocopy_clone(nskb, frag_skb,
> -                                                      GFP_ATOMIC))
> -                                       goto err;
>
>                                 list_skb = list_skb->next;
>                         }
> --
> 2.17.1
>

Thanks.


* Re: [PATCH v2] skbuff: skb_segment, Call zero copy functions before using skbuff frags
  2023-08-31  6:58     ` Eric Dumazet
@ 2023-08-31  7:29       ` Mohamed Khalfella
  2023-08-31  7:43         ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Mohamed Khalfella @ 2023-08-31  7:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: willemdebruijn.kernel, alexanderduyck, bpf, brouer, davem,
	dhowells, keescook, kuba, linux-kernel, netdev, pabeni, willemb,
	stable

On 2023-08-31 08:58:51 +0200, Eric Dumazet wrote:
> On Thu, Aug 31, 2023 at 1:28 AM Mohamed Khalfella
> <mkhalfella@purestorage.com> wrote:
> >         do {
> >                 struct sk_buff *nskb;
> >                 skb_frag_t *nskb_frag;
> > @@ -4465,6 +4471,10 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> >                     (skb_headlen(list_skb) == len || sg)) {
> >                         BUG_ON(skb_headlen(list_skb) > len);
> >
> > +                       nskb = skb_clone(list_skb, GFP_ATOMIC);
> > +                       if (unlikely(!nskb))
> > +                               goto err;
> > +
> 
> This patch is quite complex to review, so I am asking if this part was
> really needed ?

Unfortunately the patch is complex because I am trying to avoid calling
skb_orphan_frags() in the middle of processing these frags. Otherwise it
would be much harder to implement, because the reallocated frags do not
map 1:1 with the existing frags, as Willem mentioned.

> <1>: You moved <2> and <3> here.

<2> was moved here because skb_clone() calls skb_orphan_frags(). By
moving this up we do not need to call skb_orphan_frags() for list_skb,
and we can start to use nr_frags and frags without worrying that their
values are going to change.

<3> was moved here because <2> was moved here. Fail fast if we cannot
clone list_skb.

> 
> If this is not strictly needed, please keep the code as is to ease
> code review...
> 
> >                         i = 0;
> >                         nfrags = skb_shinfo(list_skb)->nr_frags;
> >                         frag = skb_shinfo(list_skb)->frags;
> > @@ -4483,12 +4493,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> >                                 frag++;
> >                         }
> >
> > -                       nskb = skb_clone(list_skb, GFP_ATOMIC);
> 
> <2>
> 
> >                         list_skb = list_skb->next;
> >
> > -                       if (unlikely(!nskb))
> > -                               goto err;
> > -
> 
> <3>
> 
> >                         if (unlikely(pskb_trim(nskb, len))) {
> >                                 kfree_skb(nskb);
> >                                 goto err;
> > @@ -4564,12 +4570,16 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> >                 skb_shinfo(nskb)->flags |= skb_shinfo(head_skb)->flags &
> >                                            SKBFL_SHARED_FRAG;
> >
> > -               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> > -                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
> > +               if (skb_zerocopy_clone(nskb, list_skb, GFP_ATOMIC))
> 
> Why using list_skb here instead of frag_skb ?
> Again, I have to look at the whole thing to understand why you did this.

Oops, this is a mistake. It should be frag_skb. I will fix it, run the
test one more time, and post v3.


* Re: [PATCH v2] skbuff: skb_segment, Call zero copy functions before using skbuff frags
  2023-08-31  7:29       ` Mohamed Khalfella
@ 2023-08-31  7:43         ` Eric Dumazet
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2023-08-31  7:43 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: willemdebruijn.kernel, alexanderduyck, bpf, brouer, davem,
	dhowells, keescook, kuba, linux-kernel, netdev, pabeni, willemb,
	stable

On Thu, Aug 31, 2023 at 9:30 AM Mohamed Khalfella
<mkhalfella@purestorage.com> wrote:
>
> On 2023-08-31 08:58:51 +0200, Eric Dumazet wrote:
> > On Thu, Aug 31, 2023 at 1:28 AM Mohamed Khalfella
> > <mkhalfella@purestorage.com> wrote:
> > >         do {
> > >                 struct sk_buff *nskb;
> > >                 skb_frag_t *nskb_frag;
> > > @@ -4465,6 +4471,10 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> > >                     (skb_headlen(list_skb) == len || sg)) {
> > >                         BUG_ON(skb_headlen(list_skb) > len);
> > >
> > > +                       nskb = skb_clone(list_skb, GFP_ATOMIC);
> > > +                       if (unlikely(!nskb))
> > > +                               goto err;
> > > +
> >
> > This patch is quite complex to review, so I am asking if this part was
> > really needed ?
>
> Unfortunately the patch is complex because I try to avoid calling
> skb_orphan_frags() in the middle of processing these frags. Otherwise
> it would be much harder to implement because as reallocated frags do not
> map 1:1 with existing frags as Willem mentioned.
>
> > <1>  : You moved here <2> and <3>
>
> <2> was moved here because skb_clone() calls skb_orphan_frags(). By

Oh right, I think we should amend the skb_clone() documentation; it is
slightly wrong.

(I will take care of this change.)

> moving this up we do not need to call skb_orphan_frags() for list_skb
> and we can start to use nr_frags and frags without worrying their value
> is going to change.
>
> <3> was moved here because <2> was moved here. Fail fast if we can not
> clone list_skb.
>
> >
> > If this is not strictly needed, please keep the code as is to ease
> > code review...
> >
> > >                         i = 0;
> > >                         nfrags = skb_shinfo(list_skb)->nr_frags;
> > >                         frag = skb_shinfo(list_skb)->frags;
> > > @@ -4483,12 +4493,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> > >                                 frag++;
> > >                         }
> > >
> > > -                       nskb = skb_clone(list_skb, GFP_ATOMIC);
> >
> > <2>
> >
> > >                         list_skb = list_skb->next;
> > >
> > > -                       if (unlikely(!nskb))
> > > -                               goto err;
> > > -
> >
> > <3>
> >
> > >                         if (unlikely(pskb_trim(nskb, len))) {
> > >                                 kfree_skb(nskb);
> > >                                 goto err;
> > > @@ -4564,12 +4570,16 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> > >                 skb_shinfo(nskb)->flags |= skb_shinfo(head_skb)->flags &
> > >                                            SKBFL_SHARED_FRAG;
> > >
> > > -               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> > > -                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
> > > +               if (skb_zerocopy_clone(nskb, list_skb, GFP_ATOMIC))
> >
> > Why using list_skb here instead of frag_skb ?
> > Again, I have to look at the whole thing to understand why you did this.
>
> Oops, this is a mistake. It should be frag_skb. I will fix it, run the
> test one more time, and post v3.


* [PATCH v3] skbuff: skb_segment, Call zero copy functions before using skbuff frags
  2023-08-29  4:18 ` willemjdebruijn
  2023-08-29  6:50   ` Mohamed Khalfella
  2023-08-30 23:28   ` [PATCH v2] skbuff: skb_segment, Call zero copy functions before using skbuff frags Mohamed Khalfella
@ 2023-08-31  8:17   ` Mohamed Khalfella
  2023-08-31  8:47     ` Eric Dumazet
  2023-09-01  7:10     ` patchwork-bot+netdevbpf
  2 siblings, 2 replies; 15+ messages in thread
From: Mohamed Khalfella @ 2023-08-31  8:17 UTC (permalink / raw)
  To: willemdebruijn.kernel
  Cc: alexanderduyck, bpf, brouer, davem, dhowells, edumazet, keescook,
	kuba, linux-kernel, mkhalfella, netdev, pabeni, willemb, stable

Commit bf5c25d60861 ("skbuff: in skb_segment, call zerocopy functions
once per nskb") added the call to zero copy functions in skb_segment().
The change introduced a bug in skb_segment() because skb_orphan_frags()
may possibly change the number of fragments or allocate new fragments
altogether, leaving nfrags and frag pointing to the old values. This can
cause a panic with a stack trace like the one below.

[  193.894380] BUG: kernel NULL pointer dereference, address: 00000000000000bc
[  193.895273] CPU: 13 PID: 18164 Comm: vh-net-17428 Kdump: loaded Tainted: G           O      5.15.123+ #26
[  193.903919] RIP: 0010:skb_segment+0xb0e/0x12f0
[  194.021892] Call Trace:
[  194.027422]  <TASK>
[  194.072861]  tcp_gso_segment+0x107/0x540
[  194.082031]  inet_gso_segment+0x15c/0x3d0
[  194.090783]  skb_mac_gso_segment+0x9f/0x110
[  194.095016]  __skb_gso_segment+0xc1/0x190
[  194.103131]  netem_enqueue+0x290/0xb10 [sch_netem]
[  194.107071]  dev_qdisc_enqueue+0x16/0x70
[  194.110884]  __dev_queue_xmit+0x63b/0xb30
[  194.121670]  bond_start_xmit+0x159/0x380 [bonding]
[  194.128506]  dev_hard_start_xmit+0xc3/0x1e0
[  194.131787]  __dev_queue_xmit+0x8a0/0xb30
[  194.138225]  macvlan_start_xmit+0x4f/0x100 [macvlan]
[  194.141477]  dev_hard_start_xmit+0xc3/0x1e0
[  194.144622]  sch_direct_xmit+0xe3/0x280
[  194.147748]  __dev_queue_xmit+0x54a/0xb30
[  194.154131]  tap_get_user+0x2a8/0x9c0 [tap]
[  194.157358]  tap_sendmsg+0x52/0x8e0 [tap]
[  194.167049]  handle_tx_zerocopy+0x14e/0x4c0 [vhost_net]
[  194.173631]  handle_tx+0xcd/0xe0 [vhost_net]
[  194.176959]  vhost_worker+0x76/0xb0 [vhost]
[  194.183667]  kthread+0x118/0x140
[  194.190358]  ret_from_fork+0x1f/0x30
[  194.193670]  </TASK>

In this case calling skb_orphan_frags() updated nr_frags, leaving the
nfrags local variable in skb_segment() stale. This resulted in the code
hitting i >= nfrags prematurely and trying to move to the next frag_skb
using the list_skb pointer, which was NULL, and caused a kernel panic.
Move the call to the zero copy functions before using frags and nr_frags.

Fixes: bf5c25d60861 ("skbuff: in skb_segment, call zerocopy functions once per nskb")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reported-by: Amit Goyal <agoyal@purestorage.com>
Cc: stable@vger.kernel.org
---
 net/core/skbuff.c | 34 ++++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a298992060e6..74a8829a6b59 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4354,21 +4354,20 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 	struct sk_buff *segs = NULL;
 	struct sk_buff *tail = NULL;
 	struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list;
-	skb_frag_t *frag = skb_shinfo(head_skb)->frags;
 	unsigned int mss = skb_shinfo(head_skb)->gso_size;
 	unsigned int doffset = head_skb->data - skb_mac_header(head_skb);
-	struct sk_buff *frag_skb = head_skb;
 	unsigned int offset = doffset;
 	unsigned int tnl_hlen = skb_tnl_header_len(head_skb);
 	unsigned int partial_segs = 0;
 	unsigned int headroom;
 	unsigned int len = head_skb->len;
+	struct sk_buff *frag_skb;
+	skb_frag_t *frag;
 	__be16 proto;
 	bool csum, sg;
-	int nfrags = skb_shinfo(head_skb)->nr_frags;
 	int err = -ENOMEM;
 	int i = 0;
-	int pos;
+	int nfrags, pos;
 
 	if ((skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY) &&
 	    mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) {
@@ -4445,6 +4444,13 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 	headroom = skb_headroom(head_skb);
 	pos = skb_headlen(head_skb);
 
+	if (skb_orphan_frags(head_skb, GFP_ATOMIC))
+		return ERR_PTR(-ENOMEM);
+
+	nfrags = skb_shinfo(head_skb)->nr_frags;
+	frag = skb_shinfo(head_skb)->frags;
+	frag_skb = head_skb;
+
 	do {
 		struct sk_buff *nskb;
 		skb_frag_t *nskb_frag;
@@ -4465,6 +4471,10 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 		    (skb_headlen(list_skb) == len || sg)) {
 			BUG_ON(skb_headlen(list_skb) > len);
 
+			nskb = skb_clone(list_skb, GFP_ATOMIC);
+			if (unlikely(!nskb))
+				goto err;
+
 			i = 0;
 			nfrags = skb_shinfo(list_skb)->nr_frags;
 			frag = skb_shinfo(list_skb)->frags;
@@ -4483,12 +4493,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 				frag++;
 			}
 
-			nskb = skb_clone(list_skb, GFP_ATOMIC);
 			list_skb = list_skb->next;
 
-			if (unlikely(!nskb))
-				goto err;
-
 			if (unlikely(pskb_trim(nskb, len))) {
 				kfree_skb(nskb);
 				goto err;
@@ -4564,12 +4570,16 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 		skb_shinfo(nskb)->flags |= skb_shinfo(head_skb)->flags &
 					   SKBFL_SHARED_FRAG;
 
-		if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
-		    skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
+		if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
 			goto err;
 
 		while (pos < offset + len) {
 			if (i >= nfrags) {
+				if (skb_orphan_frags(list_skb, GFP_ATOMIC) ||
+				    skb_zerocopy_clone(nskb, list_skb,
+						       GFP_ATOMIC))
+					goto err;
+
 				i = 0;
 				nfrags = skb_shinfo(list_skb)->nr_frags;
 				frag = skb_shinfo(list_skb)->frags;
@@ -4583,10 +4593,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 					i--;
 					frag--;
 				}
-				if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
-				    skb_zerocopy_clone(nskb, frag_skb,
-						       GFP_ATOMIC))
-					goto err;
 
 				list_skb = list_skb->next;
 			}
-- 
2.17.1



* Re: [PATCH v3] skbuff: skb_segment, Call zero copy functions before using skbuff frags
  2023-08-31  8:17   ` [PATCH v3] " Mohamed Khalfella
@ 2023-08-31  8:47     ` Eric Dumazet
  2023-09-01  7:10     ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2023-08-31  8:47 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: willemdebruijn.kernel, alexanderduyck, bpf, brouer, davem,
	dhowells, keescook, kuba, linux-kernel, netdev, pabeni, willemb,
	stable

On Thu, Aug 31, 2023 at 10:17 AM Mohamed Khalfella
<mkhalfella@purestorage.com> wrote:
>
> Commit bf5c25d60861 ("skbuff: in skb_segment, call zerocopy functions
> once per nskb") added the call to zero copy functions in skb_segment().
> The change introduced a bug in skb_segment() because skb_orphan_frags()
> may possibly change the number of fragments or allocate new fragments
> altogether leaving nfrags and frag to point to the old values. This can
> cause a panic with stacktrace like the one below.
>
>

> In this case calling skb_orphan_frags() updated nr_frags leaving nfrags
> local variable in skb_segment() stale. This resulted in the code hitting
> i >= nfrags prematurely and trying to move to next frag_skb using
> list_skb pointer, which was NULL, and caused kernel panic. Move the call
> to zero copy functions before using frags and nr_frags.
>
> Fixes: bf5c25d60861 ("skbuff: in skb_segment, call zerocopy functions once per nskb")
> Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
> Reported-by: Amit Goyal <agoyal@purestorage.com>
> Cc: stable@vger.kernel.org
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>


* Re: [PATCH v3] skbuff: skb_segment, Call zero copy functions before using skbuff frags
  2023-08-31  8:17   ` [PATCH v3] " Mohamed Khalfella
  2023-08-31  8:47     ` Eric Dumazet
@ 2023-09-01  7:10     ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 15+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-09-01  7:10 UTC (permalink / raw)
  To: Mohamed Khalfella
  Cc: willemdebruijn.kernel, alexanderduyck, bpf, brouer, davem,
	dhowells, edumazet, keescook, kuba, linux-kernel, netdev, pabeni,
	willemb, stable

Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Thu, 31 Aug 2023 02:17:02 -0600 you wrote:
> Commit bf5c25d60861 ("skbuff: in skb_segment, call zerocopy functions
> once per nskb") added the call to zero copy functions in skb_segment().
> The change introduced a bug in skb_segment() because skb_orphan_frags()
> may possibly change the number of fragments or allocate new fragments
> altogether leaving nfrags and frag to point to the old values. This can
> cause a panic with stacktrace like the one below.
> 
> [...]

Here is the summary with links:
  - [v3] skbuff: skb_segment, Call zero copy functions before using skbuff frags
    https://git.kernel.org/netdev/net/c/2ea35288c83b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



