* [PATCH net] net: fix use-after-free when UDP GRO with shared fraglist [not found] <CGME20210104085750epcas2p1a5b22559d87df61ef3c8215ae0b470b5@epcas2p1.samsung.com> @ 2021-01-04 8:46 ` Dongseok Yi 2021-01-04 21:03 ` Willem de Bruijn [not found] ` <CGME20210107005028epcas2p35dfa745fd92e31400024874f54243556@epcas2p3.samsung.com> 0 siblings, 2 replies; 21+ messages in thread From: Dongseok Yi @ 2021-01-04 8:46 UTC (permalink / raw) To: David S. Miller Cc: Jakub Kicinski, Miaohe Lin, Willem de Bruijn, Paolo Abeni, Florian Westphal, Al Viro, Guillaume Nault, Yunsheng Lin, Steffen Klassert, Yadu Kishore, Marco Elver, netdev, linux-kernel, namkyu78.kim, Dongseok Yi skbs in frag_list could be shared by pskb_expand_head() from BPF. While tcpdump, sk_receive_queue of PF_PACKET has the original frag_list. But the same frag_list is queued to PF_INET (or PF_INET6) as the fraglist chain made by skb_segment_list(). If the new skb (not frag_list) is queued to one of the sk_receive_queue, multiple ptypes can see this. The skb could be released by ptypes and it causes use-after-free. [ 4443.426215] ------------[ cut here ]------------ [ 4443.426222] refcount_t: underflow; use-after-free. [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 refcount_dec_and_test_checked+0xa4/0xc8 [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 [ 4443.426808] Call trace: [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 [ 4443.426823] skb_release_data+0x144/0x264 [ 4443.426828] kfree_skb+0x58/0xc4 [ 4443.426832] skb_queue_purge+0x64/0x9c [ 4443.426844] packet_set_ring+0x5f0/0x820 [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 [ 4443.426853] __sys_setsockopt+0x188/0x278 [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 [ 4443.426869] el0_svc_common+0xf0/0x1d0 [ 4443.426873] el0_svc_handler+0x74/0x98 [ 4443.426880] el0_svc+0x8/0xc Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> --- net/core/skbuff.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index f62cae3..1dcbda8 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, unsigned int delta_truesize = 0; unsigned int delta_len = 0; struct sk_buff *tail = NULL; - struct sk_buff *nskb; + struct sk_buff *nskb, *tmp; + int err; skb_push(skb, -skb_network_offset(skb) + offset); @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, nskb = list_skb; list_skb = list_skb->next; + err = 0; + if (skb_shared(nskb)) { + tmp = skb_clone(nskb, GFP_ATOMIC); + if (tmp) { + kfree_skb(nskb); + nskb = tmp; + err = skb_unclone(nskb, GFP_ATOMIC); + } else { + err = -ENOMEM; + } + } + if (!tail) skb->next = nskb; else tail->next = nskb; + if (unlikely(err)) { + nskb->next = list_skb; + goto err_linearize; + } + tail = nskb; delta_len += nskb->len; -- 2.7.4 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH net] net: fix use-after-free when UDP GRO with shared fraglist 2021-01-04 8:46 ` [PATCH net] net: fix use-after-free when UDP GRO with shared fraglist Dongseok Yi @ 2021-01-04 21:03 ` Willem de Bruijn 2021-01-06 1:29 ` Dongseok Yi [not found] ` <CGME20210107005028epcas2p35dfa745fd92e31400024874f54243556@epcas2p3.samsung.com> 1 sibling, 1 reply; 21+ messages in thread From: Willem de Bruijn @ 2021-01-04 21:03 UTC (permalink / raw) To: Dongseok Yi Cc: David S. Miller, Jakub Kicinski, Miaohe Lin, Willem de Bruijn, Paolo Abeni, Florian Westphal, Al Viro, Guillaume Nault, Yunsheng Lin, Steffen Klassert, Yadu Kishore, Marco Elver, Network Development, LKML, namkyu78.kim On Mon, Jan 4, 2021 at 4:00 AM Dongseok Yi <dseok.yi@samsung.com> wrote: > > skbs in frag_list could be shared by pskb_expand_head() from BPF. Can you elaborate on the BPF connection? > While tcpdump, sk_receive_queue of PF_PACKET has the original frag_list. > But the same frag_list is queued to PF_INET (or PF_INET6) as the fraglist > chain made by skb_segment_list(). > > If the new skb (not frag_list) is queued to one of the sk_receive_queue, > multiple ptypes can see this. The skb could be released by ptypes and > it causes use-after-free. If I understand correctly, a udp-gro-list skb makes it up the receive path with one or more active packet sockets. The packet socket will call skb_clone after accepting the filter. This replaces the head_skb, but shares the skb_shinfo and thus frag_list. udp_rcv_segment later converts the udp-gro-list skb to a list of regular packets to pass these one-by-one to udp_queue_rcv_one_skb. Now all the frags are fully fledged packets, with headers pushed before the payload. This does not change their refcount anymore than the skb_clone in pf_packet did. This should be 1. Eventually udp_recvmsg will call skb_consume_udp on each packet. The packet socket eventually also frees its cloned head_skb, which triggers kfree_skb_list(shinfo->frag_list) kfree_skb skb_unref refcount_dec_and_test(&skb->users) > > [ 4443.426215] ------------[ cut here ]------------ > [ 4443.426222] refcount_t: underflow; use-after-free. > [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 > refcount_dec_and_test_checked+0xa4/0xc8 > [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) > [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 > [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 > [ 4443.426808] Call trace: > [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 > [ 4443.426823] skb_release_data+0x144/0x264 > [ 4443.426828] kfree_skb+0x58/0xc4 > [ 4443.426832] skb_queue_purge+0x64/0x9c > [ 4443.426844] packet_set_ring+0x5f0/0x820 > [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 > [ 4443.426853] __sys_setsockopt+0x188/0x278 > [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 > [ 4443.426869] el0_svc_common+0xf0/0x1d0 > [ 4443.426873] el0_svc_handler+0x74/0x98 > [ 4443.426880] el0_svc+0x8/0xc > > Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) > Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> > --- > net/core/skbuff.c | 20 +++++++++++++++++++- > 1 file changed, 19 insertions(+), 1 deletion(-) > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index f62cae3..1dcbda8 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > unsigned int delta_truesize = 0; > unsigned int delta_len = 0; > struct sk_buff *tail = NULL; > - struct sk_buff *nskb; > + struct sk_buff *nskb, *tmp; > + int err; > > skb_push(skb, -skb_network_offset(skb) + offset); > > @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > nskb = list_skb; > list_skb = list_skb->next; > > + err = 0; > + if (skb_shared(nskb)) { I must be missing something still. This does not square with my understanding that the two sockets are operating on clones, with each frag_list skb having skb->users == 1. Unless the packet socket patch previously also triggered an skb_unclone/pskb_expand_head, as that call skb_clone_fraglist, which calls skb_get on each frag_list skb. > + tmp = skb_clone(nskb, GFP_ATOMIC); > + if (tmp) { > + kfree_skb(nskb); > + nskb = tmp; > + err = skb_unclone(nskb, GFP_ATOMIC); > + } else { > + err = -ENOMEM; > + } > + } > + > if (!tail) > skb->next = nskb; > else > tail->next = nskb; > > + if (unlikely(err)) { > + nskb->next = list_skb; > + goto err_linearize; > + } > + > tail = nskb; > > delta_len += nskb->len; > -- > 2.7.4 > ^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH net] net: fix use-after-free when UDP GRO with shared fraglist 2021-01-04 21:03 ` Willem de Bruijn @ 2021-01-06 1:29 ` Dongseok Yi 2021-01-06 3:07 ` Willem de Bruijn 0 siblings, 1 reply; 21+ messages in thread From: Dongseok Yi @ 2021-01-06 1:29 UTC (permalink / raw) To: 'Willem de Bruijn' Cc: 'David S. Miller', 'Jakub Kicinski', 'Miaohe Lin', 'Willem de Bruijn', 'Paolo Abeni', 'Florian Westphal', 'Al Viro', 'Guillaume Nault', 'Yunsheng Lin', 'Steffen Klassert', 'Yadu Kishore', 'Marco Elver', 'Network Development', 'LKML', namkyu78.kim On 2021-01-05 06:03, Willem de Bruijn wrote: > > On Mon, Jan 4, 2021 at 4:00 AM Dongseok Yi <dseok.yi@samsung.com> wrote: > > > > skbs in frag_list could be shared by pskb_expand_head() from BPF. > > Can you elaborate on the BPF connection? With the following registered ptypes, /proc/net # cat ptype Type Device Function ALL tpacket_rcv 0800 ip_rcv.cfi_jt 0011 llc_rcv.cfi_jt 0004 llc_rcv.cfi_jt 0806 arp_rcv 86dd ipv6_rcv.cfi_jt BPF checks skb_ensure_writable between tpacket_rcv and ip_rcv (or ipv6_rcv). And it calls pskb_expand_head. [ 132.051228] pskb_expand_head+0x360/0x378 [ 132.051237] skb_ensure_writable+0xa0/0xc4 [ 132.051249] bpf_skb_pull_data+0x28/0x60 [ 132.051262] bpf_prog_331d69c77ea5e964_schedcls_ingres+0x5f4/0x1000 [ 132.051273] cls_bpf_classify+0x254/0x348 [ 132.051284] tcf_classify+0xa4/0x180 [ 132.051294] __netif_receive_skb_core+0x590/0xd28 [ 132.051303] __netif_receive_skb+0x50/0x17c [ 132.051312] process_backlog+0x15c/0x1b8 > > > While tcpdump, sk_receive_queue of PF_PACKET has the original frag_list. > > But the same frag_list is queued to PF_INET (or PF_INET6) as the fraglist > > chain made by skb_segment_list(). > > > > If the new skb (not frag_list) is queued to one of the sk_receive_queue, > > multiple ptypes can see this. The skb could be released by ptypes and > > it causes use-after-free. > > If I understand correctly, a udp-gro-list skb makes it up the receive > path with one or more active packet sockets. > > The packet socket will call skb_clone after accepting the filter. This > replaces the head_skb, but shares the skb_shinfo and thus frag_list. > > udp_rcv_segment later converts the udp-gro-list skb to a list of > regular packets to pass these one-by-one to udp_queue_rcv_one_skb. > Now all the frags are fully fledged packets, with headers pushed > before the payload. This does not change their refcount anymore than > the skb_clone in pf_packet did. This should be 1. > > Eventually udp_recvmsg will call skb_consume_udp on each packet. > > The packet socket eventually also frees its cloned head_skb, which triggers > > kfree_skb_list(shinfo->frag_list) > kfree_skb > skb_unref > refcount_dec_and_test(&skb->users) Every your understanding is right, but > > > > > [ 4443.426215] ------------[ cut here ]------------ > > [ 4443.426222] refcount_t: underflow; use-after-free. > > [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 > > refcount_dec_and_test_checked+0xa4/0xc8 > > [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) > > [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 > > [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 > > [ 4443.426808] Call trace: > > [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 > > [ 4443.426823] skb_release_data+0x144/0x264 > > [ 4443.426828] kfree_skb+0x58/0xc4 > > [ 4443.426832] skb_queue_purge+0x64/0x9c > > [ 4443.426844] packet_set_ring+0x5f0/0x820 > > [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 > > [ 4443.426853] __sys_setsockopt+0x188/0x278 > > [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 > > [ 4443.426869] el0_svc_common+0xf0/0x1d0 > > [ 4443.426873] el0_svc_handler+0x74/0x98 > > [ 4443.426880] el0_svc+0x8/0xc > > > > Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) > > Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> > > --- > > net/core/skbuff.c | 20 +++++++++++++++++++- > > 1 file changed, 19 insertions(+), 1 deletion(-) > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > index f62cae3..1dcbda8 100644 > > --- a/net/core/skbuff.c > > +++ b/net/core/skbuff.c > > @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > > unsigned int delta_truesize = 0; > > unsigned int delta_len = 0; > > struct sk_buff *tail = NULL; > > - struct sk_buff *nskb; > > + struct sk_buff *nskb, *tmp; > > + int err; > > > > skb_push(skb, -skb_network_offset(skb) + offset); > > > > @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > > nskb = list_skb; > > list_skb = list_skb->next; > > > > + err = 0; > > + if (skb_shared(nskb)) { > > I must be missing something still. This does not square with my > understanding that the two sockets are operating on clones, with each > frag_list skb having skb->users == 1. > > Unless the packet socket patch previously also triggered an > skb_unclone/pskb_expand_head, as that call skb_clone_fraglist, which > calls skb_get on each frag_list skb. A cloned skb after tpacket_rcv cannot go through skb_ensure_writable with the original shinfo. pskb_expand_head reallocates the shinfo of the skb and call skb_clone_fraglist. skb_release_data in pskb_expand_head could not reduce skb->users of the each frag_list skb if skb_shinfo(skb)->dataref == 2. After the reallocation, skb_shinfo(skb)->dataref == 1 but each frag_list skb could have skb->users == 2. > > > > + tmp = skb_clone(nskb, GFP_ATOMIC); > > + if (tmp) { > > + kfree_skb(nskb); > > + nskb = tmp; > > + err = skb_unclone(nskb, GFP_ATOMIC); > > + } else { > > + err = -ENOMEM; > > + } > > + } > > + > > if (!tail) > > skb->next = nskb; > > else > > tail->next = nskb; > > > > + if (unlikely(err)) { > > + nskb->next = list_skb; > > + goto err_linearize; > > + } > > + > > tail = nskb; > > > > delta_len += nskb->len; > > -- > > 2.7.4 > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net] net: fix use-after-free when UDP GRO with shared fraglist 2021-01-06 1:29 ` Dongseok Yi @ 2021-01-06 3:07 ` Willem de Bruijn 2021-01-06 3:32 ` Dongseok Yi 0 siblings, 1 reply; 21+ messages in thread From: Willem de Bruijn @ 2021-01-06 3:07 UTC (permalink / raw) To: Dongseok Yi Cc: David S. Miller, Jakub Kicinski, Miaohe Lin, Willem de Bruijn, Paolo Abeni, Florian Westphal, Al Viro, Guillaume Nault, Yunsheng Lin, Steffen Klassert, Yadu Kishore, Marco Elver, Network Development, LKML, namkyu78.kim On Tue, Jan 5, 2021 at 8:29 PM Dongseok Yi <dseok.yi@samsung.com> wrote: > > On 2021-01-05 06:03, Willem de Bruijn wrote: > > > > On Mon, Jan 4, 2021 at 4:00 AM Dongseok Yi <dseok.yi@samsung.com> wrote: > > > > > > skbs in frag_list could be shared by pskb_expand_head() from BPF. > > > > Can you elaborate on the BPF connection? > > With the following registered ptypes, > > /proc/net # cat ptype > Type Device Function > ALL tpacket_rcv > 0800 ip_rcv.cfi_jt > 0011 llc_rcv.cfi_jt > 0004 llc_rcv.cfi_jt > 0806 arp_rcv > 86dd ipv6_rcv.cfi_jt > > BPF checks skb_ensure_writable between tpacket_rcv and ip_rcv > (or ipv6_rcv). And it calls pskb_expand_head. > > [ 132.051228] pskb_expand_head+0x360/0x378 > [ 132.051237] skb_ensure_writable+0xa0/0xc4 > [ 132.051249] bpf_skb_pull_data+0x28/0x60 > [ 132.051262] bpf_prog_331d69c77ea5e964_schedcls_ingres+0x5f4/0x1000 > [ 132.051273] cls_bpf_classify+0x254/0x348 > [ 132.051284] tcf_classify+0xa4/0x180 Ah, you have a BPF program loaded at TC. That was not entirely obvious. This program gets called after packet sockets with ptype_all, before those with a specific protocol. Tcpdump will have inserted a program with ptype_all, which cloned the skb. This triggers skb_ensure_writable -> pskb_expand_head -> skb_clone_fraglist -> skb_get. > [ 132.051294] __netif_receive_skb_core+0x590/0xd28 > [ 132.051303] __netif_receive_skb+0x50/0x17c > [ 132.051312] process_backlog+0x15c/0x1b8 > > > > > > While tcpdump, sk_receive_queue of PF_PACKET has the original frag_list. > > > But the same frag_list is queued to PF_INET (or PF_INET6) as the fraglist > > > chain made by skb_segment_list(). > > > > > > If the new skb (not frag_list) is queued to one of the sk_receive_queue, > > > multiple ptypes can see this. The skb could be released by ptypes and > > > it causes use-after-free. > > > > If I understand correctly, a udp-gro-list skb makes it up the receive > > path with one or more active packet sockets. > > > > The packet socket will call skb_clone after accepting the filter. This > > replaces the head_skb, but shares the skb_shinfo and thus frag_list. > > > > udp_rcv_segment later converts the udp-gro-list skb to a list of > > regular packets to pass these one-by-one to udp_queue_rcv_one_skb. > > Now all the frags are fully fledged packets, with headers pushed > > before the payload. This does not change their refcount anymore than > > the skb_clone in pf_packet did. This should be 1. > > > > Eventually udp_recvmsg will call skb_consume_udp on each packet. > > > > The packet socket eventually also frees its cloned head_skb, which triggers > > > > kfree_skb_list(shinfo->frag_list) > > kfree_skb > > skb_unref > > refcount_dec_and_test(&skb->users) > > Every your understanding is right, but > > > > > > > > > [ 4443.426215] ------------[ cut here ]------------ > > > [ 4443.426222] refcount_t: underflow; use-after-free. > > > [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 > > > refcount_dec_and_test_checked+0xa4/0xc8 > > > [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) > > > [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 > > > [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 > > > [ 4443.426808] Call trace: > > > [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 > > > [ 4443.426823] skb_release_data+0x144/0x264 > > > [ 4443.426828] kfree_skb+0x58/0xc4 > > > [ 4443.426832] skb_queue_purge+0x64/0x9c > > > [ 4443.426844] packet_set_ring+0x5f0/0x820 > > > [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 > > > [ 4443.426853] __sys_setsockopt+0x188/0x278 > > > [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 > > > [ 4443.426869] el0_svc_common+0xf0/0x1d0 > > > [ 4443.426873] el0_svc_handler+0x74/0x98 > > > [ 4443.426880] el0_svc+0x8/0xc > > > > > > Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) > > > Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> > > > --- > > > net/core/skbuff.c | 20 +++++++++++++++++++- > > > 1 file changed, 19 insertions(+), 1 deletion(-) > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > index f62cae3..1dcbda8 100644 > > > --- a/net/core/skbuff.c > > > +++ b/net/core/skbuff.c > > > @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > > > unsigned int delta_truesize = 0; > > > unsigned int delta_len = 0; > > > struct sk_buff *tail = NULL; > > > - struct sk_buff *nskb; > > > + struct sk_buff *nskb, *tmp; > > > + int err; > > > > > > skb_push(skb, -skb_network_offset(skb) + offset); > > > > > > @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > > > nskb = list_skb; > > > list_skb = list_skb->next; > > > > > > + err = 0; > > > + if (skb_shared(nskb)) { > > > > I must be missing something still. This does not square with my > > understanding that the two sockets are operating on clones, with each > > frag_list skb having skb->users == 1. > > > > Unless the packet socket patch previously also triggered an > > skb_unclone/pskb_expand_head, as that call skb_clone_fraglist, which > > calls skb_get on each frag_list skb. > > A cloned skb after tpacket_rcv cannot go through skb_ensure_writable > with the original shinfo. pskb_expand_head reallocates the shinfo of > the skb and call skb_clone_fraglist. skb_release_data in > pskb_expand_head could not reduce skb->users of the each frag_list skb > if skb_shinfo(skb)->dataref == 2. > > After the reallocation, skb_shinfo(skb)->dataref == 1 but each frag_list > skb could have skb->users == 2. Yes, that makes sense. skb_clone_fraglist just increments the frag_list skb's refcounts. skb_segment_list must create an unshared struct sk_buff before it changes skb data to insert the protocol headers. > > > > > > > + tmp = skb_clone(nskb, GFP_ATOMIC); > > > + if (tmp) { > > > + kfree_skb(nskb); > > > + nskb = tmp; > > > + err = skb_unclone(nskb, GFP_ATOMIC); Calling clone and unclone in quick succession looks odd. But you need the first to create a private skb and to trigger the second to create a private copy of the linear data (as well as frags, if any, but these are not touched). So this looks okay. > > > + } else { > > > + err = -ENOMEM; > > > + } > > > + } > > > + > > > if (!tail) > > > skb->next = nskb; > > > else > > > tail->next = nskb; > > > > > > + if (unlikely(err)) { > > > + nskb->next = list_skb; To avoid leaking these skbs when calling kfree_skb_list(skb->next). Is that concern new with this patch, or also needed for the existing error case? > > > + goto err_linearize; > > > + } > > > + > > > tail = nskb; > > > > > > delta_len += nskb->len; > > > -- > > > 2.7.4 > > > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH net] net: fix use-after-free when UDP GRO with shared fraglist 2021-01-06 3:07 ` Willem de Bruijn @ 2021-01-06 3:32 ` Dongseok Yi 2021-01-06 17:13 ` Willem de Bruijn 2021-04-17 3:44 ` Yunsheng Lin 0 siblings, 2 replies; 21+ messages in thread From: Dongseok Yi @ 2021-01-06 3:32 UTC (permalink / raw) To: 'Willem de Bruijn' Cc: 'David S. Miller', 'Jakub Kicinski', 'Miaohe Lin', 'Willem de Bruijn', 'Paolo Abeni', 'Florian Westphal', 'Al Viro', 'Guillaume Nault', 'Yunsheng Lin', 'Steffen Klassert', 'Yadu Kishore', 'Marco Elver', 'Network Development', 'LKML', namkyu78.kim On 2021-01-06 12:07, Willem de Bruijn wrote: > > On Tue, Jan 5, 2021 at 8:29 PM Dongseok Yi <dseok.yi@samsung.com> wrote: > > > > On 2021-01-05 06:03, Willem de Bruijn wrote: > > > > > > On Mon, Jan 4, 2021 at 4:00 AM Dongseok Yi <dseok.yi@samsung.com> wrote: > > > > > > > > skbs in frag_list could be shared by pskb_expand_head() from BPF. > > > > > > Can you elaborate on the BPF connection? > > > > With the following registered ptypes, > > > > /proc/net # cat ptype > > Type Device Function > > ALL tpacket_rcv > > 0800 ip_rcv.cfi_jt > > 0011 llc_rcv.cfi_jt > > 0004 llc_rcv.cfi_jt > > 0806 arp_rcv > > 86dd ipv6_rcv.cfi_jt > > > > BPF checks skb_ensure_writable between tpacket_rcv and ip_rcv > > (or ipv6_rcv). And it calls pskb_expand_head. > > > > [ 132.051228] pskb_expand_head+0x360/0x378 > > [ 132.051237] skb_ensure_writable+0xa0/0xc4 > > [ 132.051249] bpf_skb_pull_data+0x28/0x60 > > [ 132.051262] bpf_prog_331d69c77ea5e964_schedcls_ingres+0x5f4/0x1000 > > [ 132.051273] cls_bpf_classify+0x254/0x348 > > [ 132.051284] tcf_classify+0xa4/0x180 > > Ah, you have a BPF program loaded at TC. That was not entirely obvious. > > This program gets called after packet sockets with ptype_all, before > those with a specific protocol. > > Tcpdump will have inserted a program with ptype_all, which cloned the > skb. This triggers skb_ensure_writable -> pskb_expand_head -> > skb_clone_fraglist -> skb_get. > > > [ 132.051294] __netif_receive_skb_core+0x590/0xd28 > > [ 132.051303] __netif_receive_skb+0x50/0x17c > > [ 132.051312] process_backlog+0x15c/0x1b8 > > > > > > > > > While tcpdump, sk_receive_queue of PF_PACKET has the original frag_list. > > > > But the same frag_list is queued to PF_INET (or PF_INET6) as the fraglist > > > > chain made by skb_segment_list(). > > > > > > > > If the new skb (not frag_list) is queued to one of the sk_receive_queue, > > > > multiple ptypes can see this. The skb could be released by ptypes and > > > > it causes use-after-free. > > > > > > If I understand correctly, a udp-gro-list skb makes it up the receive > > > path with one or more active packet sockets. > > > > > > The packet socket will call skb_clone after accepting the filter. This > > > replaces the head_skb, but shares the skb_shinfo and thus frag_list. > > > > > > udp_rcv_segment later converts the udp-gro-list skb to a list of > > > regular packets to pass these one-by-one to udp_queue_rcv_one_skb. > > > Now all the frags are fully fledged packets, with headers pushed > > > before the payload. This does not change their refcount anymore than > > > the skb_clone in pf_packet did. This should be 1. > > > > > > Eventually udp_recvmsg will call skb_consume_udp on each packet. > > > > > > The packet socket eventually also frees its cloned head_skb, which triggers > > > > > > kfree_skb_list(shinfo->frag_list) > > > kfree_skb > > > skb_unref > > > refcount_dec_and_test(&skb->users) > > > > Every your understanding is right, but > > > > > > > > > > > > > [ 4443.426215] ------------[ cut here ]------------ > > > > [ 4443.426222] refcount_t: underflow; use-after-free. > > > > [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 > > > > refcount_dec_and_test_checked+0xa4/0xc8 > > > > [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) > > > > [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 > > > > [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 > > > > [ 4443.426808] Call trace: > > > > [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 > > > > [ 4443.426823] skb_release_data+0x144/0x264 > > > > [ 4443.426828] kfree_skb+0x58/0xc4 > > > > [ 4443.426832] skb_queue_purge+0x64/0x9c > > > > [ 4443.426844] packet_set_ring+0x5f0/0x820 > > > > [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 > > > > [ 4443.426853] __sys_setsockopt+0x188/0x278 > > > > [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 > > > > [ 4443.426869] el0_svc_common+0xf0/0x1d0 > > > > [ 4443.426873] el0_svc_handler+0x74/0x98 > > > > [ 4443.426880] el0_svc+0x8/0xc > > > > > > > > Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) > > > > Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> > > > > --- > > > > net/core/skbuff.c | 20 +++++++++++++++++++- > > > > 1 file changed, 19 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > > index f62cae3..1dcbda8 100644 > > > > --- a/net/core/skbuff.c > > > > +++ b/net/core/skbuff.c > > > > @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > > > > unsigned int delta_truesize = 0; > > > > unsigned int delta_len = 0; > > > > struct sk_buff *tail = NULL; > > > > - struct sk_buff *nskb; > > > > + struct sk_buff *nskb, *tmp; > > > > + int err; > > > > > > > > skb_push(skb, -skb_network_offset(skb) + offset); > > > > > > > > @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > > > > nskb = list_skb; > > > > list_skb = list_skb->next; > > > > > > > > + err = 0; > > > > + if (skb_shared(nskb)) { > > > > > > I must be missing something still. This does not square with my > > > understanding that the two sockets are operating on clones, with each > > > frag_list skb having skb->users == 1. > > > > > > Unless the packet socket patch previously also triggered an > > > skb_unclone/pskb_expand_head, as that call skb_clone_fraglist, which > > > calls skb_get on each frag_list skb. > > > > A cloned skb after tpacket_rcv cannot go through skb_ensure_writable > > with the original shinfo. pskb_expand_head reallocates the shinfo of > > the skb and call skb_clone_fraglist. skb_release_data in > > pskb_expand_head could not reduce skb->users of the each frag_list skb > > if skb_shinfo(skb)->dataref == 2. > > > > After the reallocation, skb_shinfo(skb)->dataref == 1 but each frag_list > > skb could have skb->users == 2. > > Yes, that makes sense. skb_clone_fraglist just increments the > frag_list skb's refcounts. > > skb_segment_list must create an unshared struct sk_buff before it > changes skb data to insert the protocol headers. > > > > > > > > > > > + tmp = skb_clone(nskb, GFP_ATOMIC); > > > > + if (tmp) { > > > > + kfree_skb(nskb); > > > > + nskb = tmp; > > > > + err = skb_unclone(nskb, GFP_ATOMIC); > > Calling clone and unclone in quick succession looks odd. > > But you need the first to create a private skb and to trigger the > second to create a private copy of the linear data (as well as frags, > if any, but these are not touched). So this looks okay. > > > > > + } else { > > > > + err = -ENOMEM; > > > > + } > > > > + } > > > > + > > > > if (!tail) > > > > skb->next = nskb; > > > > else > > > > tail->next = nskb; > > > > > > > > + if (unlikely(err)) { > > > > + nskb->next = list_skb; > > To avoid leaking these skbs when calling kfree_skb_list(skb->next). Is > that concern new with this patch, or also needed for the existing > error case? It's new for this patch. nskb can lose next skb due to tmp = skb_clone(nskb, GFP_ATOMIC); on the prior. I believe it is not needed for the existing errors. > > > > > + goto err_linearize; > > > > + } > > > > + > > > > tail = nskb; > > > > > > > > delta_len += nskb->len; > > > > -- > > > > 2.7.4 > > > > > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net] net: fix use-after-free when UDP GRO with shared fraglist 2021-01-06 3:32 ` Dongseok Yi @ 2021-01-06 17:13 ` Willem de Bruijn 2021-04-17 3:44 ` Yunsheng Lin 1 sibling, 0 replies; 21+ messages in thread From: Willem de Bruijn @ 2021-01-06 17:13 UTC (permalink / raw) To: Dongseok Yi Cc: David S. Miller, Jakub Kicinski, Miaohe Lin, Willem de Bruijn, Paolo Abeni, Florian Westphal, Al Viro, Guillaume Nault, Yunsheng Lin, Steffen Klassert, Yadu Kishore, Marco Elver, Network Development, LKML, namkyu78.kim On Tue, Jan 5, 2021 at 10:32 PM Dongseok Yi <dseok.yi@samsung.com> wrote: > > On 2021-01-06 12:07, Willem de Bruijn wrote: > > > > On Tue, Jan 5, 2021 at 8:29 PM Dongseok Yi <dseok.yi@samsung.com> wrote: > > > > > > On 2021-01-05 06:03, Willem de Bruijn wrote: > > > > > > > > On Mon, Jan 4, 2021 at 4:00 AM Dongseok Yi <dseok.yi@samsung.com> wrote: > > > > > > > > > > skbs in frag_list could be shared by pskb_expand_head() from BPF. > > > > > > > > Can you elaborate on the BPF connection? > > > > > > With the following registered ptypes, > > > > > > /proc/net # cat ptype > > > Type Device Function > > > ALL tpacket_rcv > > > 0800 ip_rcv.cfi_jt > > > 0011 llc_rcv.cfi_jt > > > 0004 llc_rcv.cfi_jt > > > 0806 arp_rcv > > > 86dd ipv6_rcv.cfi_jt > > > > > > BPF checks skb_ensure_writable between tpacket_rcv and ip_rcv > > > (or ipv6_rcv). And it calls pskb_expand_head. > > > > > > [ 132.051228] pskb_expand_head+0x360/0x378 > > > [ 132.051237] skb_ensure_writable+0xa0/0xc4 > > > [ 132.051249] bpf_skb_pull_data+0x28/0x60 > > > [ 132.051262] bpf_prog_331d69c77ea5e964_schedcls_ingres+0x5f4/0x1000 > > > [ 132.051273] cls_bpf_classify+0x254/0x348 > > > [ 132.051284] tcf_classify+0xa4/0x180 > > > > Ah, you have a BPF program loaded at TC. That was not entirely obvious. > > > > This program gets called after packet sockets with ptype_all, before > > those with a specific protocol. > > > > Tcpdump will have inserted a program with ptype_all, which cloned the > > skb. This triggers skb_ensure_writable -> pskb_expand_head -> > > skb_clone_fraglist -> skb_get. > > > > > [ 132.051294] __netif_receive_skb_core+0x590/0xd28 > > > [ 132.051303] __netif_receive_skb+0x50/0x17c > > > [ 132.051312] process_backlog+0x15c/0x1b8 > > > > > > > > > > > > While tcpdump, sk_receive_queue of PF_PACKET has the original frag_list. > > > > > But the same frag_list is queued to PF_INET (or PF_INET6) as the fraglist > > > > > chain made by skb_segment_list(). > > > > > > > > > > If the new skb (not frag_list) is queued to one of the sk_receive_queue, > > > > > multiple ptypes can see this. The skb could be released by ptypes and > > > > > it causes use-after-free. > > > > > > > > If I understand correctly, a udp-gro-list skb makes it up the receive > > > > path with one or more active packet sockets. > > > > > > > > The packet socket will call skb_clone after accepting the filter. This > > > > replaces the head_skb, but shares the skb_shinfo and thus frag_list. > > > > > > > > udp_rcv_segment later converts the udp-gro-list skb to a list of > > > > regular packets to pass these one-by-one to udp_queue_rcv_one_skb. > > > > Now all the frags are fully fledged packets, with headers pushed > > > > before the payload. This does not change their refcount anymore than > > > > the skb_clone in pf_packet did. This should be 1. > > > > > > > > Eventually udp_recvmsg will call skb_consume_udp on each packet. > > > > > > > > The packet socket eventually also frees its cloned head_skb, which triggers > > > > > > > > kfree_skb_list(shinfo->frag_list) > > > > kfree_skb > > > > skb_unref > > > > refcount_dec_and_test(&skb->users) > > > > > > Every your understanding is right, but > > > > > > > > > > > > > > > > > [ 4443.426215] ------------[ cut here ]------------ > > > > > [ 4443.426222] refcount_t: underflow; use-after-free. > > > > > [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 > > > > > refcount_dec_and_test_checked+0xa4/0xc8 > > > > > [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) > > > > > [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 > > > > > [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 > > > > > [ 4443.426808] Call trace: > > > > > [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 > > > > > [ 4443.426823] skb_release_data+0x144/0x264 > > > > > [ 4443.426828] kfree_skb+0x58/0xc4 > > > > > [ 4443.426832] skb_queue_purge+0x64/0x9c > > > > > [ 4443.426844] packet_set_ring+0x5f0/0x820 > > > > > [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 > > > > > [ 4443.426853] __sys_setsockopt+0x188/0x278 > > > > > [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 > > > > > [ 4443.426869] el0_svc_common+0xf0/0x1d0 > > > > > [ 4443.426873] el0_svc_handler+0x74/0x98 > > > > > [ 4443.426880] el0_svc+0x8/0xc > > > > > > > > > > Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) > > > > > Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> > > > > > --- > > > > > net/core/skbuff.c | 20 +++++++++++++++++++- > > > > > 1 file changed, 19 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > > > index f62cae3..1dcbda8 100644 > > > > > --- a/net/core/skbuff.c > > > > > +++ b/net/core/skbuff.c > > > > > @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > > > > > unsigned int delta_truesize = 0; > > > > > unsigned int delta_len = 0; > > > > > struct sk_buff *tail = NULL; > > > > > - struct sk_buff *nskb; > > > > > + struct sk_buff *nskb, *tmp; > > > > > + int err; > > > > > > > > > > skb_push(skb, -skb_network_offset(skb) + offset); > > > > > > > > > > @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > > > > > nskb = list_skb; > > > > > list_skb = list_skb->next; > > > > > > > > > > + err = 0; > > > > > + if (skb_shared(nskb)) { > > > > > > > > I must be missing something still. This does not square with my > > > > understanding that the two sockets are operating on clones, with each > > > > frag_list skb having skb->users == 1. > > > > > > > > Unless the packet socket patch previously also triggered an > > > > skb_unclone/pskb_expand_head, as that call skb_clone_fraglist, which > > > > calls skb_get on each frag_list skb. > > > > > > A cloned skb after tpacket_rcv cannot go through skb_ensure_writable > > > with the original shinfo. pskb_expand_head reallocates the shinfo of > > > the skb and call skb_clone_fraglist. skb_release_data in > > > pskb_expand_head could not reduce skb->users of the each frag_list skb > > > if skb_shinfo(skb)->dataref == 2. > > > > > > After the reallocation, skb_shinfo(skb)->dataref == 1 but each frag_list > > > skb could have skb->users == 2. > > > > Yes, that makes sense. skb_clone_fraglist just increments the > > frag_list skb's refcounts. > > > > skb_segment_list must create an unshared struct sk_buff before it > > changes skb data to insert the protocol headers. > > > > > > > > > > > > > > > + tmp = skb_clone(nskb, GFP_ATOMIC); > > > > > + if (tmp) { > > > > > + kfree_skb(nskb); > > > > > + nskb = tmp; > > > > > + err = skb_unclone(nskb, GFP_ATOMIC); > > > > Calling clone and unclone in quick succession looks odd. > > > > But you need the first to create a private skb and to trigger the > > second to create a private copy of the linear data (as well as frags, > > if any, but these are not touched). So this looks okay. > > > > > > > + } else { > > > > > + err = -ENOMEM; > > > > > + } > > > > > + } > > > > > + > > > > > if (!tail) > > > > > skb->next = nskb; > > > > > else > > > > > tail->next = nskb; > > > > > > > > > > + if (unlikely(err)) { > > > > > + nskb->next = list_skb; > > > > To avoid leaking these skbs when calling kfree_skb_list(skb->next). Is > > that concern new with this patch, or also needed for the existing > > error case? > > It's new for this patch. nskb can lose next skb due to > tmp = skb_clone(nskb, GFP_ATOMIC); on the prior. I believe it is not > needed for the existing errors. Ah, skb_clone clears the next pointer, indeed. Thanks. Yes, then this looks correct to me. Thanks for fixing, not an obvious code path or bug at all. Acked-by: Willem de Bruijn <willemb@google.com> The patch is already marked as changes requested in https://patchwork.kernel.org/project/netdevbpf , so you might have to resubmit it. If so, please expand a little bit in the commit message on the fact that a bpf filter is loaded at TC, which triggers skb_ensure_writable -> pskb_expand_head -> skb_clone_fraglist -> skb_get on each skb in the fraglist. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net] net: fix use-after-free when UDP GRO with shared fraglist 2021-01-06 3:32 ` Dongseok Yi 2021-01-06 17:13 ` Willem de Bruijn @ 2021-04-17 3:44 ` Yunsheng Lin 2021-04-19 0:35 ` Dongseok Yi 1 sibling, 1 reply; 21+ messages in thread From: Yunsheng Lin @ 2021-04-17 3:44 UTC (permalink / raw) To: Dongseok Yi, 'Willem de Bruijn' Cc: 'David S. Miller', 'Jakub Kicinski', 'Miaohe Lin', 'Willem de Bruijn', 'Paolo Abeni', 'Florian Westphal', 'Al Viro', 'Guillaume Nault', 'Steffen Klassert', 'Yadu Kishore', 'Marco Elver', 'Network Development', 'LKML', namkyu78.kim On 2021/1/6 11:32, Dongseok Yi wrote: > On 2021-01-06 12:07, Willem de Bruijn wrote: >> >> On Tue, Jan 5, 2021 at 8:29 PM Dongseok Yi <dseok.yi@samsung.com> wrote: >>> >>> On 2021-01-05 06:03, Willem de Bruijn wrote: >>>> >>>> On Mon, Jan 4, 2021 at 4:00 AM Dongseok Yi <dseok.yi@samsung.com> wrote: >>>>> >>>>> skbs in frag_list could be shared by pskb_expand_head() from BPF. >>>> >>>> Can you elaborate on the BPF connection? >>> >>> With the following registered ptypes, >>> >>> /proc/net # cat ptype >>> Type Device Function >>> ALL tpacket_rcv >>> 0800 ip_rcv.cfi_jt >>> 0011 llc_rcv.cfi_jt >>> 0004 llc_rcv.cfi_jt >>> 0806 arp_rcv >>> 86dd ipv6_rcv.cfi_jt >>> >>> BPF checks skb_ensure_writable between tpacket_rcv and ip_rcv >>> (or ipv6_rcv). And it calls pskb_expand_head. >>> >>> [ 132.051228] pskb_expand_head+0x360/0x378 >>> [ 132.051237] skb_ensure_writable+0xa0/0xc4 >>> [ 132.051249] bpf_skb_pull_data+0x28/0x60 >>> [ 132.051262] bpf_prog_331d69c77ea5e964_schedcls_ingres+0x5f4/0x1000 >>> [ 132.051273] cls_bpf_classify+0x254/0x348 >>> [ 132.051284] tcf_classify+0xa4/0x180 >> >> Ah, you have a BPF program loaded at TC. That was not entirely obvious. >> >> This program gets called after packet sockets with ptype_all, before >> those with a specific protocol. >> >> Tcpdump will have inserted a program with ptype_all, which cloned the >> skb. This triggers skb_ensure_writable -> pskb_expand_head -> >> skb_clone_fraglist -> skb_get. >> >>> [ 132.051294] __netif_receive_skb_core+0x590/0xd28 >>> [ 132.051303] __netif_receive_skb+0x50/0x17c >>> [ 132.051312] process_backlog+0x15c/0x1b8 >>> >>>> >>>>> While tcpdump, sk_receive_queue of PF_PACKET has the original frag_list. >>>>> But the same frag_list is queued to PF_INET (or PF_INET6) as the fraglist >>>>> chain made by skb_segment_list(). >>>>> >>>>> If the new skb (not frag_list) is queued to one of the sk_receive_queue, >>>>> multiple ptypes can see this. The skb could be released by ptypes and >>>>> it causes use-after-free. >>>> >>>> If I understand correctly, a udp-gro-list skb makes it up the receive >>>> path with one or more active packet sockets. >>>> >>>> The packet socket will call skb_clone after accepting the filter. This >>>> replaces the head_skb, but shares the skb_shinfo and thus frag_list. >>>> >>>> udp_rcv_segment later converts the udp-gro-list skb to a list of >>>> regular packets to pass these one-by-one to udp_queue_rcv_one_skb. >>>> Now all the frags are fully fledged packets, with headers pushed >>>> before the payload. This does not change their refcount anymore than >>>> the skb_clone in pf_packet did. This should be 1. >>>> >>>> Eventually udp_recvmsg will call skb_consume_udp on each packet. >>>> >>>> The packet socket eventually also frees its cloned head_skb, which triggers >>>> >>>> kfree_skb_list(shinfo->frag_list) >>>> kfree_skb >>>> skb_unref >>>> refcount_dec_and_test(&skb->users) >>> >>> Every your understanding is right, but >>> >>>> >>>>> >>>>> [ 4443.426215] ------------[ cut here ]------------ >>>>> [ 4443.426222] refcount_t: underflow; use-after-free. >>>>> [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 >>>>> refcount_dec_and_test_checked+0xa4/0xc8 >>>>> [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) >>>>> [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 >>>>> [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 >>>>> [ 4443.426808] Call trace: >>>>> [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 >>>>> [ 4443.426823] skb_release_data+0x144/0x264 >>>>> [ 4443.426828] kfree_skb+0x58/0xc4 >>>>> [ 4443.426832] skb_queue_purge+0x64/0x9c >>>>> [ 4443.426844] packet_set_ring+0x5f0/0x820 >>>>> [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 >>>>> [ 4443.426853] __sys_setsockopt+0x188/0x278 >>>>> [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 >>>>> [ 4443.426869] el0_svc_common+0xf0/0x1d0 >>>>> [ 4443.426873] el0_svc_handler+0x74/0x98 >>>>> [ 4443.426880] el0_svc+0x8/0xc >>>>> >>>>> Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) >>>>> Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> >>>>> --- >>>>> net/core/skbuff.c | 20 +++++++++++++++++++- >>>>> 1 file changed, 19 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >>>>> index f62cae3..1dcbda8 100644 >>>>> --- a/net/core/skbuff.c >>>>> +++ b/net/core/skbuff.c >>>>> @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, >>>>> unsigned int delta_truesize = 0; >>>>> unsigned int delta_len = 0; >>>>> struct sk_buff *tail = NULL; >>>>> - struct sk_buff *nskb; >>>>> + struct sk_buff *nskb, *tmp; >>>>> + int err; >>>>> >>>>> skb_push(skb, -skb_network_offset(skb) + offset); >>>>> >>>>> @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, >>>>> nskb = list_skb; >>>>> list_skb = list_skb->next; >>>>> >>>>> + err = 0; >>>>> + if (skb_shared(nskb)) { >>>> >>>> I must be missing something still. This does not square with my >>>> understanding that the two sockets are operating on clones, with each >>>> frag_list skb having skb->users == 1. >>>> >>>> Unless the packet socket patch previously also triggered an >>>> skb_unclone/pskb_expand_head, as that call skb_clone_fraglist, which >>>> calls skb_get on each frag_list skb. >>> >>> A cloned skb after tpacket_rcv cannot go through skb_ensure_writable >>> with the original shinfo. pskb_expand_head reallocates the shinfo of >>> the skb and call skb_clone_fraglist. skb_release_data in >>> pskb_expand_head could not reduce skb->users of the each frag_list skb >>> if skb_shinfo(skb)->dataref == 2. >>> >>> After the reallocation, skb_shinfo(skb)->dataref == 1 but each frag_list >>> skb could have skb->users == 2. Hi, Dongseok I understand there is liner head data shared between the frag_list skb in the cloned skb(cloned by pf_packet?) and original skb, which should not be shared when skb_segment_list() converts the frag_list skb into regular packet. But both skb->users of original and cloned skb is one(skb_shinfo(skb)->dataref is one for both skb too), and skb->users of each fraglist skb is two because both original and cloned skb is linking to the same fraglist pointer, and there is "skb_shinfo(skb)->frag_list = NULL" for original skb in the begin of skb_segment_list(), if kfree_skb() is called with original skb, the fraglist skb will not be freed. If kfree_skb is called with original skb,cloned skb and each fraglist skb here, the reference counter for three of them seem right here, so why is there a refcount_t warning in the commit log? am I missing something obvious here? Sorry for bringing up this thread again. >> >> Yes, that makes sense. skb_clone_fraglist just increments the >> frag_list skb's refcounts. >> >> skb_segment_list must create an unshared struct sk_buff before it >> changes skb data to insert the protocol headers. >> ^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH net] net: fix use-after-free when UDP GRO with shared fraglist 2021-04-17 3:44 ` Yunsheng Lin @ 2021-04-19 0:35 ` Dongseok Yi 2021-04-21 9:42 ` Yunsheng Lin 0 siblings, 1 reply; 21+ messages in thread From: Dongseok Yi @ 2021-04-19 0:35 UTC (permalink / raw) To: 'Yunsheng Lin', 'Willem de Bruijn' Cc: 'David S. Miller', 'Jakub Kicinski', 'Miaohe Lin', 'Willem de Bruijn', 'Paolo Abeni', 'Florian Westphal', 'Al Viro', 'Guillaume Nault', 'Steffen Klassert', 'Yadu Kishore', 'Marco Elver', 'Network Development', 'LKML', namkyu78.kim On Sat, Apr 17, 2021 at 11:44:35AM +0800, Yunsheng Lin wrote: > > On 2021/1/6 11:32, Dongseok Yi wrote: > > On 2021-01-06 12:07, Willem de Bruijn wrote: > >> > >> On Tue, Jan 5, 2021 at 8:29 PM Dongseok Yi <dseok.yi@samsung.com> wrote: > >>> > >>> On 2021-01-05 06:03, Willem de Bruijn wrote: > >>>> > >>>> On Mon, Jan 4, 2021 at 4:00 AM Dongseok Yi <dseok.yi@samsung.com> wrote: > >>>>> > >>>>> skbs in frag_list could be shared by pskb_expand_head() from BPF. > >>>> > >>>> Can you elaborate on the BPF connection? > >>> > >>> With the following registered ptypes, > >>> > >>> /proc/net # cat ptype > >>> Type Device Function > >>> ALL tpacket_rcv > >>> 0800 ip_rcv.cfi_jt > >>> 0011 llc_rcv.cfi_jt > >>> 0004 llc_rcv.cfi_jt > >>> 0806 arp_rcv > >>> 86dd ipv6_rcv.cfi_jt > >>> > >>> BPF checks skb_ensure_writable between tpacket_rcv and ip_rcv > >>> (or ipv6_rcv). And it calls pskb_expand_head. > >>> > >>> [ 132.051228] pskb_expand_head+0x360/0x378 > >>> [ 132.051237] skb_ensure_writable+0xa0/0xc4 > >>> [ 132.051249] bpf_skb_pull_data+0x28/0x60 > >>> [ 132.051262] bpf_prog_331d69c77ea5e964_schedcls_ingres+0x5f4/0x1000 > >>> [ 132.051273] cls_bpf_classify+0x254/0x348 > >>> [ 132.051284] tcf_classify+0xa4/0x180 > >> > >> Ah, you have a BPF program loaded at TC. That was not entirely obvious. > >> > >> This program gets called after packet sockets with ptype_all, before > >> those with a specific protocol. > >> > >> Tcpdump will have inserted a program with ptype_all, which cloned the > >> skb. This triggers skb_ensure_writable -> pskb_expand_head -> > >> skb_clone_fraglist -> skb_get. > >> > >>> [ 132.051294] __netif_receive_skb_core+0x590/0xd28 > >>> [ 132.051303] __netif_receive_skb+0x50/0x17c > >>> [ 132.051312] process_backlog+0x15c/0x1b8 > >>> > >>>> > >>>>> While tcpdump, sk_receive_queue of PF_PACKET has the original frag_list. > >>>>> But the same frag_list is queued to PF_INET (or PF_INET6) as the fraglist > >>>>> chain made by skb_segment_list(). > >>>>> > >>>>> If the new skb (not frag_list) is queued to one of the sk_receive_queue, > >>>>> multiple ptypes can see this. The skb could be released by ptypes and > >>>>> it causes use-after-free. > >>>> > >>>> If I understand correctly, a udp-gro-list skb makes it up the receive > >>>> path with one or more active packet sockets. > >>>> > >>>> The packet socket will call skb_clone after accepting the filter. This > >>>> replaces the head_skb, but shares the skb_shinfo and thus frag_list. > >>>> > >>>> udp_rcv_segment later converts the udp-gro-list skb to a list of > >>>> regular packets to pass these one-by-one to udp_queue_rcv_one_skb. > >>>> Now all the frags are fully fledged packets, with headers pushed > >>>> before the payload. This does not change their refcount anymore than > >>>> the skb_clone in pf_packet did. This should be 1. > >>>> > >>>> Eventually udp_recvmsg will call skb_consume_udp on each packet. > >>>> > >>>> The packet socket eventually also frees its cloned head_skb, which triggers > >>>> > >>>> kfree_skb_list(shinfo->frag_list) > >>>> kfree_skb > >>>> skb_unref > >>>> refcount_dec_and_test(&skb->users) > >>> > >>> Every your understanding is right, but > >>> > >>>> > >>>>> > >>>>> [ 4443.426215] ------------[ cut here ]------------ > >>>>> [ 4443.426222] refcount_t: underflow; use-after-free. > >>>>> [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 > >>>>> refcount_dec_and_test_checked+0xa4/0xc8 > >>>>> [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) > >>>>> [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 > >>>>> [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 > >>>>> [ 4443.426808] Call trace: > >>>>> [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 > >>>>> [ 4443.426823] skb_release_data+0x144/0x264 > >>>>> [ 4443.426828] kfree_skb+0x58/0xc4 > >>>>> [ 4443.426832] skb_queue_purge+0x64/0x9c > >>>>> [ 4443.426844] packet_set_ring+0x5f0/0x820 > >>>>> [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 > >>>>> [ 4443.426853] __sys_setsockopt+0x188/0x278 > >>>>> [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 > >>>>> [ 4443.426869] el0_svc_common+0xf0/0x1d0 > >>>>> [ 4443.426873] el0_svc_handler+0x74/0x98 > >>>>> [ 4443.426880] el0_svc+0x8/0xc > >>>>> > >>>>> Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) > >>>>> Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> > >>>>> --- > >>>>> net/core/skbuff.c | 20 +++++++++++++++++++- > >>>>> 1 file changed, 19 insertions(+), 1 deletion(-) > >>>>> > >>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c > >>>>> index f62cae3..1dcbda8 100644 > >>>>> --- a/net/core/skbuff.c > >>>>> +++ b/net/core/skbuff.c > >>>>> @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > >>>>> unsigned int delta_truesize = 0; > >>>>> unsigned int delta_len = 0; > >>>>> struct sk_buff *tail = NULL; > >>>>> - struct sk_buff *nskb; > >>>>> + struct sk_buff *nskb, *tmp; > >>>>> + int err; > >>>>> > >>>>> skb_push(skb, -skb_network_offset(skb) + offset); > >>>>> > >>>>> @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > >>>>> nskb = list_skb; > >>>>> list_skb = list_skb->next; > >>>>> > >>>>> + err = 0; > >>>>> + if (skb_shared(nskb)) { > >>>> > >>>> I must be missing something still. This does not square with my > >>>> understanding that the two sockets are operating on clones, with each > >>>> frag_list skb having skb->users == 1. > >>>> > >>>> Unless the packet socket patch previously also triggered an > >>>> skb_unclone/pskb_expand_head, as that call skb_clone_fraglist, which > >>>> calls skb_get on each frag_list skb. > >>> > >>> A cloned skb after tpacket_rcv cannot go through skb_ensure_writable > >>> with the original shinfo. pskb_expand_head reallocates the shinfo of > >>> the skb and call skb_clone_fraglist. skb_release_data in > >>> pskb_expand_head could not reduce skb->users of the each frag_list skb > >>> if skb_shinfo(skb)->dataref == 2. > >>> > >>> After the reallocation, skb_shinfo(skb)->dataref == 1 but each frag_list > >>> skb could have skb->users == 2. > > Hi, Dongseok > I understand there is liner head data shared between the frag_list skb in the > cloned skb(cloned by pf_packet?) and original skb, which should not be shared > when skb_segment_list() converts the frag_list skb into regular packet. > > But both skb->users of original and cloned skb is one(skb_shinfo(skb)->dataref > is one for both skb too), and skb->users of each fraglist skb is two because both > original and cloned skb is linking to the same fraglist pointer, and there is > "skb_shinfo(skb)->frag_list = NULL" for original skb in the begin of skb_segment_list(), > if kfree_skb() is called with original skb, the fraglist skb will not be freed. > If kfree_skb is called with original skb,cloned skb and each fraglist skb here, the > reference counter for three of them seem right here, so why is there a refcount_t > warning in the commit log? am I missing something obvious here? > > Sorry for bringing up this thread again. A skb which detects use-after-free was not a part of frag_list. Please check the commit msg once again. Both sk_receive_queue of PF_PACKET and PF_INET (or PF_INET6) can have a link for the same frag_skbs chain. If a new skb (*not frags*) is queued to one of the sk_receive_queue, multiple ptypes can see and release this. It causes use-after-free. > > >> > >> Yes, that makes sense. skb_clone_fraglist just increments the > >> frag_list skb's refcounts. > >> > >> skb_segment_list must create an unshared struct sk_buff before it > >> changes skb data to insert the protocol headers. > >> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net] net: fix use-after-free when UDP GRO with shared fraglist 2021-04-19 0:35 ` Dongseok Yi @ 2021-04-21 9:42 ` Yunsheng Lin 2021-04-21 11:04 ` Dongseok Yi 0 siblings, 1 reply; 21+ messages in thread From: Yunsheng Lin @ 2021-04-21 9:42 UTC (permalink / raw) To: Dongseok Yi, 'Willem de Bruijn' Cc: 'David S. Miller', 'Jakub Kicinski', 'Miaohe Lin', 'Willem de Bruijn', 'Paolo Abeni', 'Florian Westphal', 'Al Viro', 'Guillaume Nault', 'Steffen Klassert', 'Yadu Kishore', 'Marco Elver', 'Network Development', 'LKML', namkyu78.kim On 2021/4/19 8:35, Dongseok Yi wrote: > On Sat, Apr 17, 2021 at 11:44:35AM +0800, Yunsheng Lin wrote: >> >> On 2021/1/6 11:32, Dongseok Yi wrote: >>> On 2021-01-06 12:07, Willem de Bruijn wrote: >>>> >>>> On Tue, Jan 5, 2021 at 8:29 PM Dongseok Yi <dseok.yi@samsung.com> wrote: >>>>> >>>>> On 2021-01-05 06:03, Willem de Bruijn wrote: >>>>>> >>>>>> On Mon, Jan 4, 2021 at 4:00 AM Dongseok Yi <dseok.yi@samsung.com> wrote: >>>>>>> >>>>>>> skbs in frag_list could be shared by pskb_expand_head() from BPF. >>>>>> >>>>>> Can you elaborate on the BPF connection? >>>>> >>>>> With the following registered ptypes, >>>>> >>>>> /proc/net # cat ptype >>>>> Type Device Function >>>>> ALL tpacket_rcv >>>>> 0800 ip_rcv.cfi_jt >>>>> 0011 llc_rcv.cfi_jt >>>>> 0004 llc_rcv.cfi_jt >>>>> 0806 arp_rcv >>>>> 86dd ipv6_rcv.cfi_jt >>>>> >>>>> BPF checks skb_ensure_writable between tpacket_rcv and ip_rcv >>>>> (or ipv6_rcv). And it calls pskb_expand_head. >>>>> >>>>> [ 132.051228] pskb_expand_head+0x360/0x378 >>>>> [ 132.051237] skb_ensure_writable+0xa0/0xc4 >>>>> [ 132.051249] bpf_skb_pull_data+0x28/0x60 >>>>> [ 132.051262] bpf_prog_331d69c77ea5e964_schedcls_ingres+0x5f4/0x1000 >>>>> [ 132.051273] cls_bpf_classify+0x254/0x348 >>>>> [ 132.051284] tcf_classify+0xa4/0x180 >>>> >>>> Ah, you have a BPF program loaded at TC. That was not entirely obvious. >>>> >>>> This program gets called after packet sockets with ptype_all, before >>>> those with a specific protocol. >>>> >>>> Tcpdump will have inserted a program with ptype_all, which cloned the >>>> skb. This triggers skb_ensure_writable -> pskb_expand_head -> >>>> skb_clone_fraglist -> skb_get. >>>> >>>>> [ 132.051294] __netif_receive_skb_core+0x590/0xd28 >>>>> [ 132.051303] __netif_receive_skb+0x50/0x17c >>>>> [ 132.051312] process_backlog+0x15c/0x1b8 >>>>> >>>>>> >>>>>>> While tcpdump, sk_receive_queue of PF_PACKET has the original frag_list. >>>>>>> But the same frag_list is queued to PF_INET (or PF_INET6) as the fraglist >>>>>>> chain made by skb_segment_list(). >>>>>>> >>>>>>> If the new skb (not frag_list) is queued to one of the sk_receive_queue, >>>>>>> multiple ptypes can see this. The skb could be released by ptypes and >>>>>>> it causes use-after-free. >>>>>> >>>>>> If I understand correctly, a udp-gro-list skb makes it up the receive >>>>>> path with one or more active packet sockets. >>>>>> >>>>>> The packet socket will call skb_clone after accepting the filter. This >>>>>> replaces the head_skb, but shares the skb_shinfo and thus frag_list. >>>>>> >>>>>> udp_rcv_segment later converts the udp-gro-list skb to a list of >>>>>> regular packets to pass these one-by-one to udp_queue_rcv_one_skb. >>>>>> Now all the frags are fully fledged packets, with headers pushed >>>>>> before the payload. This does not change their refcount anymore than >>>>>> the skb_clone in pf_packet did. This should be 1. >>>>>> >>>>>> Eventually udp_recvmsg will call skb_consume_udp on each packet. >>>>>> >>>>>> The packet socket eventually also frees its cloned head_skb, which triggers >>>>>> >>>>>> kfree_skb_list(shinfo->frag_list) >>>>>> kfree_skb >>>>>> skb_unref >>>>>> refcount_dec_and_test(&skb->users) >>>>> >>>>> Every your understanding is right, but >>>>> >>>>>> >>>>>>> >>>>>>> [ 4443.426215] ------------[ cut here ]------------ >>>>>>> [ 4443.426222] refcount_t: underflow; use-after-free. >>>>>>> [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 >>>>>>> refcount_dec_and_test_checked+0xa4/0xc8 >>>>>>> [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) >>>>>>> [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 >>>>>>> [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 >>>>>>> [ 4443.426808] Call trace: >>>>>>> [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 >>>>>>> [ 4443.426823] skb_release_data+0x144/0x264 >>>>>>> [ 4443.426828] kfree_skb+0x58/0xc4 >>>>>>> [ 4443.426832] skb_queue_purge+0x64/0x9c >>>>>>> [ 4443.426844] packet_set_ring+0x5f0/0x820 >>>>>>> [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 >>>>>>> [ 4443.426853] __sys_setsockopt+0x188/0x278 >>>>>>> [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 >>>>>>> [ 4443.426869] el0_svc_common+0xf0/0x1d0 >>>>>>> [ 4443.426873] el0_svc_handler+0x74/0x98 >>>>>>> [ 4443.426880] el0_svc+0x8/0xc >>>>>>> >>>>>>> Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) >>>>>>> Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> >>>>>>> --- >>>>>>> net/core/skbuff.c | 20 +++++++++++++++++++- >>>>>>> 1 file changed, 19 insertions(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >>>>>>> index f62cae3..1dcbda8 100644 >>>>>>> --- a/net/core/skbuff.c >>>>>>> +++ b/net/core/skbuff.c >>>>>>> @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, >>>>>>> unsigned int delta_truesize = 0; >>>>>>> unsigned int delta_len = 0; >>>>>>> struct sk_buff *tail = NULL; >>>>>>> - struct sk_buff *nskb; >>>>>>> + struct sk_buff *nskb, *tmp; >>>>>>> + int err; >>>>>>> >>>>>>> skb_push(skb, -skb_network_offset(skb) + offset); >>>>>>> >>>>>>> @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, >>>>>>> nskb = list_skb; >>>>>>> list_skb = list_skb->next; >>>>>>> >>>>>>> + err = 0; >>>>>>> + if (skb_shared(nskb)) { >>>>>> >>>>>> I must be missing something still. This does not square with my >>>>>> understanding that the two sockets are operating on clones, with each >>>>>> frag_list skb having skb->users == 1. >>>>>> >>>>>> Unless the packet socket patch previously also triggered an >>>>>> skb_unclone/pskb_expand_head, as that call skb_clone_fraglist, which >>>>>> calls skb_get on each frag_list skb. >>>>> >>>>> A cloned skb after tpacket_rcv cannot go through skb_ensure_writable >>>>> with the original shinfo. pskb_expand_head reallocates the shinfo of >>>>> the skb and call skb_clone_fraglist. skb_release_data in >>>>> pskb_expand_head could not reduce skb->users of the each frag_list skb >>>>> if skb_shinfo(skb)->dataref == 2. >>>>> >>>>> After the reallocation, skb_shinfo(skb)->dataref == 1 but each frag_list >>>>> skb could have skb->users == 2. >> >> Hi, Dongseok >> I understand there is liner head data shared between the frag_list skb in the >> cloned skb(cloned by pf_packet?) and original skb, which should not be shared >> when skb_segment_list() converts the frag_list skb into regular packet. >> >> But both skb->users of original and cloned skb is one(skb_shinfo(skb)->dataref >> is one for both skb too), and skb->users of each fraglist skb is two because both >> original and cloned skb is linking to the same fraglist pointer, and there is >> "skb_shinfo(skb)->frag_list = NULL" for original skb in the begin of skb_segment_list(), >> if kfree_skb() is called with original skb, the fraglist skb will not be freed. >> If kfree_skb is called with original skb,cloned skb and each fraglist skb here, the >> reference counter for three of them seem right here, so why is there a refcount_t >> warning in the commit log? am I missing something obvious here? >> >> Sorry for bringing up this thread again. > > A skb which detects use-after-free was not a part of frag_list. Please > check the commit msg once again. I checked the commit msg again, but still have not figured it out yet:) So I tried to see if I understand the skb'reference counting correctly: skb->user is used to reference counting the "struct sk_buff", and skb_shinfo(skb)->dataref is used to reference counting head data. skb_clone(): allocate a sperate "struct sk_buff" but share the head data with the original skb, so skb_shinfo()->dataref need incrmenting. pskb_expand_head(): allocate a sperate head data(which includes the space for skb_shinfo(skb)), since the original head data and the new head data' skb_shinfo()->frag_list both point to the same fraglist skb, so each fraglist_skb's skb->users need incrmenting, and original head data's skb_shinfo() need decrmenting. So after pf_packet called skb_clone() and pskb_expand_head(), we have: old skb new skb | | | | old head data new head data \ / \ / \ / \ / \ / fraglist_skb1 -> fraglist_skb2 -> fraglist_skb3 ..... So both old and new skb' skb->user is one, both old and new head data's skb_shinfo()->dataref is one, and both old and new head data' skb_shinfo()->frag_list points to fraglist_skb1, and each fraglist_skb's skb->user is two. Each fraglist_skb points to a head data, and its skb_shinfo()->dataref is one too. Suppose old skb is called with skb_segment_list(), without this patch, we have: new skb | | new head data / / / / / old skb -> fraglist_skb1 -> fraglist_skb2 -> fraglist_skb3 ..... | | old head data And old skb and each fraglist_skb become a regular packet, so freeing the old skb, new skb and each fraglist_skb here do not seems to have any reference counting problem, because each fraglist_skb's skb->user is two, right? > > Both sk_receive_queue of PF_PACKET and PF_INET (or PF_INET6) can have > a link for the same frag_skbs chain. Does "frag_skbs chain" means fraglist_skb1? It seems only new head data's skb_shinfo()->frag_list points to fraglist_skb1 If a new skb (*not frags*) is > queued to one of the sk_receive_queue, multiple ptypes can see and > release this. It causes use-after-free. Does "a new skb" mean each fraglist_skb after skb_segment_list()? Or other new incoming skb? I am not so familiar with the PF_PACKET and PF_INET, so still have hard time figuring how the reference counting goes wrong here:) > >> >>>> >>>> Yes, that makes sense. skb_clone_fraglist just increments the >>>> frag_list skb's refcounts. >>>> >>>> skb_segment_list must create an unshared struct sk_buff before it >>>> changes skb data to insert the protocol headers. >>>> > > > > . > ^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH net] net: fix use-after-free when UDP GRO with shared fraglist 2021-04-21 9:42 ` Yunsheng Lin @ 2021-04-21 11:04 ` Dongseok Yi 0 siblings, 0 replies; 21+ messages in thread From: Dongseok Yi @ 2021-04-21 11:04 UTC (permalink / raw) To: 'Yunsheng Lin', 'Willem de Bruijn' Cc: 'David S. Miller', 'Jakub Kicinski', 'Miaohe Lin', 'Willem de Bruijn', 'Paolo Abeni', 'Florian Westphal', 'Al Viro', 'Guillaume Nault', 'Steffen Klassert', 'Yadu Kishore', 'Marco Elver', 'Network Development', 'LKML', namkyu78.kim On Wed, Apr 21, 2021 at 05:42:12PM +0800, Yunsheng Lin wrote: > On 2021/4/19 8:35, Dongseok Yi wrote: > > On Sat, Apr 17, 2021 at 11:44:35AM +0800, Yunsheng Lin wrote: > >> > >> On 2021/1/6 11:32, Dongseok Yi wrote: > >>> On 2021-01-06 12:07, Willem de Bruijn wrote: > >>>> > >>>> On Tue, Jan 5, 2021 at 8:29 PM Dongseok Yi <dseok.yi@samsung.com> wrote: > >>>>> > >>>>> On 2021-01-05 06:03, Willem de Bruijn wrote: > >>>>>> > >>>>>> On Mon, Jan 4, 2021 at 4:00 AM Dongseok Yi <dseok.yi@samsung.com> wrote: > >>>>>>> > >>>>>>> skbs in frag_list could be shared by pskb_expand_head() from BPF. > >>>>>> > >>>>>> Can you elaborate on the BPF connection? > >>>>> > >>>>> With the following registered ptypes, > >>>>> > >>>>> /proc/net # cat ptype > >>>>> Type Device Function > >>>>> ALL tpacket_rcv > >>>>> 0800 ip_rcv.cfi_jt > >>>>> 0011 llc_rcv.cfi_jt > >>>>> 0004 llc_rcv.cfi_jt > >>>>> 0806 arp_rcv > >>>>> 86dd ipv6_rcv.cfi_jt > >>>>> > >>>>> BPF checks skb_ensure_writable between tpacket_rcv and ip_rcv > >>>>> (or ipv6_rcv). And it calls pskb_expand_head. > >>>>> > >>>>> [ 132.051228] pskb_expand_head+0x360/0x378 > >>>>> [ 132.051237] skb_ensure_writable+0xa0/0xc4 > >>>>> [ 132.051249] bpf_skb_pull_data+0x28/0x60 > >>>>> [ 132.051262] bpf_prog_331d69c77ea5e964_schedcls_ingres+0x5f4/0x1000 > >>>>> [ 132.051273] cls_bpf_classify+0x254/0x348 > >>>>> [ 132.051284] tcf_classify+0xa4/0x180 > >>>> > >>>> Ah, you have a BPF program loaded at TC. That was not entirely obvious. > >>>> > >>>> This program gets called after packet sockets with ptype_all, before > >>>> those with a specific protocol. > >>>> > >>>> Tcpdump will have inserted a program with ptype_all, which cloned the > >>>> skb. This triggers skb_ensure_writable -> pskb_expand_head -> > >>>> skb_clone_fraglist -> skb_get. > >>>> > >>>>> [ 132.051294] __netif_receive_skb_core+0x590/0xd28 > >>>>> [ 132.051303] __netif_receive_skb+0x50/0x17c > >>>>> [ 132.051312] process_backlog+0x15c/0x1b8 > >>>>> > >>>>>> > >>>>>>> While tcpdump, sk_receive_queue of PF_PACKET has the original frag_list. > >>>>>>> But the same frag_list is queued to PF_INET (or PF_INET6) as the fraglist > >>>>>>> chain made by skb_segment_list(). > >>>>>>> > >>>>>>> If the new skb (not frag_list) is queued to one of the sk_receive_queue, > >>>>>>> multiple ptypes can see this. The skb could be released by ptypes and > >>>>>>> it causes use-after-free. > >>>>>> > >>>>>> If I understand correctly, a udp-gro-list skb makes it up the receive > >>>>>> path with one or more active packet sockets. > >>>>>> > >>>>>> The packet socket will call skb_clone after accepting the filter. This > >>>>>> replaces the head_skb, but shares the skb_shinfo and thus frag_list. > >>>>>> > >>>>>> udp_rcv_segment later converts the udp-gro-list skb to a list of > >>>>>> regular packets to pass these one-by-one to udp_queue_rcv_one_skb. > >>>>>> Now all the frags are fully fledged packets, with headers pushed > >>>>>> before the payload. This does not change their refcount anymore than > >>>>>> the skb_clone in pf_packet did. This should be 1. > >>>>>> > >>>>>> Eventually udp_recvmsg will call skb_consume_udp on each packet. > >>>>>> > >>>>>> The packet socket eventually also frees its cloned head_skb, which triggers > >>>>>> > >>>>>> kfree_skb_list(shinfo->frag_list) > >>>>>> kfree_skb > >>>>>> skb_unref > >>>>>> refcount_dec_and_test(&skb->users) > >>>>> > >>>>> Every your understanding is right, but > >>>>> > >>>>>> > >>>>>>> > >>>>>>> [ 4443.426215] ------------[ cut here ]------------ > >>>>>>> [ 4443.426222] refcount_t: underflow; use-after-free. > >>>>>>> [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 > >>>>>>> refcount_dec_and_test_checked+0xa4/0xc8 > >>>>>>> [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) > >>>>>>> [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 > >>>>>>> [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 > >>>>>>> [ 4443.426808] Call trace: > >>>>>>> [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 > >>>>>>> [ 4443.426823] skb_release_data+0x144/0x264 > >>>>>>> [ 4443.426828] kfree_skb+0x58/0xc4 > >>>>>>> [ 4443.426832] skb_queue_purge+0x64/0x9c > >>>>>>> [ 4443.426844] packet_set_ring+0x5f0/0x820 > >>>>>>> [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 > >>>>>>> [ 4443.426853] __sys_setsockopt+0x188/0x278 > >>>>>>> [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 > >>>>>>> [ 4443.426869] el0_svc_common+0xf0/0x1d0 > >>>>>>> [ 4443.426873] el0_svc_handler+0x74/0x98 > >>>>>>> [ 4443.426880] el0_svc+0x8/0xc > >>>>>>> > >>>>>>> Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) > >>>>>>> Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> > >>>>>>> --- > >>>>>>> net/core/skbuff.c | 20 +++++++++++++++++++- > >>>>>>> 1 file changed, 19 insertions(+), 1 deletion(-) > >>>>>>> > >>>>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c > >>>>>>> index f62cae3..1dcbda8 100644 > >>>>>>> --- a/net/core/skbuff.c > >>>>>>> +++ b/net/core/skbuff.c > >>>>>>> @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > >>>>>>> unsigned int delta_truesize = 0; > >>>>>>> unsigned int delta_len = 0; > >>>>>>> struct sk_buff *tail = NULL; > >>>>>>> - struct sk_buff *nskb; > >>>>>>> + struct sk_buff *nskb, *tmp; > >>>>>>> + int err; > >>>>>>> > >>>>>>> skb_push(skb, -skb_network_offset(skb) + offset); > >>>>>>> > >>>>>>> @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > >>>>>>> nskb = list_skb; > >>>>>>> list_skb = list_skb->next; > >>>>>>> > >>>>>>> + err = 0; > >>>>>>> + if (skb_shared(nskb)) { > >>>>>> > >>>>>> I must be missing something still. This does not square with my > >>>>>> understanding that the two sockets are operating on clones, with each > >>>>>> frag_list skb having skb->users == 1. > >>>>>> > >>>>>> Unless the packet socket patch previously also triggered an > >>>>>> skb_unclone/pskb_expand_head, as that call skb_clone_fraglist, which > >>>>>> calls skb_get on each frag_list skb. > >>>>> > >>>>> A cloned skb after tpacket_rcv cannot go through skb_ensure_writable > >>>>> with the original shinfo. pskb_expand_head reallocates the shinfo of > >>>>> the skb and call skb_clone_fraglist. skb_release_data in > >>>>> pskb_expand_head could not reduce skb->users of the each frag_list skb > >>>>> if skb_shinfo(skb)->dataref == 2. > >>>>> > >>>>> After the reallocation, skb_shinfo(skb)->dataref == 1 but each frag_list > >>>>> skb could have skb->users == 2. > >> > >> Hi, Dongseok > >> I understand there is liner head data shared between the frag_list skb in the > >> cloned skb(cloned by pf_packet?) and original skb, which should not be shared > >> when skb_segment_list() converts the frag_list skb into regular packet. > >> > >> But both skb->users of original and cloned skb is one(skb_shinfo(skb)->dataref > >> is one for both skb too), and skb->users of each fraglist skb is two because both > >> original and cloned skb is linking to the same fraglist pointer, and there is > >> "skb_shinfo(skb)->frag_list = NULL" for original skb in the begin of skb_segment_list(), > >> if kfree_skb() is called with original skb, the fraglist skb will not be freed. > >> If kfree_skb is called with original skb,cloned skb and each fraglist skb here, the > >> reference counter for three of them seem right here, so why is there a refcount_t > >> warning in the commit log? am I missing something obvious here? > >> > >> Sorry for bringing up this thread again. > > > > A skb which detects use-after-free was not a part of frag_list. Please > > check the commit msg once again. > > I checked the commit msg again, but still have not figured it out yet:) > > So I tried to see if I understand the skb'reference counting correctly: > > skb->user is used to reference counting the "struct sk_buff", and > skb_shinfo(skb)->dataref is used to reference counting head data. > > skb_clone(): allocate a sperate "struct sk_buff" but share the head data > with the original skb, so skb_shinfo()->dataref need > incrmenting. > > pskb_expand_head(): allocate a sperate head data(which includes the space > for skb_shinfo(skb)), since the original head data > and the new head data' skb_shinfo()->frag_list both > point to the same fraglist skb, so each fraglist_skb's > skb->users need incrmenting, and original head data's > skb_shinfo() need decrmenting. > > > So after pf_packet called skb_clone() and pskb_expand_head(), we have: > > old skb new skb > | | > | | > old head data new head data > \ / > \ / > \ / > \ / > \ / > fraglist_skb1 -> fraglist_skb2 -> fraglist_skb3 ..... > > So both old and new skb' skb->user is one, both old and new head data's > skb_shinfo()->dataref is one, and both old and new head data' > skb_shinfo()->frag_list points to fraglist_skb1, and each fraglist_skb's > skb->user is two. > > Each fraglist_skb points to a head data, and its skb_shinfo()->dataref > is one too. > > Suppose old skb is called with skb_segment_list(), without this patch, > we have: > > new skb > | > | > new head data > / > / > / > / > / > old skb -> fraglist_skb1 -> fraglist_skb2 -> fraglist_skb3 ..... > | > | > old head data > > And old skb and each fraglist_skb become a regular packet, so freeing > the old skb, new skb and each fraglist_skb here do not seems to have > any reference counting problem, because each fraglist_skb's skb->user > is two, right? > > > > > Both sk_receive_queue of PF_PACKET and PF_INET (or PF_INET6) can have > > a link for the same frag_skbs chain. > > Does "frag_skbs chain" means fraglist_skb1? It seems only new head data's > skb_shinfo()->frag_list points to fraglist_skb1 Yes, right. > > > If a new skb (*not frags*) is > > queued to one of the sk_receive_queue, multiple ptypes can see and > > release this. It causes use-after-free. > > Does "a new skb" mean each fraglist_skb after skb_segment_list()? Or other > new incoming skb? I mean a new incoming skb. > > I am not so familiar with the PF_PACKET and PF_INET, so still have hard > time figuring how the reference counting goes wrong here:) Let's assume a new incoming skb that is added to the next of the last fraglist_skb. The new incoming skb->user is *one*. new skb | | new head data / / / / / old skb -> fraglist_skb1 -> fraglist_skb2 -> ... -> new incoming skb | | old head data Let's skb_queue_purge from old skb. kfree_skb from old skb will free 2 skbs (marked as xxx1 and xxx2). What happened if kfree_skb(new skb)? new skb | | new head data / / / / / xxx1 -> fraglist_skb1 -> fraglist_skb2 -> ... -> xxx2 It will try to free xxx2. > > > > >> > >>>> > >>>> Yes, that makes sense. skb_clone_fraglist just increments the > >>>> frag_list skb's refcounts. > >>>> > >>>> skb_segment_list must create an unshared struct sk_buff before it > >>>> changes skb data to insert the protocol headers. > >>>> > > > > > > > > . > > ^ permalink raw reply [flat|nested] 21+ messages in thread
[parent not found: <CGME20210107005028epcas2p35dfa745fd92e31400024874f54243556@epcas2p3.samsung.com>]
* [PATCH net v2] net: fix use-after-free when UDP GRO with shared fraglist [not found] ` <CGME20210107005028epcas2p35dfa745fd92e31400024874f54243556@epcas2p3.samsung.com> @ 2021-01-07 0:39 ` Dongseok Yi 2021-01-07 11:05 ` Daniel Borkmann [not found] ` <CGME20210108024017epcas2p455fe96b8483880f9b7a654dbcf600b20@epcas2p4.samsung.com> 0 siblings, 2 replies; 21+ messages in thread From: Dongseok Yi @ 2021-01-07 0:39 UTC (permalink / raw) To: David S. Miller, Willem de Bruijn Cc: Jakub Kicinski, Miaohe Lin, Paolo Abeni, Florian Westphal, Al Viro, Guillaume Nault, Yunsheng Lin, Steffen Klassert, Yadu Kishore, Marco Elver, netdev, linux-kernel, namkyu78.kim, Dongseok Yi skbs in fraglist could be shared by a BPF filter loaded at TC. It triggers skb_ensure_writable -> pskb_expand_head -> skb_clone_fraglist -> skb_get on each skb in the fraglist. While tcpdump, sk_receive_queue of PF_PACKET has the original fraglist. But the same fraglist is queued to PF_INET (or PF_INET6) as the fraglist chain made by skb_segment_list. If the new skb (not fraglist) is queued to one of the sk_receive_queue, multiple ptypes can see this. The skb could be released by ptypes and it causes use-after-free. [ 4443.426215] ------------[ cut here ]------------ [ 4443.426222] refcount_t: underflow; use-after-free. [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 refcount_dec_and_test_checked+0xa4/0xc8 [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 [ 4443.426808] Call trace: [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 [ 4443.426823] skb_release_data+0x144/0x264 [ 4443.426828] kfree_skb+0x58/0xc4 [ 4443.426832] skb_queue_purge+0x64/0x9c [ 4443.426844] packet_set_ring+0x5f0/0x820 [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 [ 4443.426853] __sys_setsockopt+0x188/0x278 [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 [ 4443.426869] el0_svc_common+0xf0/0x1d0 [ 4443.426873] el0_svc_handler+0x74/0x98 [ 4443.426880] el0_svc+0x8/0xc Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> Acked-by: Willem de Bruijn <willemb@google.com> --- net/core/skbuff.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) v2: Expand the commit message to clarify a BPF filter loaded diff --git a/net/core/skbuff.c b/net/core/skbuff.c index f62cae3..1dcbda8 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, unsigned int delta_truesize = 0; unsigned int delta_len = 0; struct sk_buff *tail = NULL; - struct sk_buff *nskb; + struct sk_buff *nskb, *tmp; + int err; skb_push(skb, -skb_network_offset(skb) + offset); @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, nskb = list_skb; list_skb = list_skb->next; + err = 0; + if (skb_shared(nskb)) { + tmp = skb_clone(nskb, GFP_ATOMIC); + if (tmp) { + kfree_skb(nskb); + nskb = tmp; + err = skb_unclone(nskb, GFP_ATOMIC); + } else { + err = -ENOMEM; + } + } + if (!tail) skb->next = nskb; else tail->next = nskb; + if (unlikely(err)) { + nskb->next = list_skb; + goto err_linearize; + } + tail = nskb; delta_len += nskb->len; -- 2.7.4 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH net v2] net: fix use-after-free when UDP GRO with shared fraglist 2021-01-07 0:39 ` [PATCH net v2] " Dongseok Yi @ 2021-01-07 11:05 ` Daniel Borkmann 2021-01-07 11:40 ` Dongseok Yi [not found] ` <CGME20210108024017epcas2p455fe96b8483880f9b7a654dbcf600b20@epcas2p4.samsung.com> 1 sibling, 1 reply; 21+ messages in thread From: Daniel Borkmann @ 2021-01-07 11:05 UTC (permalink / raw) To: Dongseok Yi, David S. Miller, Willem de Bruijn Cc: Jakub Kicinski, Miaohe Lin, Paolo Abeni, Florian Westphal, Al Viro, Guillaume Nault, Yunsheng Lin, Steffen Klassert, Yadu Kishore, Marco Elver, netdev, linux-kernel, namkyu78.kim On 1/7/21 1:39 AM, Dongseok Yi wrote: > skbs in fraglist could be shared by a BPF filter loaded at TC. It > triggers skb_ensure_writable -> pskb_expand_head -> > skb_clone_fraglist -> skb_get on each skb in the fraglist. > > While tcpdump, sk_receive_queue of PF_PACKET has the original fraglist. > But the same fraglist is queued to PF_INET (or PF_INET6) as the fraglist > chain made by skb_segment_list. > > If the new skb (not fraglist) is queued to one of the sk_receive_queue, > multiple ptypes can see this. The skb could be released by ptypes and > it causes use-after-free. > > [ 4443.426215] ------------[ cut here ]------------ > [ 4443.426222] refcount_t: underflow; use-after-free. > [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 > refcount_dec_and_test_checked+0xa4/0xc8 > [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) > [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 > [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 > [ 4443.426808] Call trace: > [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 > [ 4443.426823] skb_release_data+0x144/0x264 > [ 4443.426828] kfree_skb+0x58/0xc4 > [ 4443.426832] skb_queue_purge+0x64/0x9c > [ 4443.426844] packet_set_ring+0x5f0/0x820 > [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 > [ 4443.426853] __sys_setsockopt+0x188/0x278 > [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 > [ 4443.426869] el0_svc_common+0xf0/0x1d0 > [ 4443.426873] el0_svc_handler+0x74/0x98 > [ 4443.426880] el0_svc+0x8/0xc > > Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) > Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> > Acked-by: Willem de Bruijn <willemb@google.com> > --- > net/core/skbuff.c | 20 +++++++++++++++++++- > 1 file changed, 19 insertions(+), 1 deletion(-) > > v2: Expand the commit message to clarify a BPF filter loaded > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index f62cae3..1dcbda8 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > unsigned int delta_truesize = 0; > unsigned int delta_len = 0; > struct sk_buff *tail = NULL; > - struct sk_buff *nskb; > + struct sk_buff *nskb, *tmp; > + int err; > > skb_push(skb, -skb_network_offset(skb) + offset); > > @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > nskb = list_skb; > list_skb = list_skb->next; > > + err = 0; > + if (skb_shared(nskb)) { > + tmp = skb_clone(nskb, GFP_ATOMIC); > + if (tmp) { > + kfree_skb(nskb); Should use consume_skb() to not trigger skb:kfree_skb tracepoint when looking for drops in the stack. > + nskb = tmp; > + err = skb_unclone(nskb, GFP_ATOMIC); Could you elaborate why you also need to unclone? This looks odd here. tc layer (independent of BPF) from ingress & egress side generally assumes unshared skb, so above clone + dropping ref of nskb looks okay to make the main skb struct private for mangling attributes (e.g. mark) & should suffice. What is the exact purpose of the additional skb_unclone() in this context? > + } else { > + err = -ENOMEM; > + } > + } > + > if (!tail) > skb->next = nskb; > else > tail->next = nskb; > > + if (unlikely(err)) { > + nskb->next = list_skb; > + goto err_linearize; > + } > + > tail = nskb; > > delta_len += nskb->len; > ^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH net v2] net: fix use-after-free when UDP GRO with shared fraglist 2021-01-07 11:05 ` Daniel Borkmann @ 2021-01-07 11:40 ` Dongseok Yi 2021-01-07 12:49 ` Daniel Borkmann 0 siblings, 1 reply; 21+ messages in thread From: Dongseok Yi @ 2021-01-07 11:40 UTC (permalink / raw) To: 'Daniel Borkmann', 'David S. Miller', 'Willem de Bruijn' Cc: 'Jakub Kicinski', 'Miaohe Lin', 'Paolo Abeni', 'Florian Westphal', 'Al Viro', 'Guillaume Nault', 'Yunsheng Lin', 'Steffen Klassert', 'Yadu Kishore', 'Marco Elver', netdev, linux-kernel, namkyu78.kim On 2021-01-07 20:05, Daniel Borkmann wrote: > > On 1/7/21 1:39 AM, Dongseok Yi wrote: > > skbs in fraglist could be shared by a BPF filter loaded at TC. It > > triggers skb_ensure_writable -> pskb_expand_head -> > > skb_clone_fraglist -> skb_get on each skb in the fraglist. > > > > While tcpdump, sk_receive_queue of PF_PACKET has the original fraglist. > > But the same fraglist is queued to PF_INET (or PF_INET6) as the fraglist > > chain made by skb_segment_list. > > > > If the new skb (not fraglist) is queued to one of the sk_receive_queue, > > multiple ptypes can see this. The skb could be released by ptypes and > > it causes use-after-free. > > > > [ 4443.426215] ------------[ cut here ]------------ > > [ 4443.426222] refcount_t: underflow; use-after-free. > > [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 > > refcount_dec_and_test_checked+0xa4/0xc8 > > [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) > > [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 > > [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 > > [ 4443.426808] Call trace: > > [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 > > [ 4443.426823] skb_release_data+0x144/0x264 > > [ 4443.426828] kfree_skb+0x58/0xc4 > > [ 4443.426832] skb_queue_purge+0x64/0x9c > > [ 4443.426844] packet_set_ring+0x5f0/0x820 > > [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 > > [ 4443.426853] __sys_setsockopt+0x188/0x278 > > [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 > > [ 4443.426869] el0_svc_common+0xf0/0x1d0 > > [ 4443.426873] el0_svc_handler+0x74/0x98 > > [ 4443.426880] el0_svc+0x8/0xc > > > > Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) > > Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> > > Acked-by: Willem de Bruijn <willemb@google.com> > > --- > > net/core/skbuff.c | 20 +++++++++++++++++++- > > 1 file changed, 19 insertions(+), 1 deletion(-) > > > > v2: Expand the commit message to clarify a BPF filter loaded > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > index f62cae3..1dcbda8 100644 > > --- a/net/core/skbuff.c > > +++ b/net/core/skbuff.c > > @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > > unsigned int delta_truesize = 0; > > unsigned int delta_len = 0; > > struct sk_buff *tail = NULL; > > - struct sk_buff *nskb; > > + struct sk_buff *nskb, *tmp; > > + int err; > > > > skb_push(skb, -skb_network_offset(skb) + offset); > > > > @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > > nskb = list_skb; > > list_skb = list_skb->next; > > > > + err = 0; > > + if (skb_shared(nskb)) { > > + tmp = skb_clone(nskb, GFP_ATOMIC); > > + if (tmp) { > > + kfree_skb(nskb); > > Should use consume_skb() to not trigger skb:kfree_skb tracepoint when looking > for drops in the stack. I will use to consume_skb() on the next version. > > > + nskb = tmp; > > + err = skb_unclone(nskb, GFP_ATOMIC); > > Could you elaborate why you also need to unclone? This looks odd here. tc layer > (independent of BPF) from ingress & egress side generally assumes unshared skb, > so above clone + dropping ref of nskb looks okay to make the main skb struct private > for mangling attributes (e.g. mark) & should suffice. What is the exact purpose of > the additional skb_unclone() in this context? Willem de Bruijn said: udp_rcv_segment later converts the udp-gro-list skb to a list of regular packets to pass these one-by-one to udp_queue_rcv_one_skb. Now all the frags are fully fledged packets, with headers pushed before the payload. PF_PACKET handles untouched fraglist. To modify the payload only for udp_rcv_segment, skb_unclone is necessary. > > > + } else { > > + err = -ENOMEM; > > + } > > + } > > + > > if (!tail) > > skb->next = nskb; > > else > > tail->next = nskb; > > > > + if (unlikely(err)) { > > + nskb->next = list_skb; > > + goto err_linearize; > > + } > > + > > tail = nskb; > > > > delta_len += nskb->len; > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net v2] net: fix use-after-free when UDP GRO with shared fraglist 2021-01-07 11:40 ` Dongseok Yi @ 2021-01-07 12:49 ` Daniel Borkmann 2021-01-07 13:05 ` Willem de Bruijn 0 siblings, 1 reply; 21+ messages in thread From: Daniel Borkmann @ 2021-01-07 12:49 UTC (permalink / raw) To: Dongseok Yi, 'David S. Miller', 'Willem de Bruijn' Cc: 'Jakub Kicinski', 'Miaohe Lin', 'Paolo Abeni', 'Florian Westphal', 'Al Viro', 'Guillaume Nault', 'Yunsheng Lin', 'Steffen Klassert', 'Yadu Kishore', 'Marco Elver', netdev, linux-kernel, namkyu78.kim On 1/7/21 12:40 PM, Dongseok Yi wrote: > On 2021-01-07 20:05, Daniel Borkmann wrote: >> On 1/7/21 1:39 AM, Dongseok Yi wrote: >>> skbs in fraglist could be shared by a BPF filter loaded at TC. It >>> triggers skb_ensure_writable -> pskb_expand_head -> >>> skb_clone_fraglist -> skb_get on each skb in the fraglist. >>> >>> While tcpdump, sk_receive_queue of PF_PACKET has the original fraglist. >>> But the same fraglist is queued to PF_INET (or PF_INET6) as the fraglist >>> chain made by skb_segment_list. >>> >>> If the new skb (not fraglist) is queued to one of the sk_receive_queue, >>> multiple ptypes can see this. The skb could be released by ptypes and >>> it causes use-after-free. >>> >>> [ 4443.426215] ------------[ cut here ]------------ >>> [ 4443.426222] refcount_t: underflow; use-after-free. >>> [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 >>> refcount_dec_and_test_checked+0xa4/0xc8 >>> [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) >>> [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 >>> [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 >>> [ 4443.426808] Call trace: >>> [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 >>> [ 4443.426823] skb_release_data+0x144/0x264 >>> [ 4443.426828] kfree_skb+0x58/0xc4 >>> [ 4443.426832] skb_queue_purge+0x64/0x9c >>> [ 4443.426844] packet_set_ring+0x5f0/0x820 >>> [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 >>> [ 4443.426853] __sys_setsockopt+0x188/0x278 >>> [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 >>> [ 4443.426869] el0_svc_common+0xf0/0x1d0 >>> [ 4443.426873] el0_svc_handler+0x74/0x98 >>> [ 4443.426880] el0_svc+0x8/0xc >>> >>> Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) >>> Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> >>> Acked-by: Willem de Bruijn <willemb@google.com> >>> --- >>> net/core/skbuff.c | 20 +++++++++++++++++++- >>> 1 file changed, 19 insertions(+), 1 deletion(-) >>> >>> v2: Expand the commit message to clarify a BPF filter loaded >>> >>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >>> index f62cae3..1dcbda8 100644 >>> --- a/net/core/skbuff.c >>> +++ b/net/core/skbuff.c >>> @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, >>> unsigned int delta_truesize = 0; >>> unsigned int delta_len = 0; >>> struct sk_buff *tail = NULL; >>> - struct sk_buff *nskb; >>> + struct sk_buff *nskb, *tmp; >>> + int err; >>> >>> skb_push(skb, -skb_network_offset(skb) + offset); >>> >>> @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, >>> nskb = list_skb; >>> list_skb = list_skb->next; >>> >>> + err = 0; >>> + if (skb_shared(nskb)) { >>> + tmp = skb_clone(nskb, GFP_ATOMIC); >>> + if (tmp) { >>> + kfree_skb(nskb); >> >> Should use consume_skb() to not trigger skb:kfree_skb tracepoint when looking >> for drops in the stack. > > I will use to consume_skb() on the next version. > >>> + nskb = tmp; >>> + err = skb_unclone(nskb, GFP_ATOMIC); >> >> Could you elaborate why you also need to unclone? This looks odd here. tc layer >> (independent of BPF) from ingress & egress side generally assumes unshared skb, >> so above clone + dropping ref of nskb looks okay to make the main skb struct private >> for mangling attributes (e.g. mark) & should suffice. What is the exact purpose of >> the additional skb_unclone() in this context? > > Willem de Bruijn said: > udp_rcv_segment later converts the udp-gro-list skb to a list of > regular packets to pass these one-by-one to udp_queue_rcv_one_skb. > Now all the frags are fully fledged packets, with headers pushed > before the payload. Yes. > PF_PACKET handles untouched fraglist. To modify the payload only > for udp_rcv_segment, skb_unclone is necessary. I don't parse this last sentence here, please elaborate in more detail on why it is necessary. For example, if tc layer would modify mark on the skb, then __copy_skb_header() in skb_segment_list() will propagate it. If tc layer would modify payload, the skb_ensure_writable() will take care of that internally and if needed pull in parts from fraglist into linear section to make it private. The purpose of the skb_clone() above iff shared is to make the struct itself private (to safely modify its struct members). What am I missing? >>> + } else { >>> + err = -ENOMEM; >>> + } >>> + } >>> + >>> if (!tail) >>> skb->next = nskb; >>> else >>> tail->next = nskb; >>> >>> + if (unlikely(err)) { >>> + nskb->next = list_skb; >>> + goto err_linearize; >>> + } >>> + >>> tail = nskb; >>> >>> delta_len += nskb->len; >>> > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net v2] net: fix use-after-free when UDP GRO with shared fraglist 2021-01-07 12:49 ` Daniel Borkmann @ 2021-01-07 13:05 ` Willem de Bruijn 2021-01-07 13:33 ` Daniel Borkmann 0 siblings, 1 reply; 21+ messages in thread From: Willem de Bruijn @ 2021-01-07 13:05 UTC (permalink / raw) To: Daniel Borkmann Cc: Dongseok Yi, David S. Miller, Willem de Bruijn, Jakub Kicinski, Miaohe Lin, Paolo Abeni, Florian Westphal, Al Viro, Guillaume Nault, Yunsheng Lin, Steffen Klassert, Yadu Kishore, Marco Elver, Network Development, LKML, namkyu78.kim On Thu, Jan 7, 2021 at 7:52 AM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 1/7/21 12:40 PM, Dongseok Yi wrote: > > On 2021-01-07 20:05, Daniel Borkmann wrote: > >> On 1/7/21 1:39 AM, Dongseok Yi wrote: > >>> skbs in fraglist could be shared by a BPF filter loaded at TC. It > >>> triggers skb_ensure_writable -> pskb_expand_head -> > >>> skb_clone_fraglist -> skb_get on each skb in the fraglist. > >>> > >>> While tcpdump, sk_receive_queue of PF_PACKET has the original fraglist. > >>> But the same fraglist is queued to PF_INET (or PF_INET6) as the fraglist > >>> chain made by skb_segment_list. > >>> > >>> If the new skb (not fraglist) is queued to one of the sk_receive_queue, > >>> multiple ptypes can see this. The skb could be released by ptypes and > >>> it causes use-after-free. > >>> > >>> [ 4443.426215] ------------[ cut here ]------------ > >>> [ 4443.426222] refcount_t: underflow; use-after-free. > >>> [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 > >>> refcount_dec_and_test_checked+0xa4/0xc8 > >>> [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) > >>> [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 > >>> [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 > >>> [ 4443.426808] Call trace: > >>> [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 > >>> [ 4443.426823] skb_release_data+0x144/0x264 > >>> [ 4443.426828] kfree_skb+0x58/0xc4 > >>> [ 4443.426832] skb_queue_purge+0x64/0x9c > >>> [ 4443.426844] packet_set_ring+0x5f0/0x820 > >>> [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 > >>> [ 4443.426853] __sys_setsockopt+0x188/0x278 > >>> [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 > >>> [ 4443.426869] el0_svc_common+0xf0/0x1d0 > >>> [ 4443.426873] el0_svc_handler+0x74/0x98 > >>> [ 4443.426880] el0_svc+0x8/0xc > >>> > >>> Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) > >>> Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> > >>> Acked-by: Willem de Bruijn <willemb@google.com> > >>> --- > >>> net/core/skbuff.c | 20 +++++++++++++++++++- > >>> 1 file changed, 19 insertions(+), 1 deletion(-) > >>> > >>> v2: Expand the commit message to clarify a BPF filter loaded > >>> > >>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c > >>> index f62cae3..1dcbda8 100644 > >>> --- a/net/core/skbuff.c > >>> +++ b/net/core/skbuff.c > >>> @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > >>> unsigned int delta_truesize = 0; > >>> unsigned int delta_len = 0; > >>> struct sk_buff *tail = NULL; > >>> - struct sk_buff *nskb; > >>> + struct sk_buff *nskb, *tmp; > >>> + int err; > >>> > >>> skb_push(skb, -skb_network_offset(skb) + offset); > >>> > >>> @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > >>> nskb = list_skb; > >>> list_skb = list_skb->next; > >>> > >>> + err = 0; > >>> + if (skb_shared(nskb)) { > >>> + tmp = skb_clone(nskb, GFP_ATOMIC); > >>> + if (tmp) { > >>> + kfree_skb(nskb); > >> > >> Should use consume_skb() to not trigger skb:kfree_skb tracepoint when looking > >> for drops in the stack. > > > > I will use to consume_skb() on the next version. > > > >>> + nskb = tmp; > >>> + err = skb_unclone(nskb, GFP_ATOMIC); > >> > >> Could you elaborate why you also need to unclone? This looks odd here. tc layer > >> (independent of BPF) from ingress & egress side generally assumes unshared skb, > >> so above clone + dropping ref of nskb looks okay to make the main skb struct private > >> for mangling attributes (e.g. mark) & should suffice. What is the exact purpose of > >> the additional skb_unclone() in this context? > > > > Willem de Bruijn said: > > udp_rcv_segment later converts the udp-gro-list skb to a list of > > regular packets to pass these one-by-one to udp_queue_rcv_one_skb. > > Now all the frags are fully fledged packets, with headers pushed > > before the payload. > > Yes. > > > PF_PACKET handles untouched fraglist. To modify the payload only > > for udp_rcv_segment, skb_unclone is necessary. > > I don't parse this last sentence here, please elaborate in more detail on why > it is necessary. > > For example, if tc layer would modify mark on the skb, then __copy_skb_header() > in skb_segment_list() will propagate it. If tc layer would modify payload, the > skb_ensure_writable() will take care of that internally and if needed pull in > parts from fraglist into linear section to make it private. The purpose of the > skb_clone() above iff shared is to make the struct itself private (to safely > modify its struct members). What am I missing? If tc writes, it will call skb_make_writable and thus pskb_expand_head to create a private linear section for the head_skb. skb_segment_list overwrites part of the skb linear section of each fragment itself. Even after skb_clone, the frag_skbs share their linear section with their clone in pf_packet, so we need a pskb_expand_head here, too. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net v2] net: fix use-after-free when UDP GRO with shared fraglist 2021-01-07 13:05 ` Willem de Bruijn @ 2021-01-07 13:33 ` Daniel Borkmann 2021-01-07 14:44 ` Willem de Bruijn 0 siblings, 1 reply; 21+ messages in thread From: Daniel Borkmann @ 2021-01-07 13:33 UTC (permalink / raw) To: Willem de Bruijn Cc: Dongseok Yi, David S. Miller, Willem de Bruijn, Jakub Kicinski, Miaohe Lin, Paolo Abeni, Florian Westphal, Al Viro, Guillaume Nault, Yunsheng Lin, Steffen Klassert, Yadu Kishore, Marco Elver, Network Development, LKML, namkyu78.kim On 1/7/21 2:05 PM, Willem de Bruijn wrote: > On Thu, Jan 7, 2021 at 7:52 AM Daniel Borkmann <daniel@iogearbox.net> wrote: >> On 1/7/21 12:40 PM, Dongseok Yi wrote: >>> On 2021-01-07 20:05, Daniel Borkmann wrote: >>>> On 1/7/21 1:39 AM, Dongseok Yi wrote: >>>>> skbs in fraglist could be shared by a BPF filter loaded at TC. It >>>>> triggers skb_ensure_writable -> pskb_expand_head -> >>>>> skb_clone_fraglist -> skb_get on each skb in the fraglist. >>>>> >>>>> While tcpdump, sk_receive_queue of PF_PACKET has the original fraglist. >>>>> But the same fraglist is queued to PF_INET (or PF_INET6) as the fraglist >>>>> chain made by skb_segment_list. >>>>> >>>>> If the new skb (not fraglist) is queued to one of the sk_receive_queue, >>>>> multiple ptypes can see this. The skb could be released by ptypes and >>>>> it causes use-after-free. >>>>> >>>>> [ 4443.426215] ------------[ cut here ]------------ >>>>> [ 4443.426222] refcount_t: underflow; use-after-free. >>>>> [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 >>>>> refcount_dec_and_test_checked+0xa4/0xc8 >>>>> [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) >>>>> [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 >>>>> [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 >>>>> [ 4443.426808] Call trace: >>>>> [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 >>>>> [ 4443.426823] skb_release_data+0x144/0x264 >>>>> [ 4443.426828] kfree_skb+0x58/0xc4 >>>>> [ 4443.426832] skb_queue_purge+0x64/0x9c >>>>> [ 4443.426844] packet_set_ring+0x5f0/0x820 >>>>> [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 >>>>> [ 4443.426853] __sys_setsockopt+0x188/0x278 >>>>> [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 >>>>> [ 4443.426869] el0_svc_common+0xf0/0x1d0 >>>>> [ 4443.426873] el0_svc_handler+0x74/0x98 >>>>> [ 4443.426880] el0_svc+0x8/0xc >>>>> >>>>> Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) >>>>> Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> >>>>> Acked-by: Willem de Bruijn <willemb@google.com> >>>>> --- >>>>> net/core/skbuff.c | 20 +++++++++++++++++++- >>>>> 1 file changed, 19 insertions(+), 1 deletion(-) >>>>> >>>>> v2: Expand the commit message to clarify a BPF filter loaded >>>>> >>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >>>>> index f62cae3..1dcbda8 100644 >>>>> --- a/net/core/skbuff.c >>>>> +++ b/net/core/skbuff.c >>>>> @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, >>>>> unsigned int delta_truesize = 0; >>>>> unsigned int delta_len = 0; >>>>> struct sk_buff *tail = NULL; >>>>> - struct sk_buff *nskb; >>>>> + struct sk_buff *nskb, *tmp; >>>>> + int err; >>>>> >>>>> skb_push(skb, -skb_network_offset(skb) + offset); >>>>> >>>>> @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, >>>>> nskb = list_skb; >>>>> list_skb = list_skb->next; >>>>> >>>>> + err = 0; >>>>> + if (skb_shared(nskb)) { >>>>> + tmp = skb_clone(nskb, GFP_ATOMIC); >>>>> + if (tmp) { >>>>> + kfree_skb(nskb); >>>> >>>> Should use consume_skb() to not trigger skb:kfree_skb tracepoint when looking >>>> for drops in the stack. >>> >>> I will use to consume_skb() on the next version. >>> >>>>> + nskb = tmp; >>>>> + err = skb_unclone(nskb, GFP_ATOMIC); >>>> >>>> Could you elaborate why you also need to unclone? This looks odd here. tc layer >>>> (independent of BPF) from ingress & egress side generally assumes unshared skb, >>>> so above clone + dropping ref of nskb looks okay to make the main skb struct private >>>> for mangling attributes (e.g. mark) & should suffice. What is the exact purpose of >>>> the additional skb_unclone() in this context? >>> >>> Willem de Bruijn said: >>> udp_rcv_segment later converts the udp-gro-list skb to a list of >>> regular packets to pass these one-by-one to udp_queue_rcv_one_skb. >>> Now all the frags are fully fledged packets, with headers pushed >>> before the payload. >> >> Yes. >> >>> PF_PACKET handles untouched fraglist. To modify the payload only >>> for udp_rcv_segment, skb_unclone is necessary. >> >> I don't parse this last sentence here, please elaborate in more detail on why >> it is necessary. >> >> For example, if tc layer would modify mark on the skb, then __copy_skb_header() >> in skb_segment_list() will propagate it. If tc layer would modify payload, the >> skb_ensure_writable() will take care of that internally and if needed pull in >> parts from fraglist into linear section to make it private. The purpose of the >> skb_clone() above iff shared is to make the struct itself private (to safely >> modify its struct members). What am I missing? > > If tc writes, it will call skb_make_writable and thus pskb_expand_head > to create a private linear section for the head_skb. > > skb_segment_list overwrites part of the skb linear section of each > fragment itself. Even after skb_clone, the frag_skbs share their > linear section with their clone in pf_packet, so we need a > pskb_expand_head here, too. Ok, got it, thanks for the explanation. Would be great to have it in the v3 commit log as well as that was more clear than the above. Too bad in that case (otoh the pf_packet situation could be considered corner case ...); ether way, fix makes sense then. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net v2] net: fix use-after-free when UDP GRO with shared fraglist 2021-01-07 13:33 ` Daniel Borkmann @ 2021-01-07 14:44 ` Willem de Bruijn 2021-01-08 10:32 ` Daniel Borkmann 0 siblings, 1 reply; 21+ messages in thread From: Willem de Bruijn @ 2021-01-07 14:44 UTC (permalink / raw) To: Daniel Borkmann Cc: Dongseok Yi, David S. Miller, Willem de Bruijn, Jakub Kicinski, Miaohe Lin, Paolo Abeni, Florian Westphal, Al Viro, Guillaume Nault, Yunsheng Lin, Steffen Klassert, Yadu Kishore, Marco Elver, Network Development, LKML, namkyu78.kim On Thu, Jan 7, 2021 at 8:33 AM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 1/7/21 2:05 PM, Willem de Bruijn wrote: > > On Thu, Jan 7, 2021 at 7:52 AM Daniel Borkmann <daniel@iogearbox.net> wrote: > >> On 1/7/21 12:40 PM, Dongseok Yi wrote: > >>> On 2021-01-07 20:05, Daniel Borkmann wrote: > >>>> On 1/7/21 1:39 AM, Dongseok Yi wrote: > >>>>> skbs in fraglist could be shared by a BPF filter loaded at TC. It > >>>>> triggers skb_ensure_writable -> pskb_expand_head -> > >>>>> skb_clone_fraglist -> skb_get on each skb in the fraglist. > >>>>> > >>>>> While tcpdump, sk_receive_queue of PF_PACKET has the original fraglist. > >>>>> But the same fraglist is queued to PF_INET (or PF_INET6) as the fraglist > >>>>> chain made by skb_segment_list. > >>>>> > >>>>> If the new skb (not fraglist) is queued to one of the sk_receive_queue, > >>>>> multiple ptypes can see this. The skb could be released by ptypes and > >>>>> it causes use-after-free. > >>>>> > >>>>> [ 4443.426215] ------------[ cut here ]------------ > >>>>> [ 4443.426222] refcount_t: underflow; use-after-free. > >>>>> [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 > >>>>> refcount_dec_and_test_checked+0xa4/0xc8 > >>>>> [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) > >>>>> [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 > >>>>> [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 > >>>>> [ 4443.426808] Call trace: > >>>>> [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 > >>>>> [ 4443.426823] skb_release_data+0x144/0x264 > >>>>> [ 4443.426828] kfree_skb+0x58/0xc4 > >>>>> [ 4443.426832] skb_queue_purge+0x64/0x9c > >>>>> [ 4443.426844] packet_set_ring+0x5f0/0x820 > >>>>> [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 > >>>>> [ 4443.426853] __sys_setsockopt+0x188/0x278 > >>>>> [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 > >>>>> [ 4443.426869] el0_svc_common+0xf0/0x1d0 > >>>>> [ 4443.426873] el0_svc_handler+0x74/0x98 > >>>>> [ 4443.426880] el0_svc+0x8/0xc > >>>>> > >>>>> Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) > >>>>> Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> > >>>>> Acked-by: Willem de Bruijn <willemb@google.com> > >>>>> --- > >>>>> net/core/skbuff.c | 20 +++++++++++++++++++- > >>>>> 1 file changed, 19 insertions(+), 1 deletion(-) > >>>>> > >>>>> v2: Expand the commit message to clarify a BPF filter loaded > >>>>> > >>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c > >>>>> index f62cae3..1dcbda8 100644 > >>>>> --- a/net/core/skbuff.c > >>>>> +++ b/net/core/skbuff.c > >>>>> @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > >>>>> unsigned int delta_truesize = 0; > >>>>> unsigned int delta_len = 0; > >>>>> struct sk_buff *tail = NULL; > >>>>> - struct sk_buff *nskb; > >>>>> + struct sk_buff *nskb, *tmp; > >>>>> + int err; > >>>>> > >>>>> skb_push(skb, -skb_network_offset(skb) + offset); > >>>>> > >>>>> @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > >>>>> nskb = list_skb; > >>>>> list_skb = list_skb->next; > >>>>> > >>>>> + err = 0; > >>>>> + if (skb_shared(nskb)) { > >>>>> + tmp = skb_clone(nskb, GFP_ATOMIC); > >>>>> + if (tmp) { > >>>>> + kfree_skb(nskb); > >>>> > >>>> Should use consume_skb() to not trigger skb:kfree_skb tracepoint when looking > >>>> for drops in the stack. > >>> > >>> I will use to consume_skb() on the next version. > >>> > >>>>> + nskb = tmp; > >>>>> + err = skb_unclone(nskb, GFP_ATOMIC); > >>>> > >>>> Could you elaborate why you also need to unclone? This looks odd here. tc layer > >>>> (independent of BPF) from ingress & egress side generally assumes unshared skb, > >>>> so above clone + dropping ref of nskb looks okay to make the main skb struct private > >>>> for mangling attributes (e.g. mark) & should suffice. What is the exact purpose of > >>>> the additional skb_unclone() in this context? > >>> > >>> Willem de Bruijn said: > >>> udp_rcv_segment later converts the udp-gro-list skb to a list of > >>> regular packets to pass these one-by-one to udp_queue_rcv_one_skb. > >>> Now all the frags are fully fledged packets, with headers pushed > >>> before the payload. > >> > >> Yes. > >> > >>> PF_PACKET handles untouched fraglist. To modify the payload only > >>> for udp_rcv_segment, skb_unclone is necessary. > >> > >> I don't parse this last sentence here, please elaborate in more detail on why > >> it is necessary. > >> > >> For example, if tc layer would modify mark on the skb, then __copy_skb_header() > >> in skb_segment_list() will propagate it. If tc layer would modify payload, the > >> skb_ensure_writable() will take care of that internally and if needed pull in > >> parts from fraglist into linear section to make it private. The purpose of the > >> skb_clone() above iff shared is to make the struct itself private (to safely > >> modify its struct members). What am I missing? > > > > If tc writes, it will call skb_make_writable and thus pskb_expand_head > > to create a private linear section for the head_skb. > > > > skb_segment_list overwrites part of the skb linear section of each > > fragment itself. Even after skb_clone, the frag_skbs share their > > linear section with their clone in pf_packet, so we need a > > pskb_expand_head here, too. > > Ok, got it, thanks for the explanation. Would be great to have it in the v3 commit > log as well as that was more clear than the above. Too bad in that case (otoh > the pf_packet situation could be considered corner case ...); ether way, fix makes > sense then. Thanks for double checking the tricky logic. Pf_packet + BPF is indeed a peculiar corner case. But there are perhaps more, like raw sockets, and other BPF hooks that can trigger an skb_make_writable. Come to think of it, the no touching shared data rule is also violated without a BPF program? Then there is nothing in the frag_skbs themselves signifying that they are shared, but the head_skb is cloned, so its data may still not be modified. This modification happens to be safe in practice, as the pf_packet clones don't access the bytes before skb->data where this path inserts headers. But still. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net v2] net: fix use-after-free when UDP GRO with shared fraglist 2021-01-07 14:44 ` Willem de Bruijn @ 2021-01-08 10:32 ` Daniel Borkmann 0 siblings, 0 replies; 21+ messages in thread From: Daniel Borkmann @ 2021-01-08 10:32 UTC (permalink / raw) To: Willem de Bruijn Cc: Dongseok Yi, David S. Miller, Willem de Bruijn, Jakub Kicinski, Miaohe Lin, Paolo Abeni, Florian Westphal, Al Viro, Guillaume Nault, Yunsheng Lin, Steffen Klassert, Yadu Kishore, Marco Elver, Network Development, LKML, namkyu78.kim On 1/7/21 3:44 PM, Willem de Bruijn wrote: > On Thu, Jan 7, 2021 at 8:33 AM Daniel Borkmann <daniel@iogearbox.net> wrote: >> On 1/7/21 2:05 PM, Willem de Bruijn wrote: >>> On Thu, Jan 7, 2021 at 7:52 AM Daniel Borkmann <daniel@iogearbox.net> wrote: >>>> On 1/7/21 12:40 PM, Dongseok Yi wrote: >>>>> On 2021-01-07 20:05, Daniel Borkmann wrote: >>>>>> On 1/7/21 1:39 AM, Dongseok Yi wrote: [...] >>>>> PF_PACKET handles untouched fraglist. To modify the payload only >>>>> for udp_rcv_segment, skb_unclone is necessary. >>>> >>>> I don't parse this last sentence here, please elaborate in more detail on why >>>> it is necessary. >>>> >>>> For example, if tc layer would modify mark on the skb, then __copy_skb_header() >>>> in skb_segment_list() will propagate it. If tc layer would modify payload, the >>>> skb_ensure_writable() will take care of that internally and if needed pull in >>>> parts from fraglist into linear section to make it private. The purpose of the >>>> skb_clone() above iff shared is to make the struct itself private (to safely >>>> modify its struct members). What am I missing? >>> >>> If tc writes, it will call skb_make_writable and thus pskb_expand_head >>> to create a private linear section for the head_skb. >>> >>> skb_segment_list overwrites part of the skb linear section of each >>> fragment itself. Even after skb_clone, the frag_skbs share their >>> linear section with their clone in pf_packet, so we need a >>> pskb_expand_head here, too. >> >> Ok, got it, thanks for the explanation. Would be great to have it in the v3 commit >> log as well as that was more clear than the above. Too bad in that case (otoh >> the pf_packet situation could be considered corner case ...); ether way, fix makes >> sense then. > > Thanks for double checking the tricky logic. Pf_packet + BPF is indeed > a peculiar corner case. But there are perhaps more, like raw sockets, > and other BPF hooks that can trigger an skb_make_writable. > > Come to think of it, the no touching shared data rule is also violated > without a BPF program? Then there is nothing in the frag_skbs > themselves signifying that they are shared, but the head_skb is > cloned, so its data may still not be modified. The skb_ensure_writable() is used in plenty of places from bpf, ovs, netfilter to core stack (e.g. vlan, mpls, icmp), but either way there should be nothing wrong with that as it's making sure to pull the data into its linear section before modification. Uncareful use of skb_store_bits() with offset into skb_frags for example could defeat that, but I don't see any in-tree occurrence that would be problematic at this point.. > This modification happens to be safe in practice, as the pf_packet > clones don't access the bytes before skb->data where this path inserts > headers. But still. ^ permalink raw reply [flat|nested] 21+ messages in thread
[parent not found: <CGME20210108024017epcas2p455fe96b8483880f9b7a654dbcf600b20@epcas2p4.samsung.com>]
* [PATCH net v3] net: fix use-after-free when UDP GRO with shared fraglist [not found] ` <CGME20210108024017epcas2p455fe96b8483880f9b7a654dbcf600b20@epcas2p4.samsung.com> @ 2021-01-08 2:28 ` Dongseok Yi 2021-01-08 10:18 ` Daniel Borkmann 0 siblings, 1 reply; 21+ messages in thread From: Dongseok Yi @ 2021-01-08 2:28 UTC (permalink / raw) To: David S. Miller, Willem de Bruijn, Daniel Borkmann Cc: Jakub Kicinski, Miaohe Lin, Paolo Abeni, Florian Westphal, Al Viro, Guillaume Nault, Yunsheng Lin, Steffen Klassert, Yadu Kishore, Marco Elver, netdev, linux-kernel, namkyu78.kim, Dongseok Yi skbs in fraglist could be shared by a BPF filter loaded at TC. If TC writes, it will call skb_ensure_writable -> pskb_expand_head to create a private linear section for the head_skb. And then call skb_clone_fraglist -> skb_get on each skb in the fraglist. skb_segment_list overwrites part of the skb linear section of each fragment itself. Even after skb_clone, the frag_skbs share their linear section with their clone in PF_PACKET. Both sk_receive_queue of PF_PACKET and PF_INET (or PF_INET6) can have a link for the same frag_skbs chain. If a new skb (not frags) is queued to one of the sk_receive_queue, multiple ptypes can see and release this. It causes use-after-free. [ 4443.426215] ------------[ cut here ]------------ [ 4443.426222] refcount_t: underflow; use-after-free. [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 refcount_dec_and_test_checked+0xa4/0xc8 [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 [ 4443.426808] Call trace: [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 [ 4443.426823] skb_release_data+0x144/0x264 [ 4443.426828] kfree_skb+0x58/0xc4 [ 4443.426832] skb_queue_purge+0x64/0x9c [ 4443.426844] packet_set_ring+0x5f0/0x820 [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 [ 4443.426853] __sys_setsockopt+0x188/0x278 [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 [ 4443.426869] el0_svc_common+0xf0/0x1d0 [ 4443.426873] el0_svc_handler+0x74/0x98 [ 4443.426880] el0_svc+0x8/0xc Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> Acked-by: Willem de Bruijn <willemb@google.com> --- net/core/skbuff.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) v2: per Willem de Bruijn request, expanded the commit message to clarify a BPF filter loaded. v3: per Daniel Borkmann request, added further details from Willem in the commit message, and use consume_skb instead of kfree_skb. diff --git a/net/core/skbuff.c b/net/core/skbuff.c index f62cae3..b6f2b52 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, unsigned int delta_truesize = 0; unsigned int delta_len = 0; struct sk_buff *tail = NULL; - struct sk_buff *nskb; + struct sk_buff *nskb, *tmp; + int err; skb_push(skb, -skb_network_offset(skb) + offset); @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, nskb = list_skb; list_skb = list_skb->next; + err = 0; + if (skb_shared(nskb)) { + tmp = skb_clone(nskb, GFP_ATOMIC); + if (tmp) { + consume_skb(nskb); + nskb = tmp; + err = skb_unclone(nskb, GFP_ATOMIC); + } else { + err = -ENOMEM; + } + } + if (!tail) skb->next = nskb; else tail->next = nskb; + if (unlikely(err)) { + nskb->next = list_skb; + goto err_linearize; + } + tail = nskb; delta_len += nskb->len; -- 2.7.4 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH net v3] net: fix use-after-free when UDP GRO with shared fraglist 2021-01-08 2:28 ` [PATCH net v3] " Dongseok Yi @ 2021-01-08 10:18 ` Daniel Borkmann 2021-01-09 3:14 ` Jakub Kicinski 0 siblings, 1 reply; 21+ messages in thread From: Daniel Borkmann @ 2021-01-08 10:18 UTC (permalink / raw) To: Dongseok Yi, David S. Miller, Willem de Bruijn Cc: Jakub Kicinski, Miaohe Lin, Paolo Abeni, Florian Westphal, Al Viro, Guillaume Nault, Yunsheng Lin, Steffen Klassert, Yadu Kishore, Marco Elver, netdev, linux-kernel, namkyu78.kim On 1/8/21 3:28 AM, Dongseok Yi wrote: > skbs in fraglist could be shared by a BPF filter loaded at TC. If TC > writes, it will call skb_ensure_writable -> pskb_expand_head to create > a private linear section for the head_skb. And then call > skb_clone_fraglist -> skb_get on each skb in the fraglist. > > skb_segment_list overwrites part of the skb linear section of each > fragment itself. Even after skb_clone, the frag_skbs share their > linear section with their clone in PF_PACKET. > > Both sk_receive_queue of PF_PACKET and PF_INET (or PF_INET6) can have > a link for the same frag_skbs chain. If a new skb (not frags) is > queued to one of the sk_receive_queue, multiple ptypes can see and > release this. It causes use-after-free. > > [ 4443.426215] ------------[ cut here ]------------ > [ 4443.426222] refcount_t: underflow; use-after-free. > [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 > refcount_dec_and_test_checked+0xa4/0xc8 > [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) > [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 > [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 > [ 4443.426808] Call trace: > [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 > [ 4443.426823] skb_release_data+0x144/0x264 > [ 4443.426828] kfree_skb+0x58/0xc4 > [ 4443.426832] skb_queue_purge+0x64/0x9c > [ 4443.426844] packet_set_ring+0x5f0/0x820 > [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 > [ 4443.426853] __sys_setsockopt+0x188/0x278 > [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 > [ 4443.426869] el0_svc_common+0xf0/0x1d0 > [ 4443.426873] el0_svc_handler+0x74/0x98 > [ 4443.426880] el0_svc+0x8/0xc > > Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) > Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> > Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net v3] net: fix use-after-free when UDP GRO with shared fraglist 2021-01-08 10:18 ` Daniel Borkmann @ 2021-01-09 3:14 ` Jakub Kicinski 0 siblings, 0 replies; 21+ messages in thread From: Jakub Kicinski @ 2021-01-09 3:14 UTC (permalink / raw) To: Daniel Borkmann, Dongseok Yi, Willem de Bruijn Cc: David S. Miller, Miaohe Lin, Paolo Abeni, Florian Westphal, Al Viro, Guillaume Nault, Yunsheng Lin, Steffen Klassert, Yadu Kishore, Marco Elver, netdev, linux-kernel, namkyu78.kim On Fri, 8 Jan 2021 11:18:39 +0100 Daniel Borkmann wrote: > On 1/8/21 3:28 AM, Dongseok Yi wrote: > > skbs in fraglist could be shared by a BPF filter loaded at TC. If TC > > writes, it will call skb_ensure_writable -> pskb_expand_head to create > > a private linear section for the head_skb. And then call > > skb_clone_fraglist -> skb_get on each skb in the fraglist. > > > > skb_segment_list overwrites part of the skb linear section of each > > fragment itself. Even after skb_clone, the frag_skbs share their > > linear section with their clone in PF_PACKET. > > > > Both sk_receive_queue of PF_PACKET and PF_INET (or PF_INET6) can have > > a link for the same frag_skbs chain. If a new skb (not frags) is > > queued to one of the sk_receive_queue, multiple ptypes can see and > > release this. It causes use-after-free. > > > > [ 4443.426215] ------------[ cut here ]------------ > > [ 4443.426222] refcount_t: underflow; use-after-free. > > [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 > > refcount_dec_and_test_checked+0xa4/0xc8 > > [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) > > [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 > > [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 > > [ 4443.426808] Call trace: > > [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 > > [ 4443.426823] skb_release_data+0x144/0x264 > > [ 4443.426828] kfree_skb+0x58/0xc4 > > [ 4443.426832] skb_queue_purge+0x64/0x9c > > [ 4443.426844] packet_set_ring+0x5f0/0x820 > > [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 > > [ 4443.426853] __sys_setsockopt+0x188/0x278 > > [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 > > [ 4443.426869] el0_svc_common+0xf0/0x1d0 > > [ 4443.426873] el0_svc_handler+0x74/0x98 > > [ 4443.426880] el0_svc+0x8/0xc > > > > Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.) > > Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> > > Acked-by: Willem de Bruijn <willemb@google.com> > > Acked-by: Daniel Borkmann <daniel@iogearbox.net> Applied, thanks! ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2021-04-21 11:04 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <CGME20210104085750epcas2p1a5b22559d87df61ef3c8215ae0b470b5@epcas2p1.samsung.com> 2021-01-04 8:46 ` [PATCH net] net: fix use-after-free when UDP GRO with shared fraglist Dongseok Yi 2021-01-04 21:03 ` Willem de Bruijn 2021-01-06 1:29 ` Dongseok Yi 2021-01-06 3:07 ` Willem de Bruijn 2021-01-06 3:32 ` Dongseok Yi 2021-01-06 17:13 ` Willem de Bruijn 2021-04-17 3:44 ` Yunsheng Lin 2021-04-19 0:35 ` Dongseok Yi 2021-04-21 9:42 ` Yunsheng Lin 2021-04-21 11:04 ` Dongseok Yi [not found] ` <CGME20210107005028epcas2p35dfa745fd92e31400024874f54243556@epcas2p3.samsung.com> 2021-01-07 0:39 ` [PATCH net v2] " Dongseok Yi 2021-01-07 11:05 ` Daniel Borkmann 2021-01-07 11:40 ` Dongseok Yi 2021-01-07 12:49 ` Daniel Borkmann 2021-01-07 13:05 ` Willem de Bruijn 2021-01-07 13:33 ` Daniel Borkmann 2021-01-07 14:44 ` Willem de Bruijn 2021-01-08 10:32 ` Daniel Borkmann [not found] ` <CGME20210108024017epcas2p455fe96b8483880f9b7a654dbcf600b20@epcas2p4.samsung.com> 2021-01-08 2:28 ` [PATCH net v3] " Dongseok Yi 2021-01-08 10:18 ` Daniel Borkmann 2021-01-09 3:14 ` Jakub Kicinski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).