All of lore.kernel.org
 help / color / mirror / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Dongseok Yi <dseok.yi@samsung.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Willem de Bruijn <willemb@google.com>,
	Paolo Abeni <pabeni@redhat.com>, Florian Westphal <fw@strlen.de>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Guillaume Nault <gnault@redhat.com>,
	Yunsheng Lin <linyunsheng@huawei.com>,
	Steffen Klassert <steffen.klassert@secunet.com>,
	Yadu Kishore <kyk.segfault@gmail.com>,
	Marco Elver <elver@google.com>,
	Network Development <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	namkyu78.kim@samsung.com
Subject: Re: [PATCH net] net: fix use-after-free when UDP GRO with shared fraglist
Date: Wed, 6 Jan 2021 12:13:33 -0500	[thread overview]
Message-ID: <CAF=yD-+w489MoSKfpaH23dYXhVCL2qh4f0x4COd2nsT5DT8Aiw@mail.gmail.com> (raw)
In-Reply-To: <019b01d6e3dc$9a940330$cfbc0990$@samsung.com>

On Tue, Jan 5, 2021 at 10:32 PM Dongseok Yi <dseok.yi@samsung.com> wrote:
>
> On 2021-01-06 12:07, Willem de Bruijn wrote:
> >
> > On Tue, Jan 5, 2021 at 8:29 PM Dongseok Yi <dseok.yi@samsung.com> wrote:
> > >
> > > On 2021-01-05 06:03, Willem de Bruijn wrote:
> > > >
> > > > On Mon, Jan 4, 2021 at 4:00 AM Dongseok Yi <dseok.yi@samsung.com> wrote:
> > > > >
> > > > > skbs in frag_list could be shared by pskb_expand_head() from BPF.
> > > >
> > > > Can you elaborate on the BPF connection?
> > >
> > > With the following registered ptypes,
> > >
> > > /proc/net # cat ptype
> > > Type Device      Function
> > > ALL           tpacket_rcv
> > > 0800          ip_rcv.cfi_jt
> > > 0011          llc_rcv.cfi_jt
> > > 0004          llc_rcv.cfi_jt
> > > 0806          arp_rcv
> > > 86dd          ipv6_rcv.cfi_jt
> > >
> > > BPF checks skb_ensure_writable between tpacket_rcv and ip_rcv
> > > (or ipv6_rcv). And it calls pskb_expand_head.
> > >
> > > [  132.051228] pskb_expand_head+0x360/0x378
> > > [  132.051237] skb_ensure_writable+0xa0/0xc4
> > > [  132.051249] bpf_skb_pull_data+0x28/0x60
> > > [  132.051262] bpf_prog_331d69c77ea5e964_schedcls_ingres+0x5f4/0x1000
> > > [  132.051273] cls_bpf_classify+0x254/0x348
> > > [  132.051284] tcf_classify+0xa4/0x180
> >
> > Ah, you have a BPF program loaded at TC. That was not entirely obvious.
> >
> > This program gets called after packet sockets with ptype_all, before
> > those with a specific protocol.
> >
> > Tcpdump will have inserted a program with ptype_all, which cloned the
> > skb. This triggers skb_ensure_writable -> pskb_expand_head ->
> > skb_clone_fraglist -> skb_get.
> >
> > > [  132.051294] __netif_receive_skb_core+0x590/0xd28
> > > [  132.051303] __netif_receive_skb+0x50/0x17c
> > > [  132.051312] process_backlog+0x15c/0x1b8
> > >
> > > >
> > > > > While tcpdump, sk_receive_queue of PF_PACKET has the original frag_list.
> > > > > But the same frag_list is queued to PF_INET (or PF_INET6) as the fraglist
> > > > > chain made by skb_segment_list().
> > > > >
> > > > > If the new skb (not frag_list) is queued to one of the sk_receive_queue,
> > > > > multiple ptypes can see this. The skb could be released by ptypes and
> > > > > it causes use-after-free.
> > > >
> > > > If I understand correctly, a udp-gro-list skb makes it up the receive
> > > > path with one or more active packet sockets.
> > > >
> > > > The packet socket will call skb_clone after accepting the filter. This
> > > > replaces the head_skb, but shares the skb_shinfo and thus frag_list.
> > > >
> > > > udp_rcv_segment later converts the udp-gro-list skb to a list of
> > > > regular packets to pass these one-by-one to udp_queue_rcv_one_skb.
> > > > Now all the frags are fully fledged packets, with headers pushed
> > > > before the payload. This does not change their refcount anymore than
> > > > the skb_clone in pf_packet did. This should be 1.
> > > >
> > > > Eventually udp_recvmsg will call skb_consume_udp on each packet.
> > > >
> > > > The packet socket eventually also frees its cloned head_skb, which triggers
> > > >
> > > >   kfree_skb_list(shinfo->frag_list)
> > > >     kfree_skb
> > > >       skb_unref
> > > >         refcount_dec_and_test(&skb->users)
> > >
> > > Every your understanding is right, but
> > >
> > > >
> > > > >
> > > > > [ 4443.426215] ------------[ cut here ]------------
> > > > > [ 4443.426222] refcount_t: underflow; use-after-free.
> > > > > [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190
> > > > > refcount_dec_and_test_checked+0xa4/0xc8
> > > > > [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO)
> > > > > [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8
> > > > > [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8
> > > > > [ 4443.426808] Call trace:
> > > > > [ 4443.426813]  refcount_dec_and_test_checked+0xa4/0xc8
> > > > > [ 4443.426823]  skb_release_data+0x144/0x264
> > > > > [ 4443.426828]  kfree_skb+0x58/0xc4
> > > > > [ 4443.426832]  skb_queue_purge+0x64/0x9c
> > > > > [ 4443.426844]  packet_set_ring+0x5f0/0x820
> > > > > [ 4443.426849]  packet_setsockopt+0x5a4/0xcd0
> > > > > [ 4443.426853]  __sys_setsockopt+0x188/0x278
> > > > > [ 4443.426858]  __arm64_sys_setsockopt+0x28/0x38
> > > > > [ 4443.426869]  el0_svc_common+0xf0/0x1d0
> > > > > [ 4443.426873]  el0_svc_handler+0x74/0x98
> > > > > [ 4443.426880]  el0_svc+0x8/0xc
> > > > >
> > > > > Fixes: 3a1296a38d0c (net: Support GRO/GSO fraglist chaining.)
> > > > > Signed-off-by: Dongseok Yi <dseok.yi@samsung.com>
> > > > > ---
> > > > >  net/core/skbuff.c | 20 +++++++++++++++++++-
> > > > >  1 file changed, 19 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > > > index f62cae3..1dcbda8 100644
> > > > > --- a/net/core/skbuff.c
> > > > > +++ b/net/core/skbuff.c
> > > > > @@ -3655,7 +3655,8 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
> > > > >         unsigned int delta_truesize = 0;
> > > > >         unsigned int delta_len = 0;
> > > > >         struct sk_buff *tail = NULL;
> > > > > -       struct sk_buff *nskb;
> > > > > +       struct sk_buff *nskb, *tmp;
> > > > > +       int err;
> > > > >
> > > > >         skb_push(skb, -skb_network_offset(skb) + offset);
> > > > >
> > > > > @@ -3665,11 +3666,28 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
> > > > >                 nskb = list_skb;
> > > > >                 list_skb = list_skb->next;
> > > > >
> > > > > +               err = 0;
> > > > > +               if (skb_shared(nskb)) {
> > > >
> > > > I must be missing something still. This does not square with my
> > > > understanding that the two sockets are operating on clones, with each
> > > > frag_list skb having skb->users == 1.
> > > >
> > > > Unless the packet socket patch previously also triggered an
> > > > skb_unclone/pskb_expand_head, as that call skb_clone_fraglist, which
> > > > calls skb_get on each frag_list skb.
> > >
> > > A cloned skb after tpacket_rcv cannot go through skb_ensure_writable
> > > with the original shinfo. pskb_expand_head reallocates the shinfo of
> > > the skb and call skb_clone_fraglist. skb_release_data in
> > > pskb_expand_head could not reduce skb->users of the each frag_list skb
> > > if skb_shinfo(skb)->dataref == 2.
> > >
> > > After the reallocation, skb_shinfo(skb)->dataref == 1 but each frag_list
> > > skb could have skb->users == 2.
> >
> > Yes, that makes sense. skb_clone_fraglist just increments the
> > frag_list skb's refcounts.
> >
> > skb_segment_list must create an unshared struct sk_buff before it
> > changes skb data to insert the protocol headers.
> >
> > > >
> > > >
> > > > > +                       tmp = skb_clone(nskb, GFP_ATOMIC);
> > > > > +                       if (tmp) {
> > > > > +                               kfree_skb(nskb);
> > > > > +                               nskb = tmp;
> > > > > +                               err = skb_unclone(nskb, GFP_ATOMIC);
> >
> > Calling clone and unclone in quick succession looks odd.
> >
> > But you need the first to create a private skb and to trigger the
> > second to create a private copy of the linear data (as well as frags,
> > if any, but these are not touched). So this looks okay.
> >
> > > > > +                       } else {
> > > > > +                               err = -ENOMEM;
> > > > > +                       }
> > > > > +               }
> > > > > +
> > > > >                 if (!tail)
> > > > >                         skb->next = nskb;
> > > > >                 else
> > > > >                         tail->next = nskb;
> > > > >
> > > > > +               if (unlikely(err)) {
> > > > > +                       nskb->next = list_skb;
> >
> > To avoid leaking these skbs when calling kfree_skb_list(skb->next). Is
> > that concern new with this patch, or also needed for the existing
> > error case?
>
> It's new for this patch. nskb can lose next skb due to
> tmp = skb_clone(nskb, GFP_ATOMIC); on the prior. I believe it is not
> needed for the existing errors.

Ah, skb_clone clears the next pointer, indeed. Thanks.

Yes, then this looks correct to me. Thanks for fixing, not an obvious
code path or bug at all.

Acked-by: Willem de Bruijn <willemb@google.com>

The patch is already marked as changes requested in
https://patchwork.kernel.org/project/netdevbpf , so you might have to
resubmit it.

If so, please expand a little bit in the commit message on the fact
that a bpf filter is loaded at TC, which triggers skb_ensure_writable
-> pskb_expand_head -> skb_clone_fraglist -> skb_get on each skb in
the fraglist.

  reply	other threads:[~2021-01-06 17:14 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20210104085750epcas2p1a5b22559d87df61ef3c8215ae0b470b5@epcas2p1.samsung.com>
2021-01-04  8:46 ` [PATCH net] net: fix use-after-free when UDP GRO with shared fraglist Dongseok Yi
2021-01-04 21:03   ` Willem de Bruijn
2021-01-06  1:29     ` Dongseok Yi
2021-01-06  3:07       ` Willem de Bruijn
2021-01-06  3:32         ` Dongseok Yi
2021-01-06 17:13           ` Willem de Bruijn [this message]
2021-04-17  3:44           ` Yunsheng Lin
2021-04-19  0:35             ` Dongseok Yi
2021-04-21  9:42               ` Yunsheng Lin
2021-04-21 11:04                 ` Dongseok Yi
     [not found]   ` <CGME20210107005028epcas2p35dfa745fd92e31400024874f54243556@epcas2p3.samsung.com>
2021-01-07  0:39     ` [PATCH net v2] " Dongseok Yi
2021-01-07 11:05       ` Daniel Borkmann
2021-01-07 11:40         ` Dongseok Yi
2021-01-07 12:49           ` Daniel Borkmann
2021-01-07 13:05             ` Willem de Bruijn
2021-01-07 13:33               ` Daniel Borkmann
2021-01-07 14:44                 ` Willem de Bruijn
2021-01-08 10:32                   ` Daniel Borkmann
     [not found]       ` <CGME20210108024017epcas2p455fe96b8483880f9b7a654dbcf600b20@epcas2p4.samsung.com>
2021-01-08  2:28         ` [PATCH net v3] " Dongseok Yi
2021-01-08 10:18           ` Daniel Borkmann
2021-01-09  3:14             ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAF=yD-+w489MoSKfpaH23dYXhVCL2qh4f0x4COd2nsT5DT8Aiw@mail.gmail.com' \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=davem@davemloft.net \
    --cc=dseok.yi@samsung.com \
    --cc=elver@google.com \
    --cc=fw@strlen.de \
    --cc=gnault@redhat.com \
    --cc=kuba@kernel.org \
    --cc=kyk.segfault@gmail.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linyunsheng@huawei.com \
    --cc=namkyu78.kim@samsung.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=steffen.klassert@secunet.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.