All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net v2] skbuff: disable coalescing for page_pool fragment recycling
@ 2022-03-28 13:22 Jean-Philippe Brucker
  2022-03-28 15:03 ` Alexander H Duyck
  0 siblings, 1 reply; 4+ messages in thread
From: Jean-Philippe Brucker @ 2022-03-28 13:22 UTC (permalink / raw)
  To: ilias.apalodimas, hawk, alexanderduyck, linyunsheng
  Cc: davem, kuba, pabeni, netdev, Jean-Philippe Brucker

Fix a use-after-free when using page_pool with page fragments. We
encountered this problem during normal RX in the hns3 driver:

(1) Initially we have three descriptors in the RX queue. The first one
    allocates PAGE1 through page_pool, and the other two allocate one
    half of PAGE2 each. Page references look like this:

                RX_BD1 _______ PAGE1
                RX_BD2 _______ PAGE2
                RX_BD3 _________/

(2) Handle RX on the first descriptor. Allocate SKB1, eventually added
    to the receive queue by tcp_queue_rcv().

(3) Handle RX on the second descriptor. Allocate SKB2 and pass it to
    netif_receive_skb():

    netif_receive_skb(SKB2)
      ip_rcv(SKB2)
        SKB3 = skb_clone(SKB2)

    SKB2 and SKB3 share a reference to PAGE2 through
    skb_shinfo()->dataref. The other ref to PAGE2 is still held by
    RX_BD3:

                      SKB2 ---+- PAGE2
                      SKB3 __/   /
                RX_BD3 _________/

 (3b) Now while handling TCP, coalesce SKB3 with SKB1:

      tcp_v4_rcv(SKB3)
        tcp_try_coalesce(to=SKB1, from=SKB3)    // succeeds
        kfree_skb_partial(SKB3)
          skb_release_data(SKB3)                // drops one dataref

                      SKB1 _____ PAGE1
                           \____
                      SKB2 _____ PAGE2
                                 /
                RX_BD3 _________/

    In tcp_try_coalesce(), __skb_frag_ref() takes a page reference to
    PAGE2, where it should instead have increased the page_pool frag
    reference, pp_frag_count. Without coalescing, when releasing both
    SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now
    when releasing SKB1 and SKB2, two references to PAGE2 will be
    dropped, resulting in underflow.

 (3c) Drop SKB2:

      af_packet_rcv(SKB2)
        consume_skb(SKB2)
          skb_release_data(SKB2)                // drops second dataref
            page_pool_return_skb_page(PAGE2)    // drops one pp_frag_count

                      SKB1 _____ PAGE1
                           \____
                                 PAGE2
                                 /
                RX_BD3 _________/

(4) Userspace calls recvmsg()
    Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we
    release the SKB3 page as well:

    tcp_eat_recv_skb(SKB1)
      skb_release_data(SKB1)
        page_pool_return_skb_page(PAGE1)
        page_pool_return_skb_page(PAGE2)        // drops second pp_frag_count

(5) PAGE2 is freed, but the third RX descriptor was still using it!
    In our case this causes IOMMU faults, but it would silently corrupt
    memory if the IOMMU was disabled.

Prevent coalescing the SKB if it may hold shared page_pool fragment
references.

Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 net/core/skbuff.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 10bde7c6db44..56b45b9f0b4d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5276,6 +5276,13 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	if (skb_cloned(to))
 		return false;
 
+	/* We don't support taking page_pool frag references at the moment.
+	 * If the SKB is cloned and could have page_pool frag references, don't
+	 * coalesce it.
+	 */
+	if (skb_cloned(from) && from->pp_recycle)
+		return false;
+
 	/* The page pool signature of struct page will eventually figure out
 	 * which pages can be recycled or not but for now let's prohibit slab
 	 * allocated and page_pool allocated SKBs from being coalesced.
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net v2] skbuff: disable coalescing for page_pool fragment recycling
  2022-03-28 13:22 [PATCH net v2] skbuff: disable coalescing for page_pool fragment recycling Jean-Philippe Brucker
@ 2022-03-28 15:03 ` Alexander H Duyck
  2022-03-30 12:41   ` Jean-Philippe Brucker
  0 siblings, 1 reply; 4+ messages in thread
From: Alexander H Duyck @ 2022-03-28 15:03 UTC (permalink / raw)
  To: Jean-Philippe Brucker, ilias.apalodimas, hawk, alexanderduyck,
	linyunsheng
  Cc: davem, kuba, pabeni, netdev

On Mon, 2022-03-28 at 14:22 +0100, Jean-Philippe Brucker wrote:
> Fix a use-after-free when using page_pool with page fragments. We
> encountered this problem during normal RX in the hns3 driver:
> 
> (1) Initially we have three descriptors in the RX queue. The first one
>     allocates PAGE1 through page_pool, and the other two allocate one
>     half of PAGE2 each. Page references look like this:
> 
>                 RX_BD1 _______ PAGE1
>                 RX_BD2 _______ PAGE2
>                 RX_BD3 _________/
> 
> (2) Handle RX on the first descriptor. Allocate SKB1, eventually added
>     to the receive queue by tcp_queue_rcv().
> 
> (3) Handle RX on the second descriptor. Allocate SKB2 and pass it to
>     netif_receive_skb():
> 
>     netif_receive_skb(SKB2)
>       ip_rcv(SKB2)
>         SKB3 = skb_clone(SKB2)
> 
>     SKB2 and SKB3 share a reference to PAGE2 through
>     skb_shinfo()->dataref. The other ref to PAGE2 is still held by
>     RX_BD3:
> 
>                       SKB2 ---+- PAGE2
>                       SKB3 __/   /
>                 RX_BD3 _________/
> 
>  (3b) Now while handling TCP, coalesce SKB3 with SKB1:
> 
>       tcp_v4_rcv(SKB3)
>         tcp_try_coalesce(to=SKB1, from=SKB3)    // succeeds
>         kfree_skb_partial(SKB3)
>           skb_release_data(SKB3)                // drops one dataref
> 
>                       SKB1 _____ PAGE1
>                            \____
>                       SKB2 _____ PAGE2
>                                  /
>                 RX_BD3 _________/
> 
>     In tcp_try_coalesce(), __skb_frag_ref() takes a page reference to
>     PAGE2, where it should instead have increased the page_pool frag
>     reference, pp_frag_count. Without coalescing, when releasing both
>     SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now
>     when releasing SKB1 and SKB2, two references to PAGE2 will be
>     dropped, resulting in underflow.
> 
>  (3c) Drop SKB2:
> 
>       af_packet_rcv(SKB2)
>         consume_skb(SKB2)
>           skb_release_data(SKB2)                // drops second dataref
>             page_pool_return_skb_page(PAGE2)    // drops one pp_frag_count
> 
>                       SKB1 _____ PAGE1
>                            \____
>                                  PAGE2
>                                  /
>                 RX_BD3 _________/
> 
> (4) Userspace calls recvmsg()
>     Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we
>     release the SKB3 page as well:
> 
>     tcp_eat_recv_skb(SKB1)
>       skb_release_data(SKB1)
>         page_pool_return_skb_page(PAGE1)
>         page_pool_return_skb_page(PAGE2)        // drops second pp_frag_count
> 
> (5) PAGE2 is freed, but the third RX descriptor was still using it!
>     In our case this causes IOMMU faults, but it would silently corrupt
>     memory if the IOMMU was disabled.
> 
> Prevent coalescing the SKB if it may hold shared page_pool fragment
> references.
> 
> Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>  net/core/skbuff.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 10bde7c6db44..56b45b9f0b4d 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5276,6 +5276,13 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
>  	if (skb_cloned(to))
>  		return false;
>  
> +	/* We don't support taking page_pool frag references at the moment.
> +	 * If the SKB is cloned and could have page_pool frag references, don't
> +	 * coalesce it.
> +	 */
> +	if (skb_cloned(from) && from->pp_recycle)
> +		return false;
> +
>  	/* The page pool signature of struct page will eventually figure out
>  	 * which pages can be recycled or not but for now let's prohibit slab
>  	 * allocated and page_pool allocated SKBs from being coalesced.


This is close but not quite. Actually now that I think about it we can
probably alter the block below rather than adding a new one.

The issue is we want only reference counted pages in standard skbs, and
pp_frag_count pages in pp_recycle skbs. So we already had logic along
the lines of:
	if (to->pp_recycle != from->pp_recycle)
		return false;

I would say we need to change that because from->pp_recycle is the
piece that is probably too simplistic. Basically we will get a page
pool page if from->pp_recycle && !skb_cloned(from). So we can probably
just tweak the check below to be something along the lines of:
	if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
		return false;

That should actually increase the number of cases where we can
correctly coalesce frames while also addressing the issue you
discovered as it will only merge cloned skbs into standard skbs instead
of page pool ones.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net v2] skbuff: disable coalescing for page_pool fragment recycling
  2022-03-28 15:03 ` Alexander H Duyck
@ 2022-03-30 12:41   ` Jean-Philippe Brucker
  2022-03-30 15:05     ` Alexander Duyck
  0 siblings, 1 reply; 4+ messages in thread
From: Jean-Philippe Brucker @ 2022-03-30 12:41 UTC (permalink / raw)
  To: Alexander H Duyck
  Cc: ilias.apalodimas, hawk, alexanderduyck, linyunsheng, davem, kuba,
	pabeni, netdev

On Mon, Mar 28, 2022 at 08:03:46AM -0700, Alexander H Duyck wrote:
> >  (3b) Now while handling TCP, coalesce SKB3 with SKB1:
> > 
> >       tcp_v4_rcv(SKB3)
> >         tcp_try_coalesce(to=SKB1, from=SKB3)    // succeeds
> >         kfree_skb_partial(SKB3)
> >           skb_release_data(SKB3)                // drops one dataref
> > 
> >                       SKB1 _____ PAGE1
> >                            \____
> >                       SKB2 _____ PAGE2
> >                                  /
> >                 RX_BD3 _________/
> > 
> >     In tcp_try_coalesce(), __skb_frag_ref() takes a page reference to
> >     PAGE2, where it should instead have increased the page_pool frag
> >     reference, pp_frag_count. Without coalescing, when releasing both
> >     SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now
> >     when releasing SKB1 and SKB2, two references to PAGE2 will be
> >     dropped, resulting in underflow.
> > 
> >  (3c) Drop SKB2:
> > 
> >       af_packet_rcv(SKB2)
> >         consume_skb(SKB2)
> >           skb_release_data(SKB2)                // drops second dataref
> >             page_pool_return_skb_page(PAGE2)    // drops one pp_frag_count
> > 
> >                       SKB1 _____ PAGE1
> >                            \____
> >                                  PAGE2
> >                                  /
> >                 RX_BD3 _________/
> > 
> > (4) Userspace calls recvmsg()
> >     Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we
> >     release the SKB3 page as well:
> > 
> >     tcp_eat_recv_skb(SKB1)
> >       skb_release_data(SKB1)
> >         page_pool_return_skb_page(PAGE1)
> >         page_pool_return_skb_page(PAGE2)        // drops second pp_frag_count
> > 
> > (5) PAGE2 is freed, but the third RX descriptor was still using it!
> >     In our case this causes IOMMU faults, but it would silently corrupt
> >     memory if the IOMMU was disabled.
> > 
> > Prevent coalescing the SKB if it may hold shared page_pool fragment
> > references.
> > 
> > Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
> > Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > ---
> >  net/core/skbuff.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 10bde7c6db44..56b45b9f0b4d 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5276,6 +5276,13 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> >  	if (skb_cloned(to))
> >  		return false;
> >  
> > +	/* We don't support taking page_pool frag references at the moment.
> > +	 * If the SKB is cloned and could have page_pool frag references, don't
> > +	 * coalesce it.
> > +	 */
> > +	if (skb_cloned(from) && from->pp_recycle)
> > +		return false;
> > +
> >  	/* The page pool signature of struct page will eventually figure out
> >  	 * which pages can be recycled or not but for now let's prohibit slab
> >  	 * allocated and page_pool allocated SKBs from being coalesced.
> 
> 
> This is close but not quite. Actually now that I think about it we can
> probably alter the block below rather than adding a new one.
> 
> The issue is we want only reference counted pages in standard skbs, and
> pp_frag_count pages in pp_recycle skbs. So we already had logic along
> the lines of:
> 	if (to->pp_recycle != from->pp_recycle)
> 		return false;
> 
> I would say we need to change that because from->pp_recycle is the
> piece that is probably too simplistic. Basically we will get a page
> pool page if from->pp_recycle && !skb_cloned(from). So we can probably
> just tweak the check below to be something along the lines of:
> 	if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
> 		return false;

Just to confirm this is fine: the behavior now changes for
to->pp_recycle == 0, from->pp_recycle == 1 and skb_cloned(from) == 1
In this case we now coalesce and take a page ref. So the page has two refs
and two pp_frag_count. (3c) drops one pp_frag_count. If there wasn't
another RX desc holding a pp_frag_count (ie. no step (5)), that would also
drop a page ref, but since 'to' SKB is holding a second page ref the page
is not recycled. That reference gets dropped at (4) and the page is freed
there.  With step (5), the page would get recycled into page_pool, but
without (5) the page is discarded.

I guess it works, just want to make sure that it's OK to mix page_pool
pp_frag_count and normal reference counting at the same time.

Thanks,
Jean

> 
> That should actually increase the number of cases where we can
> correctly coalesce frames while also addressing the issue you
> discovered as it will only merge cloned skbs into standard skbs instead
> of page pool ones.
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net v2] skbuff: disable coalescing for page_pool fragment recycling
  2022-03-30 12:41   ` Jean-Philippe Brucker
@ 2022-03-30 15:05     ` Alexander Duyck
  0 siblings, 0 replies; 4+ messages in thread
From: Alexander Duyck @ 2022-03-30 15:05 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Ilias Apalodimas, hawk, Alexander Duyck, Yunsheng Lin,
	David Miller, Jakub Kicinski, Paolo Abeni, Netdev

On Wed, Mar 30, 2022 at 5:41 AM Jean-Philippe Brucker
<jean-philippe@linaro.org> wrote:
>
> On Mon, Mar 28, 2022 at 08:03:46AM -0700, Alexander H Duyck wrote:
> > >  (3b) Now while handling TCP, coalesce SKB3 with SKB1:
> > >
> > >       tcp_v4_rcv(SKB3)
> > >         tcp_try_coalesce(to=SKB1, from=SKB3)    // succeeds
> > >         kfree_skb_partial(SKB3)
> > >           skb_release_data(SKB3)                // drops one dataref
> > >
> > >                       SKB1 _____ PAGE1
> > >                            \____
> > >                       SKB2 _____ PAGE2
> > >                                  /
> > >                 RX_BD3 _________/
> > >
> > >     In tcp_try_coalesce(), __skb_frag_ref() takes a page reference to
> > >     PAGE2, where it should instead have increased the page_pool frag
> > >     reference, pp_frag_count. Without coalescing, when releasing both
> > >     SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now
> > >     when releasing SKB1 and SKB2, two references to PAGE2 will be
> > >     dropped, resulting in underflow.
> > >
> > >  (3c) Drop SKB2:
> > >
> > >       af_packet_rcv(SKB2)
> > >         consume_skb(SKB2)
> > >           skb_release_data(SKB2)                // drops second dataref
> > >             page_pool_return_skb_page(PAGE2)    // drops one pp_frag_count
> > >
> > >                       SKB1 _____ PAGE1
> > >                            \____
> > >                                  PAGE2
> > >                                  /
> > >                 RX_BD3 _________/
> > >
> > > (4) Userspace calls recvmsg()
> > >     Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we
> > >     release the SKB3 page as well:
> > >
> > >     tcp_eat_recv_skb(SKB1)
> > >       skb_release_data(SKB1)
> > >         page_pool_return_skb_page(PAGE1)
> > >         page_pool_return_skb_page(PAGE2)        // drops second pp_frag_count
> > >
> > > (5) PAGE2 is freed, but the third RX descriptor was still using it!
> > >     In our case this causes IOMMU faults, but it would silently corrupt
> > >     memory if the IOMMU was disabled.
> > >
> > > Prevent coalescing the SKB if it may hold shared page_pool fragment
> > > references.
> > >
> > > Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
> > > Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > > ---
> > >  net/core/skbuff.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index 10bde7c6db44..56b45b9f0b4d 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -5276,6 +5276,13 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> > >     if (skb_cloned(to))
> > >             return false;
> > >
> > > +   /* We don't support taking page_pool frag references at the moment.
> > > +    * If the SKB is cloned and could have page_pool frag references, don't
> > > +    * coalesce it.
> > > +    */
> > > +   if (skb_cloned(from) && from->pp_recycle)
> > > +           return false;
> > > +
> > >     /* The page pool signature of struct page will eventually figure out
> > >      * which pages can be recycled or not but for now let's prohibit slab
> > >      * allocated and page_pool allocated SKBs from being coalesced.
> >
> >
> > This is close but not quite. Actually now that I think about it we can
> > probably alter the block below rather than adding a new one.
> >
> > The issue is we want only reference counted pages in standard skbs, and
> > pp_frag_count pages in pp_recycle skbs. So we already had logic along
> > the lines of:
> >       if (to->pp_recycle != from->pp_recycle)
> >               return false;
> >
> > I would say we need to change that because from->pp_recycle is the
> > piece that is probably too simplistic. Basically we will get a page
> > pool page if from->pp_recycle && !skb_cloned(from). So we can probably
> > just tweak the check below to be something along the lines of:
> >       if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
> >               return false;
>
> Just to confirm this is fine: the behavior now changes for
> to->pp_recycle == 0, from->pp_recycle == 1 and skb_cloned(from) == 1
> In this case we now coalesce and take a page ref. So the page has two refs
> and two pp_frag_count. (3c) drops one pp_frag_count. If there wasn't
> another RX desc holding a pp_frag_count (ie. no step (5)), that would also
> drop a page ref, but since 'to' SKB is holding a second page ref the page
> is not recycled. That reference gets dropped at (4) and the page is freed
> there.  With step (5), the page would get recycled into page_pool, but
> without (5) the page is discarded.
>
> I guess it works, just want to make sure that it's OK to mix page_pool
> pp_frag_count and normal reference counting at the same time.
>
> Thanks,
> Jean

The key thing is that we don't want to mix and match them within an
skb. This logic really isn't too different from what we do in
pskb_expand_head if the skb is cloned. Basically what we are doing is
transitioning "from" pages from page pool pages to reference counted
pages in the case that the skb is cloned and then placing them in a
skb that does page reference counting.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-03-30 15:05 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-28 13:22 [PATCH net v2] skbuff: disable coalescing for page_pool fragment recycling Jean-Philippe Brucker
2022-03-28 15:03 ` Alexander H Duyck
2022-03-30 12:41   ` Jean-Philippe Brucker
2022-03-30 15:05     ` Alexander Duyck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.