netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5] skbuff: Fix a race between coalescing and releasing SKBs
@ 2023-04-13  9:03 Liang Chen
  2023-04-13 11:42 ` Yunsheng Lin
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Liang Chen @ 2023-04-13  9:03 UTC (permalink / raw)
  To: kuba, ilias.apalodimas, edumazet, hawk
  Cc: netdev, davem, pabeni, alexander.duyck, linyunsheng, liangchen.linux

Commit 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment
recycling") allowed coalescing to proceed with non page pool page and page
pool page when @from is cloned, i.e.

to->pp_recycle    --> false
from->pp_recycle  --> true
skb_cloned(from)  --> true

However, it actually requires skb_cloned(@from) to hold true until
coalescing finishes in this situation. If the other cloned SKB is
released while the merging is in process, from_shinfo->nr_frags will be
set to 0 toward the end of the function, causing the increment of frag
page _refcount to be unexpectedly skipped resulting in inconsistent
reference counts. Later when SKB(@to) is released, it frees the page
directly even though the page pool page is still in use, leading to
use-after-free or double-free errors. So it should be prohibited.

The double-free error message below prompted us to investigate:
BUG: Bad page state in process swapper/1  pfn:0e0d1
page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000
index:0x2 pfn:0xe0d1
flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000
raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000
page dumped because: nonzero _refcount

CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E      6.2.0+
Call Trace:
 <IRQ>
dump_stack_lvl+0x32/0x50
bad_page+0x69/0xf0
free_pcp_prepare+0x260/0x2f0
free_unref_page+0x20/0x1c0
skb_release_data+0x10b/0x1a0
napi_consume_skb+0x56/0x150
net_rx_action+0xf0/0x350
? __napi_schedule+0x79/0x90
__do_softirq+0xc8/0x2b1
__irq_exit_rcu+0xb9/0xf0
common_interrupt+0x82/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40
RIP: 0010:default_idle+0xb/0x20

Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
Changes from v4:
- fixes some style issues
---
 net/core/skbuff.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 050a875d09c5..7b83410bbaae 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5597,18 +5597,18 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	if (skb_cloned(to))
 		return false;
 
-	/* In general, avoid mixing slab allocated and page_pool allocated
-	 * pages within the same SKB. However when @to is not pp_recycle and
-	 * @from is cloned, we can transition frag pages from page_pool to
-	 * reference counted.
-	 *
-	 * On the other hand, don't allow coalescing two pp_recycle SKBs if
-	 * @from is cloned, in case the SKB is using page_pool fragment
-	 * references (PP_FLAG_PAGE_FRAG). Since we only take full page
-	 * references for cloned SKBs at the moment that would result in
+	/* In general, avoid mixing page_pool and non-page_pool allocated
+	 * pages within the same SKB. Additionally avoid dealing with clones
+	 * containing page_pool pages, in case the SKB is using page_pool
+	 * fragment references (PP_FLAG_PAGE_FRAG). Since we only take full
+	 * page references for cloned SKBs at the moment that would result in
 	 * inconsistent reference counts.
+	 * In theory we could take full references if from is cloned and
+	 * !@to->pp_recycle but its tricky (due to potential race with the clone
+	 * disappearing) and rare, so not worth dealing with.
 	 */
-	if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
+	if (to->pp_recycle != from->pp_recycle ||
+	    (from->pp_recycle && skb_cloned(from)))
 		return false;
 
 	if (len <= skb_tailroom(to)) {
-- 
2.18.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v5] skbuff: Fix a race between coalescing and releasing SKBs
  2023-04-13  9:03 [PATCH v5] skbuff: Fix a race between coalescing and releasing SKBs Liang Chen
@ 2023-04-13 11:42 ` Yunsheng Lin
  2023-04-13 13:45   ` Eric Dumazet
  2023-04-13 16:41 ` Jakub Kicinski
  2023-04-13 17:20 ` patchwork-bot+netdevbpf
  2 siblings, 1 reply; 6+ messages in thread
From: Yunsheng Lin @ 2023-04-13 11:42 UTC (permalink / raw)
  To: Liang Chen, kuba, ilias.apalodimas, edumazet, hawk
  Cc: netdev, davem, pabeni, alexander.duyck

On 2023/4/13 17:03, Liang Chen wrote:
> Commit 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment
> recycling") allowed coalescing to proceed with non page pool page and page
> pool page when @from is cloned, i.e.
> 
> to->pp_recycle    --> false
> from->pp_recycle  --> true
> skb_cloned(from)  --> true
> 
> However, it actually requires skb_cloned(@from) to hold true until
> coalescing finishes in this situation. If the other cloned SKB is
> released while the merging is in process, from_shinfo->nr_frags will be
> set to 0 toward the end of the function, causing the increment of frag
> page _refcount to be unexpectedly skipped resulting in inconsistent
> reference counts. Later when SKB(@to) is released, it frees the page
> directly even though the page pool page is still in use, leading to
> use-after-free or double-free errors. So it should be prohibited.
> 
> The double-free error message below prompted us to investigate:
> BUG: Bad page state in process swapper/1  pfn:0e0d1
> page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000
> index:0x2 pfn:0xe0d1
> flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
> raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000
> raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000
> page dumped because: nonzero _refcount
> 
> CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E      6.2.0+
> Call Trace:
>  <IRQ>
> dump_stack_lvl+0x32/0x50
> bad_page+0x69/0xf0
> free_pcp_prepare+0x260/0x2f0
> free_unref_page+0x20/0x1c0
> skb_release_data+0x10b/0x1a0
> napi_consume_skb+0x56/0x150
> net_rx_action+0xf0/0x350
> ? __napi_schedule+0x79/0x90
> __do_softirq+0xc8/0x2b1
> __irq_exit_rcu+0xb9/0xf0
> common_interrupt+0x82/0xa0
> </IRQ>
> <TASK>
> asm_common_interrupt+0x22/0x40
> RIP: 0010:default_idle+0xb/0x20
> 
> Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")

I am not quite sure the above is right Fixes tag.
As 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment recycling") has tried
to fix it, and it missed the case this patch is fixing, so we need another fix here.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v5] skbuff: Fix a race between coalescing and releasing SKBs
  2023-04-13 11:42 ` Yunsheng Lin
@ 2023-04-13 13:45   ` Eric Dumazet
  2023-04-14  1:41     ` Yunsheng Lin
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2023-04-13 13:45 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Liang Chen, kuba, ilias.apalodimas, hawk, netdev, davem, pabeni,
	alexander.duyck

On Thu, Apr 13, 2023 at 1:42 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/4/13 17:03, Liang Chen wrote:

> > Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
>
> I am not quite sure the above is right Fixes tag.
> As 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment recycling") has tried
> to fix it, and it missed the case this patch is fixing, so we need another fix here.

The Fixes: tag is more about pointing to stable teams how to deal with
backports.
There is no point giving the full chain, because this 'blamed' commit is enough.

If an old kernel does not contain this commit, there is no point trying harder.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v5] skbuff: Fix a race between coalescing and releasing SKBs
  2023-04-13  9:03 [PATCH v5] skbuff: Fix a race between coalescing and releasing SKBs Liang Chen
  2023-04-13 11:42 ` Yunsheng Lin
@ 2023-04-13 16:41 ` Jakub Kicinski
  2023-04-13 17:20 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2023-04-13 16:41 UTC (permalink / raw)
  To: Liang Chen
  Cc: ilias.apalodimas, edumazet, hawk, netdev, davem, pabeni,
	alexander.duyck, linyunsheng

On Thu, 13 Apr 2023 17:03:53 +0800 Liang Chen wrote:
> Commit 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment
> recycling") allowed coalescing to proceed with non page pool page and page
> pool page when @from is cloned, i.e.

I think Alex is out for spring celebrations so let me take this 
in for today's PR. Thanks!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v5] skbuff: Fix a race between coalescing and releasing SKBs
  2023-04-13  9:03 [PATCH v5] skbuff: Fix a race between coalescing and releasing SKBs Liang Chen
  2023-04-13 11:42 ` Yunsheng Lin
  2023-04-13 16:41 ` Jakub Kicinski
@ 2023-04-13 17:20 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-04-13 17:20 UTC (permalink / raw)
  To: Liang Chen
  Cc: kuba, ilias.apalodimas, edumazet, hawk, netdev, davem, pabeni,
	alexander.duyck, linyunsheng

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 13 Apr 2023 17:03:53 +0800 you wrote:
> Commit 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment
> recycling") allowed coalescing to proceed with non page pool page and page
> pool page when @from is cloned, i.e.
> 
> to->pp_recycle    --> false
> from->pp_recycle  --> true
> skb_cloned(from)  --> true
> 
> [...]

Here is the summary with links:
  - [v5] skbuff: Fix a race between coalescing and releasing SKBs
    https://git.kernel.org/netdev/net/c/0646dc31ca88

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v5] skbuff: Fix a race between coalescing and releasing SKBs
  2023-04-13 13:45   ` Eric Dumazet
@ 2023-04-14  1:41     ` Yunsheng Lin
  0 siblings, 0 replies; 6+ messages in thread
From: Yunsheng Lin @ 2023-04-14  1:41 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Liang Chen, kuba, ilias.apalodimas, hawk, netdev, davem, pabeni,
	alexander.duyck

On 2023/4/13 21:45, Eric Dumazet wrote:
> On Thu, Apr 13, 2023 at 1:42 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2023/4/13 17:03, Liang Chen wrote:
> 
>>> Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
>>
>> I am not quite sure the above is right Fixes tag.
>> As 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment recycling") has tried
>> to fix it, and it missed the case this patch is fixing, so we need another fix here.
> 
> The Fixes: tag is more about pointing to stable teams how to deal with
> backports.
> There is no point giving the full chain, because this 'blamed' commit is enough.
> 
> If an old kernel does not contain this commit, there is no point trying harder.

In that case, it may be better to point to
6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling") too.

As even without fragment support, for below case:
to->pp_recycle    --> true
from->pp_recycle  --> true
skb_cloned(from)  --> true

It seems some page may be called with page_pool_release_page() twice if 'to' and
cloned skb of 'from' are freed concurrently, as there is data race between checking
page->pp_magic and clearing page->pp_magic.

Anyway, as it is merged already, I guess it is not really matter anymore.


> .
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-04-14  1:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-13  9:03 [PATCH v5] skbuff: Fix a race between coalescing and releasing SKBs Liang Chen
2023-04-13 11:42 ` Yunsheng Lin
2023-04-13 13:45   ` Eric Dumazet
2023-04-14  1:41     ` Yunsheng Lin
2023-04-13 16:41 ` Jakub Kicinski
2023-04-13 17:20 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).