netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexander H Duyck <alexander.duyck@gmail.com>
To: Felix Fietkau <nbd@nbd.name>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	netdev@vger.kernel.org, Jesper Dangaard Brouer <hawk@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Lorenzo Bianconi <lorenzo@kernel.org>,
	linux-kernel@vger.kernel.org,
	Yunsheng Lin <linyunsheng@huawei.com>
Subject: Re: [PATCH] net: page_pool: fix refcounting issues with fragmented allocation
Date: Thu, 26 Jan 2023 10:38:45 -0800	[thread overview]
Message-ID: <156f3e120bd0757133cb6bc11b76889637b5e0a6.camel@gmail.com> (raw)
In-Reply-To: <bc0fa31a-c935-c6f0-f968-9e2a54bafd45@nbd.name>

> 
> > Which piece did you change? My main interest would be trying to narrow
> > down the section of this function that is causing this. Did you modify
> > __ieee80211_amsdu_copy or some other function within the setup?
> I replaced this line:
>    bool reuse_frag = skb->head_frag && !skb_has_frag_list(skb);
> with:
>    bool reuse_frag = false;

I see. So essentially everything is copied into the new buffers.

> > > > I believe the issue must be specific to that codepath, since most
> > > > received and processed packets are either not A-MSDU or A-MSDU decap has
> > > > already been performed by the hardware.
> > > > If I change my test to use 3 client mode interfaces instead of 4, the
> > > > hardware is able to offload all A-MSDU rx processing and I don't see any
> > > > crashes anymore.
> > > > 
> > > > Could you please take another look at ieee80211_amsdu_to_8023s to see if
> > > > there's anything in there that could cause these issues?
> > 
> > The thing is I am not sure it is the only cause for this. I am
> > suspecting we are seeing this triggering an issue when combined with
> > something else.
> > 
> > If we could add some tracing to dump the skb and list buffers that
> > might be helpful. We would want to verify the pp_recycle value, clone
> > flag, and for the frags we would want to frag count and page reference
> > counts. The expectation would be that the original skb should have the
> > pp_recycle flag set and the frag counts consistent through the
> > process, and the list skbs should all have taken page references w/ no
> > pp_recycle on the skbs in the list.
> > 
> > > Here are clues from a few more tests that I ran:
> > > - preventing the reuse of the last skb in ieee80211_amsdu_to_8023s does
> > > not prevent the crashes, so the issue is indeed related to taking page
> > > references and putting the pages in skb fragments.
> > 
> > You said in the first email it stops it and in the second "does not".
> > I am assuming that is some sort of typo since you seem to be implying
> > it does resolve it for you. Is that correct?
> For everything except for the last subframe, the function pulls 
> fragments out of the original skb and puts them into a new skb allocated 
> with dev_alloc_skb. Additionally, the last skb is reused for the final 
> subframe. What I meant was: In order to figure out if the skb-reuse is 
> problematic, I prevented it from happening, ensuring that all subframes 
> are allocated and get the fragments from the skb.
> All I meant to say is that since that led to the same crash, the 
> skb-reuse is not the problem.
> 
> Regarding the question from your other mail: without GRO there is no 
> crash and it looks stable.
> 
> - Felix
> 

Okay, I think that tells me exactly what is going on. Can you give the
change below a try and see if it solves the problem for you.

I think what is happening is that after you are reassigning the frags
they are getting merged into GRO frames where the head may have
pp_recycle set. As a result I think the pages are getting recycled when
they should be just freed via put_page.

I'm suspecting this wasn't an issue up until now as I don't believe
there are any that are running in a mixed mode where they have both
pp_recycle and non-pp_recycle skbs coming from the same device.

diff --git a/net/core/gro.c b/net/core/gro.c
index 506f83d715f8..4bac7ea6e025 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -162,6 +162,15 @@ int skb_gro_receive(struct sk_buff *p, struct
sk_buff *skb)
 	struct sk_buff *lp;
 	int segs;
 
+	/* Do not splice page pool based packets w/ non-page pool
+	 * packets. This can result in reference count issues as page
+	 * pool pages will not decrement the reference count and will
+	 * instead be immediately returned to the pool or have frag
+	 * count decremented.
+	 */
+	if (p->pp_recycle != skb->pp_recycle)
+		return -ETOOMANYREFS;
+
 	/* pairs with WRITE_ONCE() in netif_set_gro_max_size() */
 	gro_max_size = READ_ONCE(p->dev->gro_max_size);
 

  reply	other threads:[~2023-01-26 18:39 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-24 12:43 [PATCH] net: page_pool: fix refcounting issues with fragmented allocation Felix Fietkau
2023-01-24 14:11 ` Ilias Apalodimas
2023-01-24 15:57   ` Alexander H Duyck
2023-01-24 16:59     ` Felix Fietkau
2023-01-26 10:31     ` Ilias Apalodimas
2023-01-26 15:41       ` Alexander Duyck
2023-01-26 16:05         ` Ilias Apalodimas
2023-01-24 17:22   ` Felix Fietkau
2023-01-24 21:10     ` Alexander H Duyck
2023-01-24 21:30       ` Felix Fietkau
2023-01-25 17:11         ` Alexander H Duyck
2023-01-25 17:32           ` Felix Fietkau
2023-01-25 18:26             ` Alexander H Duyck
2023-01-25 18:42               ` Felix Fietkau
2023-01-25 19:02                 ` Alexander H Duyck
2023-01-25 19:10                   ` Felix Fietkau
2023-01-25 19:40                     ` Felix Fietkau
2023-01-25 20:02                       ` Felix Fietkau
2023-01-25 22:14                       ` Alexander H Duyck
2023-01-26  6:12                         ` Felix Fietkau
2023-01-26  9:14                           ` Felix Fietkau
2023-01-26 16:08                             ` Alexander Duyck
2023-01-26 16:40                               ` Alexander Duyck
2023-01-26 17:44                               ` Felix Fietkau
2023-01-26 18:38                                 ` Alexander H Duyck [this message]
2023-01-26 18:43                                   ` Felix Fietkau
2023-01-26 19:06                                     ` [net PATCH] skb: Do mix page pool and page referenced frags in GRO Alexander Duyck
2023-01-26 19:14                                       ` Toke Høiland-Jørgensen
2023-01-26 19:48                                         ` Alexander Duyck
2023-01-26 21:35                                           ` Toke Høiland-Jørgensen
2023-01-26 23:13                                       ` Jakub Kicinski
2023-01-27  7:15                                         ` Ilias Apalodimas
2023-01-27  7:21                                         ` Felix Fietkau
2023-01-30 16:49                                         ` Jesper Dangaard Brouer
2023-01-28  2:37                                       ` Yunsheng Lin
2023-01-28  5:26                                         ` Jakub Kicinski
2023-01-28  7:08                                           ` Eric Dumazet
2023-01-30  8:50                                             ` Paolo Abeni
2023-01-30 16:17                                               ` Alexander Duyck
2023-01-28  7:15                                           ` Eric Dumazet
2023-01-28 17:08                                             ` Alexander Duyck
2023-01-28  7:50                                       ` patchwork-bot+netdevbpf
2023-01-26 10:32     ` [PATCH] net: page_pool: fix refcounting issues with fragmented allocation Ilias Apalodimas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=156f3e120bd0757133cb6bc11b76889637b5e0a6.camel@gmail.com \
    --to=alexander.duyck@gmail.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hawk@kernel.org \
    --cc=ilias.apalodimas@linaro.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linyunsheng@huawei.com \
    --cc=lorenzo@kernel.org \
    --cc=nbd@nbd.name \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).