netdev.vger.kernel.org archive mirror
From: Alexander Duyck <alexander.duyck@gmail.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	"David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, netdev <netdev@vger.kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	Alexander Duyck <alexanderduyck@fb.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Greg Thelen <gthelen@google.com>
Subject: Re: [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs
Date: Thu, 8 Sep 2022 12:26:21 -0700	[thread overview]
Message-ID: <CAKgT0UeV9+=AcQ1J+UA=KGWKAV2E4CW566qYHNv_XxQMC3Us-Q@mail.gmail.com> (raw)
In-Reply-To: <f3f867cf6814510817b253e6aca997cdd3acc48a.camel@redhat.com>

On Thu, Sep 8, 2022 at 11:01 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Thu, 2022-09-08 at 07:53 -0700, Alexander H Duyck wrote:
> > On Thu, 2022-09-08 at 13:00 +0200, Paolo Abeni wrote:
> > > In most builds GRO_MAX_HEAD packets are even larger (should be 640)
> >
> > Right, which is why I am thinking we may want to default to a 1K slice.
>
> Ok it looks like there is agreement to force a minimum frag size of 1K.
> Side note: that should not cause a memory usage increase compared to
> the slab allocator as kmalloc(640) should use the kmalloc-1k slab.
>
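As a quick sanity check on that size-class claim, here is a userspace sketch (not kernel code; as an assumption it models only the power-of-two kmalloc-* caches and ignores the kmalloc-96/192 ones):

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the general-purpose slab size classes: a request
 * is rounded up to the next power-of-two cache size (the real slab
 * also has 96- and 192-byte caches, elided here for simplicity). */
static size_t kmalloc_size_class(size_t size)
{
	size_t class = 8; /* smallest general-purpose cache assumed here */

	while (class < size)
		class <<= 1;
	return class;
}
```

So a kmalloc(640) request lands in the kmalloc-1k cache, and forcing a 1K minimum frag size should not cost more memory than the slab path did.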
> [...]
>
> > > >
> > > If the pagecnt optimization should be dropped, it would be probably
> > > more straight-forward to use/adapt 'page_frag' for the page_order0
> > > allocator.
> >
> > That would make sense. Basically we could get rid of the pagecnt bias
> > and add the fixed number of slices to the count at allocation so we
> > would just need to track the offset to decide when we need to allocate
> > a new page. In addition, if we are flushing the page when it is depleted
> > we don't have to mess with the pfmemalloc logic.
>
> Uhmm... it looks like the existing page_frag allocator does not
> always flush the depleted page:
>
> bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
> {
>         if (pfrag->page) {
>                 if (page_ref_count(pfrag->page) == 1) {
>                         pfrag->offset = 0;
>                         return true;
>                 }

Right, we have the option to reuse the page once the page count drops
back to 1. However, in the case of the 4K page with 1K slices scenario
it means you are having to bump the count back up once for every 3
frags. So you would be looking at roughly 1.3 atomic accesses per
frag. Just doing the bump once at the start and using all 4 slices
would give you 1.25 atomic accesses per frag. That is why I assumed it
would be better to just let the page go.
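For what it's worth, the 1.25 figure falls out of a simple cost model (a sketch of the accounting above, not kernel code): one atomic operation to seed the refcount at allocation, plus one atomic put per slice on free.

```c
#include <assert.h>

/* Cost model for the bump-once-at-allocation scheme: seed the page
 * refcount with the slice count in a single atomic operation, then pay
 * one atomic decrement per slice as each fragment is freed. */
static double bias_atomics_per_frag(int nslices)
{
	return (1.0 + nslices) / nslices;
}
```

With a 4K page carved into four 1K slices that is (1 + 4) / 4 = 1.25 atomic accesses per frag, which the recycling check cannot quite match.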

> so I'll try adding some separate/specialized code and see if the
> overall complexity would be reasonable.

The other thing to keep in mind is that once you start adding the
recycling you will have best-case and worst-case scenarios to
consider. The code above, it seems, is for recycling the frag in
place, or allocating a new one in its place.

> > > BTW it's quite strange/confusing having two very similar APIs (page_frag
> > > and page_frag_cache) with very similar names and no references between
> > > them.
> >
> > I'm not sure what you are getting at here. There are plenty of
> > references between them, they just aren't direct.
>
> Looking/grepping the tree I could not trivially understand when 'struct
> page_frag' should be preferred over 'struct page_frag_cache' and/or
> vice versa; I had to look at the respective implementation details.

The page_frag_cache is mostly there to store a higher-order page that
is sliced up to generate the page fragments that are stored in the
page_frag struct. Honestly, I am surprised we still have page_frag
floating around. I thought we replaced that with bio_vec some time
ago; at least that is the structure that skb_frag_t is typedef'ed as.
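For anyone else hitting the same confusion, the split between the two roughly looks like this (field layout simplified from my reading of the kernel headers; the #ifdef size variants are elided, so treat the exact field types as an assumption):

```c
#include <stdbool.h>
#include <stdint.h>

struct page; /* opaque here; provided by the kernel */

/* A single handed-out fragment: a page plus an offset/size window into
 * it. This is what consumers such as skb_page_frag_refill() fill in. */
struct page_frag {
	struct page *page;
	uint32_t offset;
	uint32_t size;
};

/* The allocator side: state for carving fragments out of one (possibly
 * higher-order) backing page. pagecnt_bias is the batch of references
 * charged up front so each fragment does not need its own atomic op. */
struct page_frag_cache {
	void *va;              /* mapped address of the current page */
	uint16_t offset;       /* position of the next carve-out */
	uint16_t size;         /* size of the backing page */
	unsigned int pagecnt_bias;
	bool pfmemalloc;       /* page came from an emergency reserve */
};
```

In short: page_frag_cache is the producer state, page_frag is the consumer-visible result.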


Thread overview: 35+ messages
2021-01-13 16:18 [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs Eric Dumazet
2021-01-13 18:00 ` Alexander Duyck
2021-01-13 19:19 ` Michael S. Tsirkin
2021-01-13 22:23 ` David Laight
2021-01-14  5:16   ` Eric Dumazet
2021-01-14  9:29     ` David Laight
2021-01-14 19:00 ` patchwork-bot+netdevbpf
     [not found] ` <1617007696.5731978-1-xuanzhuo@linux.alibaba.com>
2021-03-29  9:06   ` Eric Dumazet
2021-03-31  8:11     ` Michael S. Tsirkin
2021-03-31  8:36       ` Eric Dumazet
2021-03-31  8:46         ` Eric Dumazet
2021-03-31  8:49           ` Eric Dumazet
2021-03-31  8:54             ` Eric Dumazet
     [not found]               ` <1617248264.4993114-2-xuanzhuo@linux.alibaba.com>
2021-04-01  5:06                 ` Eric Dumazet
     [not found]                   ` <1617357110.3822439-1-xuanzhuo@linux.alibaba.com>
2021-04-02 12:52                     ` Eric Dumazet
2021-04-01 13:51         ` Michael S. Tsirkin
2021-04-01 14:08           ` Eric Dumazet
2021-04-01  7:14       ` Jason Wang
     [not found]         ` <1617267183.5697193-1-xuanzhuo@linux.alibaba.com>
2021-04-01  9:58           ` Eric Dumazet
2021-04-02  2:52             ` Jason Wang
     [not found]               ` <1617361253.1788838-2-xuanzhuo@linux.alibaba.com>
2021-04-02 12:53                 ` Eric Dumazet
2021-04-06  2:04                 ` Jason Wang
     [not found]       ` <1617190239.1035674-1-xuanzhuo@linux.alibaba.com>
2021-03-31 12:08         ` Eric Dumazet
2021-04-01 13:36         ` Michael S. Tsirkin
2022-09-07 20:19 ` Paolo Abeni
2022-09-07 20:40   ` Eric Dumazet
2022-09-08 10:48     ` Paolo Abeni
2022-09-08 12:20       ` Eric Dumazet
2022-09-08 14:26         ` Paolo Abeni
2022-09-08 16:00           ` Eric Dumazet
2022-09-07 21:36   ` Alexander H Duyck
2022-09-08 11:00     ` Paolo Abeni
2022-09-08 14:53       ` Alexander H Duyck
2022-09-08 18:01         ` Paolo Abeni
2022-09-08 19:26           ` Alexander Duyck [this message]
