linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Arjun Roy <arjunroy@google.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Arjun Roy <arjunroy.kdev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Miller <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Cgroups <cgroups@vger.kernel.org>, Linux MM <linux-mm@kvack.org>,
	Shakeel Butt <shakeelb@google.com>,
	Eric Dumazet <edumazet@google.com>,
	Soheil Hassas Yeganeh <soheil@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Yang Shi <shy828301@gmail.com>,
	Roman Gushchin <guro@fb.com>
Subject: Re: [mm, net-next v2] mm: net: memcg accounting for TCP rx zerocopy
Date: Wed, 24 Mar 2021 13:39:06 -0700	[thread overview]
Message-ID: <CAOFY-A1+TT5EgT0oVEkGgHAaJavbLzbKp5fQx_uOrMtw-7VEiA@mail.gmail.com> (raw)
In-Reply-To: <YFsA78FfzICrnFf7@dhcp22.suse.cz>

On Wed, Mar 24, 2021 at 2:12 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 23-03-21 11:47:54, Arjun Roy wrote:
> > On Tue, Mar 23, 2021 at 7:34 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Wed 17-03-21 18:12:55, Johannes Weiner wrote:
> > > [...]
> > > > Here is an idea of how it could work:
> > > >
> > > > struct page already has
> > > >
> > > >                 struct {        /* page_pool used by netstack */
> > > >                         /**
> > > >                          * @dma_addr: might require a 64-bit value even on
> > > >                          * 32-bit architectures.
> > > >                          */
> > > >                         dma_addr_t dma_addr;
> > > >                 };
> > > >
> > > > and as you can see from its union neighbors, there is quite a bit more
> > > > room to store private data necessary for the page pool.
> > > >
> > > > When a page's refcount hits zero and it's a networking page, we can
> > > > feed it back to the page pool instead of the page allocator.
> > > >
> > > > From a first look, we should be able to use the PG_owner_priv_1 page
> > > > flag for network pages (see how this flag is overloaded, we can add a
> > > > PG_network alias). With this, we can identify the page in __put_page()
> > > > and __release_page(). These functions are already aware of different
> > > > types of pages and do their respective cleanup handling. We can
> > > > similarly make network a first-class citizen and hand pages back to
> > > > the network allocator from in there.
> > >
> > > For compound pages we have a concept of destructors. Maybe we can extend
> > > that for order-0 pages as well. The struct page is heavily packed and
> > > compound_dtor shares the storage without other metadata
> > >                                         int    pages;    /*    16     4 */
> > >                         unsigned char compound_dtor;     /*    16     1 */
> > >                         atomic_t   hpage_pinned_refcount; /*    16     4 */
> > >                         pgtable_t  pmd_huge_pte;         /*    16     8 */
> > >                         void *     zone_device_data;     /*    16     8 */
> > >
> > > But none of those should really require to be valid when a page is freed
> > > unless I am missing something. It would really require to check their
> > > users whether they can leave the state behind. But if we can establish a
> > > contract that compound_dtor can be always valid when a page is freed
> > > this would be really a nice and useful abstraction because you wouldn't
> > > have to care about the specific type of page.
> > >
> > > But maybe I am just overlooking the real complexity there.
> > > --
> >
> > For now probably the easiest way is to have network pages be first
> > class with a specific flag as previously discussed and have concrete
> > handling for it, rather than trying to establish the contract across
> > page types.
>
> If you are going to claim a page flag then it would be much better to
> have it more generic. Flags are really scarce and if all you care about
> is PageHasDestructor() and provide one via page->dtor then the similar
> mechanism can be reused by somebody else. Or does anything prevent that?

The way I see it - the fundamental want here is, for some arbitrary
page that we are dropping a reference on, to be able to tell that the
provenance of the page is some network driver's page pool. If we added
an enum target to compound_dtor, if we examine that offset in the page
and look at that value, what guarantee do we have that the page isn't
instead some other kind of page, and the byte value there was just
coincidentally the one we were looking for (but it wasn't a network
driver pool page)?

Existing users of compound_dtor seem to check first that a
PageCompound() or PageHead() return true - the specific scenario here,
of receiving network packets, those pages will tend to not be compound
(and more specifically, compound pages are explicitly disallowed for
TCP receive zerocopy).

Given that's the case, the options seem to be:
1) Use a page flag - with the downside that they are a severely
limited resource,
2) Use some bits inside page->memcg_data - this I believe Johannes had
reasons against, and it isn't always the case that MEMCG support is
enabled.
3) Use compound_dtor - but I think this would have problems for the
prior reasons.

Thanks,
-Arjun



> --
> Michal Hocko
> SUSE Labs

  reply	other threads:[~2021-03-24 20:39 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-16  4:16 [mm, net-next v2] mm: net: memcg accounting for TCP rx zerocopy Arjun Roy
2021-03-16  4:20 ` Arjun Roy
2021-03-16  4:29   ` Shakeel Butt
2021-03-16  6:22     ` Arjun Roy
2021-03-16  6:28       ` Arjun Roy
2021-03-16 21:02         ` Jakub Kicinski
2021-03-16 10:26 ` Johannes Weiner
2021-03-17  6:05   ` Arjun Roy
2021-03-17 22:12     ` Johannes Weiner
2021-03-22 21:35       ` Arjun Roy
2021-03-23 17:01         ` Johannes Weiner
2021-03-23 18:42           ` Arjun Roy
2021-03-24 18:25             ` Shakeel Butt
2021-03-24 22:21               ` Arjun Roy
2021-03-23 14:34       ` Michal Hocko
2021-03-23 18:47         ` Arjun Roy
2021-03-24  9:12           ` Michal Hocko
2021-03-24 20:39             ` Arjun Roy [this message]
2021-03-24 20:53               ` Shakeel Butt
2021-03-24 21:56                 ` Michal Hocko
2021-03-24 21:24             ` Johannes Weiner
2021-03-24 22:49               ` Arjun Roy
2021-03-25  9:02                 ` Michal Hocko
2021-03-25 16:47                   ` Johannes Weiner
2021-03-25 17:50                     ` Michal Hocko
  -- strict thread matches above, loose matches on Subject: below --
2021-03-16  1:30 Arjun Roy
2021-03-18  3:21 ` Andrew Morton
2021-03-22 21:19   ` Arjun Roy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOFY-A1+TT5EgT0oVEkGgHAaJavbLzbKp5fQx_uOrMtw-7VEiA@mail.gmail.com \
    --to=arjunroy@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=arjunroy.kdev@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=netdev@vger.kernel.org \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    --cc=soheil@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).