All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Zach O'Keefe <zokeefe@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>,
	David Hildenbrand <david@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	SeongJae Park <sj@kernel.org>, Song Liu <songliubraving@fb.com>,
	Vlastimil Babka <vbabka@suse.cz>, Yang Shi <shy828301@gmail.com>,
	Zi Yan <ziy@nvidia.com>,
	linux-mm@kvack.org, Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Chris Kennelly <ckennelly@google.com>,
	Chris Zankel <chris@zankel.net>, Helge Deller <deller@gmx.de>,
	Hugh Dickins <hughd@google.com>,
	Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	Jens Axboe <axboe@kernel.dk>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Matt Turner <mattst88@gmail.com>,
	Max Filippov <jcmvbkbc@gmail.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Minchan Kim <minchan@kernel.org>,
	Patrick Xia <patrickx@google.com>,
	Pavel Begunkov <asml.silence@gmail.com>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Subject: Re: [PATCH v2 00/12] mm: userspace hugepage collapse
Date: Fri, 15 Apr 2022 09:39:14 -0400	[thread overview]
Message-ID: <Yll1gg68NVnFRSby@xz-m1.local> (raw)
In-Reply-To: <CAAa6QmQbA4LFoUa19my+0Mc4Rp+ded5yiELB6qpVwhirDNrhtg@mail.gmail.com>

On Thu, Apr 14, 2022 at 05:52:43PM -0700, Zach O'Keefe wrote:
> Hey Peter,
> 
> Thanks for taking the time to review!
> 
> On Thu, Apr 14, 2022 at 5:04 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > Hi, Zach,
> >
> > On Thu, Apr 14, 2022 at 11:06:00AM -0700, Zach O'Keefe wrote:
> > > process_madvise(2)
> > >
> > >       Performs a synchronous collapse of the native pages
> > >       mapped by the list of iovecs into transparent hugepages.
> > >
> > >       Allocation semantics are the same as khugepaged, and depend on
> > >       (1) the active sysfs settings
> > >       /sys/kernel/mm/transparent_hugepage/enabled and
> > >       /sys/kernel/mm/transparent_hugepage/khugepaged/defrag, and (2)
> > >       the VMA flags of the memory range being collapsed.
> > >
> > >       Collapse eligibility criteria differs from khugepaged in that
> > >       the sysfs files
> > >       /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_[none|swap|shared]
> > >       are ignored.
> >
> > The userspace khugepaged idea definitely makes sense to me, though I'm
> > curious how the line is drown on the different behaviors here by explicitly
> > ignoring the max_ptes_* entries.
> >
> > Let's assume the initiative is to duplicate a more data-aware khugepaged in
> > the userspace, then IMHO it makes more sense to start with all the policies
> > that applies to khugepaged already, including max_pte_*.
> >
> > I can understand the willingness to provide even stronger semantics here
> > than khugepaged since the userspace could have very clear knowledge of how
> > to provision the memories (better than a kernel scanner).  It's just that
> > IMHO it could be slightly confusing if the new interface only partially
> > apply the khugepaged rules.
> >
> > No strong opinion here.  It could already been a trade-off after the
> > discussion from the RFC with Michal which I read..  Just curious about how
> > you made that design decision so feel free to read it as a pure question.
> >
> 
> Understand your point here. The allocation and max_pte_* semantics are
> split between khugepaged-like and fault-like, respectively - which
> could be confusing. Originally, I proposed a MADV_F_COLLAPSE_LIMITS
> flag to control the former's behavior, but agreed to keep things
> simple to start, and expand the interface if/when necessary. I opted
> to ignore max_ptes_* as the default since I envisioned that early
> adopters would "just want it to work". One such example would be
> backing executable text by hugepages on program load when many pages
> haven't been demand-paged in yet.
> 
> What do you think?

I'm just slightly worried that'll make the default MADV_COLLAPSE semantics
blurred.

To me, a clean default definition for MADV_COLLAPSE would be nice, as "do
khugepaged on this range, and with current thread context".  IMHO any
feature bits then can be supplementing special needs, and I'll take the thp
backing executable example to be one of the (good?) reason we'd need an
extra flag for ignoring the max_ptes_* knobs.

So personally if I were you maybe I'll start with the simple scheme of that
(even if it won't immediately service a thing) but then add either the
defrag or ignore_max_ptes_* as feature bits later on, with clear use case
descriptions about why we need each of the feature flags.  IMHO numbers
would be even more helpful when there's specific use cases on the show.

Or, perhaps you think all potential MADV_COLLAPSE users should literally
skip max_ptes_* limitations always?

Anyway, I won't pretend I am an expert in this area. :) So please take that
with a grain of salt.

Thanks,

-- 
Peter Xu



  reply	other threads:[~2022-04-15 13:39 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-14 18:06 [PATCH v2 00/12] mm: userspace hugepage collapse Zach O'Keefe
2022-04-14 18:06 ` [PATCH v2 01/12] mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds THP Zach O'Keefe
2022-04-14 18:06 ` [PATCH v2 02/12] mm/khugepaged: add struct collapse_control Zach O'Keefe
2022-04-14 18:06 ` [PATCH v2 03/12] mm/khugepaged: make hugepage allocation context-specific Zach O'Keefe
2022-04-14 18:06 ` [PATCH v2 04/12] mm/khugepaged: add struct collapse_result Zach O'Keefe
2022-04-14 18:06 ` [PATCH v2 05/12] mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse Zach O'Keefe
2022-04-14 18:06 ` [PATCH v2 06/12] mm/khugepaged: remove khugepaged prefix from shared collapse functions Zach O'Keefe
2022-04-14 18:06 ` [PATCH v2 07/12] mm/khugepaged: add flag to ignore khugepaged_max_ptes_* Zach O'Keefe
2022-04-14 18:06 ` [PATCH v2 08/12] mm/khugepaged: add flag to ignore page young/referenced requirement Zach O'Keefe
2022-04-14 18:06 ` [PATCH v2 09/12] mm/madvise: add MADV_COLLAPSE to process_madvise() Zach O'Keefe
2022-04-14 18:06 ` [PATCH v2 10/12] selftests/vm: modularize collapse selftests Zach O'Keefe
2022-04-14 18:06 ` [PATCH v2 11/12] selftests/vm: add MADV_COLLAPSE collapse context to selftests Zach O'Keefe
2022-04-14 18:06 ` [PATCH v2 12/12] selftests/vm: add test to verify recollapse of THPs Zach O'Keefe
2022-04-15  0:04 ` [PATCH v2 00/12] mm: userspace hugepage collapse Peter Xu
2022-04-15  0:52   ` Zach O'Keefe
2022-04-15 13:39     ` Peter Xu [this message]
2022-04-15 20:04       ` Zach O'Keefe
2022-04-16 19:26         ` Peter Xu
2022-04-17 17:23           ` Zach O'Keefe
2022-04-19 14:33           ` David Hildenbrand
2022-04-19 15:55             ` Zach O'Keefe
2022-04-19 20:02               ` David Hildenbrand
2022-04-19 22:42                 ` Zach O'Keefe
2022-04-21  0:56                   ` Yang Shi
2022-04-21  1:02                     ` Zach O'Keefe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yll1gg68NVnFRSby@xz-m1.local \
    --to=peterx@redhat.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.shi@linux.alibaba.com \
    --cc=arnd@arndb.de \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=axelrasmussen@google.com \
    --cc=chris@zankel.net \
    --cc=ckennelly@google.com \
    --cc=david@redhat.com \
    --cc=deller@gmx.de \
    --cc=hughd@google.com \
    --cc=ink@jurassic.park.msu.ru \
    --cc=jcmvbkbc@gmail.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-mm@kvack.org \
    --cc=mattst88@gmail.com \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=patrickx@google.com \
    --cc=rientjes@google.com \
    --cc=shy828301@gmail.com \
    --cc=sj@kernel.org \
    --cc=songliubraving@fb.com \
    --cc=tsbogend@alpha.franken.de \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    --cc=zokeefe@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.