linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Zach O'Keefe <zokeefe@google.com>
Cc: David Rientjes <rientjes@google.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: mm/khugepaged: collapse file/shmem compound pages
Date: Sat, 28 May 2022 04:48:45 +0100	[thread overview]
Message-ID: <YpGbnbi44JqtRg+n@casper.infradead.org> (raw)
In-Reply-To: <CAAa6QmQgAYJonu=mbv5NZ3DuYOphc9wj2PYV3gBg3=DH_aSM-A@mail.gmail.com>

On Fri, May 27, 2022 at 09:27:33AM -0700, Zach O'Keefe wrote:
> On Thu, May 26, 2022 at 8:47 PM Matthew Wilcox <willy@infradead.org> wrote:
> > Because PageTransCompound() does not do what it says on the tin.
> >
> > static inline int PageTransCompound(struct page *page)
> > {
> >         return PageCompound(page);
> > }
> >
> > So any compound page is treated as if it's a PMD-sized page.
> 
> Right - therein lies the problem :) I think I misattributed your
> comment "we'll simply skip over it because the code believes that
> means it's already a PMD" as a solution, not as the current state of
> things. What we need to be able to do is:
> 
> 1) If folio order == 0: do what we've been doing
> 2) If folio order == HPAGE_PMD_ORDER: check if it's _actually_
> pmd-mapped. If it is, we're done. If not, continue to step (3)

I would not do that part.  Just leave it alone and assume everything's
good.

> 3) Else (folio order > 0 and not pmd-mapped): new magic; hopefully
> it's ~ same as step (1)

Yes, exactly this.

> > > I thought the point / benefit of khugepaged was precisely to try and
> > > find places where we can collapse many pte entries into a single pmd
> > > mapping?
> >
> > Ideally, yes.  But if a file is mapped at an address which isn't
> > PMD-aligned, it can't.  Maybe it should just decline to operate in that
> > case.
> 
> To make sure I'm not missing anything here: It's not actually
> important that the file is mapped at a pmd-aligned address. All that
> is important is that the region of memory being collapsed is
> pmd-aligned. If we wanted to collapse memory mapped to the start of
> the file, then sure, the file has to be mapped suitably.

Ah, what you're probably missing is that for file pages/folios, they
have to be naturally aligned.  The data structure underlying the
page cache simply can't cope with askew pages.  (It kind of can under
some circumstances that are so complicated that they shouldn't be
explained, and it's far easier just to say "folios must be naturally
aligned within the file")

> > > > shmem still expects folios to be of order either 0 or PMD_ORDER.
> > > > That assumption extends into the swap code and I haven't had the heart
> > > > to go and fix all those places yet.  Plus Neil was doing major surgery
> > > > to the swap code in the most recent deveopment cycle and I didn't want
> > > > to get in his way.
> > > >
> > > > So I am absolutely fine with khugepaged allocating a PMD-size folio for
> > > > any inode that claims mapping_large_folio_support().  If any filesystems
> > > > break, we'll fix them.
> > >
> > > Just for clarification, what is the equivalent code today that
> > > enforces mapping_large_folio_support()? I.e. today, khugepaged can
> > > successfully collapse file without checking if the inode supports it
> > > (we only check that it's a regular file not opened for writing).
> >
> > Yeah, that's a dodgy hack which needs to go away.  But we need a lot
> > more filesystems converted to supporting large folios before we can
> > delete it.  Not your responsibility; I'm doing my best to encourage
> > fs maintainers to do this part.
> 
> Got it. In the meantime, do we want to check the old conditions +
> mapping_large_folio_support()?

Yes, that should work.  khugepaged should be free to create large
folios if the underlying filesystem supports them OR (executable,
read-only, CONFIG_THP_RO, etc, etc).

> > > Also, just to check, there isn't anything wrong with following
> > > collapse_file()'s approach, even for folios of 0 < order <
> > > HPAGE_PMD_ORDER? I.e this part:
> > >
> > >  * Basic scheme is simple, details are more complex:
> > >  *  - allocate and lock a new huge page;
> > >  *  - scan page cache replacing old pages with the new one
> > >  *    + swap/gup in pages if necessary;
> > >  *    + fill in gaps;
> > >  *    + keep old pages around in case rollback is required;
> > >  *  - if replacing succeeds:
> > >  *    + copy data over;
> > >  *    + free old pages;
> > >  *    + unlock huge page;
> > >  *  - if replacing failed;
> > >  *    + put all pages back and unfreeze them;
> > >  *    + restore gaps in the page cache;
> > >  *    + unlock and free huge page;
> > >  */
> >
> > Correct.  At least, as far as I know!  Working on folios has been quite
> > the education for me ...
> 
> Great! Well, perhaps I'll run into a snafu here or there (and
> hopefully learn something myself) but this gives me enough confidence
> to naively give it a try and see what happens!
> 
> Again, thank you very much for your time, help and advice with this,

You're welcome!  Thanks for putting in some work on this project!


  reply	other threads:[~2022-05-28  3:48 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-24 22:42 mm/khugepaged: collapse file/shmem compound pages Zach O'Keefe
2022-05-25 19:07 ` Matthew Wilcox
2022-05-26  1:23   ` Zach O'Keefe
2022-05-26  3:36     ` Matthew Wilcox
2022-05-27  0:54       ` Zach O'Keefe
2022-05-27  3:47         ` Matthew Wilcox
2022-05-27 16:27           ` Zach O'Keefe
2022-05-28  3:48             ` Matthew Wilcox [this message]
2022-05-29 21:36               ` Zach O'Keefe
2022-05-30  1:25                 ` Rongwei Wang
2022-06-01  5:19                   ` Zach O'Keefe
2022-06-01 11:26                     ` Rongwei Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YpGbnbi44JqtRg+n@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=rientjes@google.com \
    --cc=zokeefe@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).