linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Zach O'Keefe <zokeefe@google.com>
Cc: David Rientjes <rientjes@google.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: mm/khugepaged: collapse file/shmem compound pages
Date: Fri, 27 May 2022 04:47:23 +0100	[thread overview]
Message-ID: <YpBJy9wQXABZeHLL@casper.infradead.org> (raw)
In-Reply-To: <CAAa6QmRfCrGc1RYyy_o4dGiKZJ8ZehBH9Lfg5g09SwXvBXx7HQ@mail.gmail.com>

On Thu, May 26, 2022 at 05:54:27PM -0700, Zach O'Keefe wrote:
> On Wed, May 25, 2022 at 8:36 PM Matthew Wilcox <willy@infradead.org> wrote:
> > On Wed, May 25, 2022 at 06:23:52PM -0700, Zach O'Keefe wrote:
> > > On Wed, May 25, 2022 at 12:07 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > Anyway, that meaning behind that comment is that the PageTransCompound()
> > > > test is going to be true on any compound page (TransCompound doesn't
> > > > check that the page is necessarily a THP).  So that particular test should
> > > > be folio_test_pmd_mappable(), but there are probably other things which
> > > > ought to be changed, including converting the entire file from dealing
> > > > in pages to dealing in folios.
> > >
> > > Right, at this point, the page might be a pmd-mapped THP, or it could
> > > be a pte-mapped compound page (I'm unsure if we can encounter compound
> > > pages outside hugepages).
> >
> > Today, there is a way.  We can find a folio with an order between 0 and
> > PMD_ORDER if the underlying filesystem supports large folios and the
> > file is executable and we've enabled CONFIG_READ_ONLY_THP_FOR_FS.
> > In this case, we'll simply skip over it because the code believes that
> > means it's already a PMD.
> 
> I think I'm missing something here - sorry. If the folio order is  <
> HPAGE_PMD_ORDER, why does the code think it's a pmd?

Because PageTransCompound() does not do what it says on the tin.

static inline int PageTransCompound(struct page *page)
{
        return PageCompound(page);
}

So any compound page is treated as if it's a PMD-sized page.

> > > If we could tell it's already pmd-mapped, we're done :) IIUC,
> > > folio_test_pmd_mappable() is a necessary but not sufficient condition
> > > to determine this.
> >
> > It is necessary, but from khugepaged's point of view, it's sufficient
> > because khugepaged's job is to create PMD-sized folios -- it's not up to
> > khugepaged to ensure that PMD-sized folios are actually mapped using
> > a PMD.
> 
> I thought the point / benefit of khugepaged was precisely to try and
> find places where we can collapse many pte entries into a single pmd
> mapping?

Ideally, yes.  But if a file is mapped at an address which isn't
PMD-aligned, it can't.  Maybe it should just decline to operate in that
case.

> > There may be some other component of the system (eg DAMON?)
> > which has chosen to temporarily map the PMD-sized folio using PTEs
> > in order to track whether the memory is all being used.  It may also
> > be the case that (for file-based memory), the VMA is mis-aligned and
> > despite creating a PMD-sized folio, it can't be mapped with a PMD.
> 
> AFAIK DAMON doesn't do this pmd splitting to do subpage tracking for
> THPs. Also, I believe retract_page_tables() does make the check to see
> if the address is suitably hugepage aligned/sized.

Maybe not DAMON itself, but it's something that various people are
talkig about doing; trying to determine whether THPs are worth using or
whether userspace has made the magic go-faster call without knowing
whether the valuable 2MB page is being entirely used.

> > shmem still expects folios to be of order either 0 or PMD_ORDER.
> > That assumption extends into the swap code and I haven't had the heart
> > to go and fix all those places yet.  Plus Neil was doing major surgery
> > to the swap code in the most recent deveopment cycle and I didn't want
> > to get in his way.
> >
> > So I am absolutely fine with khugepaged allocating a PMD-size folio for
> > any inode that claims mapping_large_folio_support().  If any filesystems
> > break, we'll fix them.
> 
> Just for clarification, what is the equivalent code today that
> enforces mapping_large_folio_support()? I.e. today, khugepaged can
> successfully collapse file without checking if the inode supports it
> (we only check that it's a regular file not opened for writing).

Yeah, that's a dodgy hack which needs to go away.  But we need a lot
more filesystems converted to supporting large folios before we can
delete it.  Not your responsibility; I'm doing my best to encourage
fs maintainers to do this part.

> Also, just to check, there isn't anything wrong with following
> collapse_file()'s approach, even for folios of 0 < order <
> HPAGE_PMD_ORDER? I.e this part:
> 
>  * Basic scheme is simple, details are more complex:
>  *  - allocate and lock a new huge page;
>  *  - scan page cache replacing old pages with the new one
>  *    + swap/gup in pages if necessary;
>  *    + fill in gaps;
>  *    + keep old pages around in case rollback is required;
>  *  - if replacing succeeds:
>  *    + copy data over;
>  *    + free old pages;
>  *    + unlock huge page;
>  *  - if replacing failed;
>  *    + put all pages back and unfreeze them;
>  *    + restore gaps in the page cache;
>  *    + unlock and free huge page;
>  */

Correct.  At least, as far as I know!  Working on folios has been quite
the education for me ...


  reply	other threads:[~2022-05-27  3:47 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-24 22:42 mm/khugepaged: collapse file/shmem compound pages Zach O'Keefe
2022-05-25 19:07 ` Matthew Wilcox
2022-05-26  1:23   ` Zach O'Keefe
2022-05-26  3:36     ` Matthew Wilcox
2022-05-27  0:54       ` Zach O'Keefe
2022-05-27  3:47         ` Matthew Wilcox [this message]
2022-05-27 16:27           ` Zach O'Keefe
2022-05-28  3:48             ` Matthew Wilcox
2022-05-29 21:36               ` Zach O'Keefe
2022-05-30  1:25                 ` Rongwei Wang
2022-06-01  5:19                   ` Zach O'Keefe
2022-06-01 11:26                     ` Rongwei Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YpBJy9wQXABZeHLL@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=rientjes@google.com \
    --cc=zokeefe@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).