From: "Zach O'Keefe" <zokeefe@google.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: David Rientjes <rientjes@google.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: mm/khugepaged: collapse file/shmem compound pages
Date: Wed, 25 May 2022 18:23:52 -0700 [thread overview]
Message-ID: <CAAa6QmQhqHrKa5M4vRCAPtOa4pTet6MfELprN2Wb0rv46PSjTA@mail.gmail.com> (raw)
In-Reply-To: <Yo5+c6vFyLgjtVsG@casper.infradead.org>
On Wed, May 25, 2022 at 12:07 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, May 24, 2022 at 03:42:55PM -0700, Zach O'Keefe wrote:
> > Hey Matthew,
> >
> > I'm leading an attempt to add a new madvise mode, MADV_COLLAPSE, to
> > allow userspace-directed collapse of memory into THPs[1]. The initial
> > proposal only supports anonymous memory, but I'm
> > working on adding support for file-backed and shmem memory.
> >
> > The intended behavior of MADV_COLLAPSE is that it should return
> > "success" if all hugepage-aligned / sized regions requested are backed
> > by pmd-mapped THPs on return (races aside). IOW: we were able to
> > successfully collapse the memory, or it was already backed by
> > pmd-mapped THPs.
> >
> > Currently there is a nice "XXX: khugepaged should compact smaller
> > compound pages into a PMD sized page" in khugepaged_scan_file() when
> > we encounter a compound page during scanning. Do you know what kind of
> > gotchas or technical difficulties would be involved in doing this? I
> > presume this work would also benefit those relying on khugepaged to
> > collapse read-only file and shmem memory, and I'd be happy to help
> > move it forward.
Hey Matthew,
Thanks for taking the time!
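To pin down the "success" criterion from my quoted text above: for a request [addr, addr + len), only the hugepage-aligned, hugepage-sized subrange is expected to be pmd-mapped on return. A userspace-side sketch of that range computation (the 2MiB hugepage size is an assumption here, and `collapse_target_range` is just an illustrative helper, not proposed API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HPAGE_SIZE (2UL << 20) /* assume 2MiB PMD-size hugepages */

/*
 * Given a request [addr, addr + len), compute the hugepage-aligned,
 * hugepage-sized subrange that MADV_COLLAPSE would be expected to
 * leave backed by pmd-mapped THPs on success.
 */
static void collapse_target_range(uintptr_t addr, size_t len,
                                  uintptr_t *start, uintptr_t *end)
{
	*start = (addr + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1); /* round up */
	*end = (addr + len) & ~(HPAGE_SIZE - 1);              /* round down */
	if (*end < *start)
		*end = *start; /* request covers no whole hugepage */
}
```

Partial hugepages at either end of the request are simply not part of the contract, which is why a sub-hugepage-sized request can legitimately succeed while collapsing nothing.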
>
> Hi Zach,
>
> Thanks for your interest, and I'd love some help on this.
>
> The khugepaged code (like much of the mm used to) assumes that memory
> comes in two sizes, PTE and PMD. That's still true for anon and shmem
> for now, but hopefully we'll start managing both anon & shmem memory in
> larger chunks, without necessarily going as far as PMD.
>
> I think the purpose of khugepaged should continue to be to construct
> PMD-size pages; I don't see the point of it wandering through process VMs
> replacing order-2 pages with order-5 pages. I may be wrong about that,
> of course, so feel free to argue with me.
I'd agree here.
> Anyway, the meaning behind that comment is that the PageTransCompound()
> test is going to be true on any compound page (TransCompound doesn't
> check that the page is necessarily a THP). So that particular test should
> be folio_test_pmd_mappable(), but there are probably other things which
> ought to be changed, including converting the entire file from dealing
> in pages to dealing in folios.
Right, at this point, the page might be a pmd-mapped THP, or it could
be a pte-mapped compound page (I'm unsure whether we can encounter
compound pages other than hugepages here).
If we could tell it's already pmd-mapped, we're done :) IIUC,
folio_test_pmd_mappable() is a necessary but not sufficient condition
to determine this.
Else, if it's not, is it safe to continue? Suppose we find a folio of
0 < order < HPAGE_PMD_ORDER. Can we safely try to extend it, or would
we break filesystems that expect a folio of a particular order?
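To make the "necessary but not sufficient" point concrete, here is a toy model of the distinction (stub types only, not kernel code -- the real `struct folio` has no `pmd_mapped` flag, and the actual check would have to consult the rmap/mapping state):

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_PMD_ORDER 9 /* assume 4KiB base pages, 2MiB PMDs */

/* Toy stand-in for struct folio: just the fields this check needs. */
struct folio {
	unsigned int order; /* compound order of the folio */
	bool pmd_mapped;    /* actually mapped by a PMD right now? */
};

/*
 * Necessary condition: only a folio of at least PMD order *can* be
 * pmd-mapped. Models folio_test_pmd_mappable().
 */
static bool folio_test_pmd_mappable(const struct folio *folio)
{
	return folio->order >= HPAGE_PMD_ORDER;
}

/*
 * What the scan actually wants to know: mappable AND currently
 * pmd-mapped. The mappable check alone is not sufficient -- a
 * PMD-order folio can still be pte-mapped (e.g. after a partial
 * munmap or mprotect split the PMD mapping).
 */
static bool folio_already_collapsed(const struct folio *folio)
{
	return folio_test_pmd_mappable(folio) && folio->pmd_mapped;
}
```

So even once khugepaged_scan_file() is converted to folios, a pmd-mappable folio that is only pte-mapped would still need the collapse (or re-mapping) step.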
> I actually have one patch which starts in that direction, but I haven't
> followed it up yet with all the other patches to that file which will
> be needed:
Thanks for the head start! Not an expert here, but would you say
converting this file to use folios is a necessary first step?
Again, thanks for your time,
Zach
> From a64ac45ad951557103a1040c8bcc3f229022cd26 Mon Sep 17 00:00:00 2001
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Date: Fri, 7 May 2021 23:40:19 -0400
> Subject: [PATCH] mm/khugepaged: Allocate folios
>
> khugepaged only wants to deal in terms of folios, so switch to
> using the folio allocation functions. This eliminates the calls to
> prep_transhuge_page() and saves dozens of bytes of text.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> mm/khugepaged.c | 32 ++++++++++++--------------------
> 1 file changed, 12 insertions(+), 20 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 637bfecd6bf5..ec60ee4e14c9 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -854,18 +854,20 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
> static struct page *
> khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
> {
> + struct folio *folio;
> +
> VM_BUG_ON_PAGE(*hpage, *hpage);
>
> - *hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
> - if (unlikely(!*hpage)) {
> + folio = __folio_alloc_node(gfp, HPAGE_PMD_ORDER, node);
> + if (unlikely(!folio)) {
> count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
> *hpage = ERR_PTR(-ENOMEM);
> return NULL;
> }
>
> - prep_transhuge_page(*hpage);
> count_vm_event(THP_COLLAPSE_ALLOC);
> - return *hpage;
> + *hpage = &folio->page;
> + return &folio->page;
> }
> #else
> static int khugepaged_find_target_node(void)
> @@ -873,24 +875,14 @@ static int khugepaged_find_target_node(void)
> return 0;
> }
>
> -static inline struct page *alloc_khugepaged_hugepage(void)
> -{
> - struct page *page;
> -
> - page = alloc_pages(alloc_hugepage_khugepaged_gfpmask(),
> - HPAGE_PMD_ORDER);
> - if (page)
> - prep_transhuge_page(page);
> - return page;
> -}
> -
> static struct page *khugepaged_alloc_hugepage(bool *wait)
> {
> - struct page *hpage;
> + struct folio *folio;
>
> do {
> - hpage = alloc_khugepaged_hugepage();
> - if (!hpage) {
> + folio = folio_alloc(alloc_hugepage_khugepaged_gfpmask(),
> + HPAGE_PMD_ORDER);
> + if (!folio) {
> count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
> if (!*wait)
> return NULL;
> @@ -899,9 +891,9 @@ static struct page *khugepaged_alloc_hugepage(bool *wait)
> khugepaged_alloc_sleep();
> } else
> count_vm_event(THP_COLLAPSE_ALLOC);
> - } while (unlikely(!hpage) && likely(khugepaged_enabled()));
> + } while (unlikely(!folio) && likely(khugepaged_enabled()));
>
> - return hpage;
> + return &folio->page;
> }
>
> static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
> --
> 2.34.1
>
Thread overview: 12+ messages
2022-05-24 22:42 mm/khugepaged: collapse file/shmem compound pages Zach O'Keefe
2022-05-25 19:07 ` Matthew Wilcox
2022-05-26 1:23 ` Zach O'Keefe [this message]
2022-05-26 3:36 ` Matthew Wilcox
2022-05-27 0:54 ` Zach O'Keefe
2022-05-27 3:47 ` Matthew Wilcox
2022-05-27 16:27 ` Zach O'Keefe
2022-05-28 3:48 ` Matthew Wilcox
2022-05-29 21:36 ` Zach O'Keefe
2022-05-30 1:25 ` Rongwei Wang
2022-06-01 5:19 ` Zach O'Keefe
2022-06-01 11:26 ` Rongwei Wang