From: David Hildenbrand <david@redhat.com>
To: Rik van Riel <riel@surriel.com>, alexlzhu@fb.com, linux-mm@kvack.org
Cc: willy@infradead.org, hannes@cmpxchg.org,
	akpm@linux-foundation.org, kernel-team@fb.com,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC 2/3] mm: changes to split_huge_page() to free zero filled tail pages
Date: Tue, 30 Aug 2022 14:33:42 +0200
Message-ID: <00f2dee2-ebc1-e732-f230-bc5b17da9f80@redhat.com>
In-Reply-To: <37db29410990991555362154a371b58f47d3cb0c.camel@surriel.com>

On 29.08.22 15:17, Rik van Riel wrote:
> On Mon, 2022-08-29 at 12:02 +0200, David Hildenbrand wrote:
>> On 26.08.22 23:18, Rik van Riel wrote:
>>> On Fri, 2022-08-26 at 12:18 +0200, David Hildenbrand wrote:
>>>> On 25.08.22 23:30, alexlzhu@fb.com wrote:
>>>>> From: Alexander Zhu <alexlzhu@fb.com>
>>>
>>> I could see wanting to maybe consolidate the scanning between
>>> KSM and this thing at some point, if it could be done without
>>> too much complexity, but keeping this change to split_huge_page
>>> looks like it might make sense even when KSM is enabled, since
>>> it will get rid of the unnecessary memory much faster than KSM
>>> could.
>>>
>>> Keeping a hundred MB of unnecessary memory around for longer
>>> would simply result in more THPs getting split up, and more
>>> memory pressure for a longer time than we need.
>>
>> Right. I was wondering if we want to map the shared zeropage instead
>> of the "detected to be zero" page, similar to how KSM would do it.
>> For example, with userfaultfd there would be an observable
>> difference.
>>
>> (maybe that's already done in this patch set)
>>
> The patch does not currently do that, but I suppose it could?
> 

It would be interesting to know why KSM decided to replace the mapped
page with the shared zeropage instead of dropping the page and letting
the next read fault populate the shared zeropage. That code predates
userfaultfd IIRC.

> What exactly are the userfaultfd differences here, and how does
> dropping 4kB pages break things vs. using the shared zeropage?

Once userfaultfd (missing mode) is enabled on a VMA:

1) khugepaged will no longer collapse pte_none(pteval), regardless of
the khugepaged_max_ptes_none setting -- see
__collapse_huge_page_isolate(). [It will also not collapse zeropages,
but I recall that that's not actually required.]

So it will not close holes, because the user space fault handler is in
charge of deciding when something gets mapped there and with what
content.


2) Page faults will no longer populate a THP -- the user space handler
is notified instead and has to decide how the fault will be resolved
(place pages).
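
The two rules above can be sketched as a toy Python model. This is an
illustration only, not kernel code: Pte, can_collapse() and
resolve_missing_fault() are hypothetical names, and the counting logic
is an assumption based on the description of
__collapse_huge_page_isolate above.

```python
# Toy model only: the names mirror kernel concepts from the mail, but
# nothing here is kernel code or a kernel API.
from enum import Enum


class Pte(Enum):
    NONE = "none"          # nothing mapped (pte_none)
    ZEROPAGE = "zeropage"  # shared zeropage mapped
    PAGE = "page"          # regular page mapped


def can_collapse(ptes, uffd_armed, max_ptes_none):
    """Return True if this model would collapse the range into a THP."""
    none_or_zero = 0
    for pte in ptes:
        if pte in (Pte.NONE, Pte.ZEROPAGE):
            # With userfaultfd armed, holes are never filled behind the
            # handler's back, whatever max_ptes_none says.
            if uffd_armed:
                return False
            none_or_zero += 1
            if none_or_zero > max_ptes_none:
                return False
    return True


def resolve_missing_fault(uffd_armed):
    """Say who resolves a fault on an unpopulated address in this model."""
    if uffd_armed:
        return "notify userspace handler"
    return "kernel populates page"


if __name__ == "__main__":
    ptes = [Pte.PAGE] * 510 + [Pte.NONE] * 2
    print(can_collapse(ptes, uffd_armed=False, max_ptes_none=511))
    print(can_collapse(ptes, uffd_armed=True, max_ptes_none=511))
    print(resolve_missing_fault(uffd_armed=True))
```

In this sketch a range with two holes collapses when userfaultfd is not
armed and max_ptes_none is large enough, and never collapses once
userfaultfd is armed.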


If you unmap something (resulting in pte_none()) where a page used to
be mapped, the user space fault handler might suddenly be notified
about a page fault that it doesn't expect, because it previously placed
a page there and did not zap that page itself (MADV_DONTNEED).
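
The scenario above can be played through in a toy Python model that
compares dropping a zero-filled page with mapping the shared zeropage
in its place. Everything here is hypothetical (ToyVma, its methods and
run_scenario() are made up for illustration), and it only models
bookkeeping, not real page tables or the userfaultfd API.

```python
# Toy model only: tracks which pages a user space fault handler placed
# and flags faults it would not expect. Not kernel code.
class ToyVma:
    def __init__(self, uffd_missing):
        self.uffd_missing = uffd_missing
        self.ptes = {}               # index -> "page"/"zeropage"; absent = pte_none
        self.handler_placed = set()  # pages the handler believes are mapped
        self.unexpected_faults = []

    def handler_place(self, idx):
        """Handler resolves a fault by placing a page."""
        self.ptes[idx] = "page"
        self.handler_placed.add(idx)

    def handler_zap(self, idx):
        """Handler zaps its own page (models MADV_DONTNEED)."""
        self.ptes.pop(idx, None)
        self.handler_placed.discard(idx)

    def kernel_free_zero_filled(self, idx, use_zeropage):
        """Kernel frees a page it found to be zero-filled."""
        if use_zeropage:
            self.ptes[idx] = "zeropage"  # mapping stays populated
        else:
            self.ptes.pop(idx, None)     # mapping becomes pte_none

    def read(self, idx):
        if idx in self.ptes:
            return "no fault"
        if self.uffd_missing:
            if idx in self.handler_placed:
                # Handler placed this page and never zapped it.
                self.unexpected_faults.append(idx)
            return "uffd fault"
        self.ptes[idx] = "zeropage"
        return "kernel mapped zeropage"


def run_scenario(use_zeropage):
    vma = ToyVma(uffd_missing=True)
    vma.handler_place(7)
    vma.kernel_free_zero_filled(7, use_zeropage)
    return vma.read(7), vma.unexpected_faults


if __name__ == "__main__":
    print(run_scenario(use_zeropage=False))
    print(run_scenario(use_zeropage=True))
```

In this model, dropping the page leads to a fault notification for a
page the handler thinks is still mapped, while mapping the zeropage
keeps the read fault-free.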

So at least with userfaultfd I think we have to be careful. I'm not
sure if there are other corner cases (again, the KSM behavior is
interesting).

-- 
Thanks,

David / dhildenb

