From: David Hildenbrand <david@redhat.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
"Yin, Fengwei" <fengwei.yin@intel.com>, Zi Yan <ziy@nvidia.com>,
Matthew Wilcox <willy@infradead.org>, Yu Zhao <yuzhao@google.com>,
David Rientjes <rientjes@google.com>
Cc: Linux-MM <linux-mm@kvack.org>
Subject: Re: Prerequisites for Large Anon Folios
Date: Wed, 30 Aug 2023 18:20:36 +0200
Message-ID: <43736fdb-1a9c-4ab4-bf9c-6e2052c6dfea@redhat.com>
In-Reply-To: <7f66344b-bf63-41e0-ae79-0a0a1d4f2afd@arm.com>
On 30.08.23 12:44, Ryan Roberts wrote:
> Hi All,
>
Hi Ryan,
I'll be back from vacation next Wednesday.
Note that I asked David R. to have large anon folios as topic for the
next bi-weekly mm meeting.
There, we should discuss things like
* naming
* accounting (/proc/meminfo)
* required toggles (especially ways to disable it, as we want to
keep toggles minimal)
David R. raised that there are certainly workloads where the additional
memory overhead is usually not acceptable. So it will be valuable to get
input from others.
>
> I want to get serious about getting large anon folios merged. To do that, there
> are a number of outstanding prerequisites. I'm hoping the respective owners may
> be able to provide an update on progress?
I shared some details in the last meeting when you were on vacation :)
High level update below.
[...]
>>
>> - item:
>> shared vs exclusive mappings
>>
>> priority:
>> prerequisite
>>
>> description: >-
>> New mechanism to allow us to easily determine precisely whether a given
>> folio is mapped exclusively or shared between multiple processes. Required
>> for (from David H):
>>
>> (1) Detecting shared folios, to not mess with them while they are shared.
>> MADV_PAGEOUT, user-triggered page migration, NUMA hinting, khugepaged ...
>> replace cases where folio_estimated_sharers() == 1 would currently be the
>> best we can do (and in some cases, page_mapcount() == 1).
>>
>> (2) COW improvements for PTE-mapped large anon folios after fork(). Before
>> fork(), PageAnonExclusive would have been reliable, after fork() it's not.
>>
>> For (1), "MADV_PAGEOUT" maps to the "madvise" item captured in this list. I
>> *think* "NUMA hinting" maps to "numa balancing" (but need confirmation!).
>> "user-triggered page migration" and "khugepaged" not yet captured (would
>> appreciate someone fleshing it out). I previously understood migration to be
>> working for large folios - is "user-triggered page migration" some specific
>> aspect that does not work?
>>
>> For (2), this relates to Large Anon Folio enhancements which I plan to
>> tackle after we get the basic series merged.
>>
>> links:
>> - 'email thread: Mapcount games: "exclusive mapped" vs. "mapped shared"'
>>
>> location:
>> - shrink_folio_list()
>>
>> assignee:
>> David Hildenbrand <david@redhat.com>
>
> Any comment on this David? I think the last comment I saw was that you were
> planning to start an implementation a couple of weeks back? Did that get anywhere?
The math should be solid at this point and I had a simple prototype
running -- including fairly clean COW reuse handling.
I started cleaning it all up before my vacation. I'll first need the
total mapcount (which I sent), and might have to implement rmap patching
during THP split (easy), but I first have to do more measurements.
Willy's patches to free up space in the first tail page will be
required, as will my patches to free up ->private in tail pages for
THP_SWAP. Both are on their way upstream.
Based on that, I need a bit spinlock to protect the total
mapcount+tracking data. There are things to measure (contention) and
optimize (why even care about tracking shared vs. exclusive if it's
pretty guaranteed to always be shared -- for example, shared libraries).
So it looks reasonable at this point, but I'll have to look into
possible contentions and optimizations once I have the basics
implemented cleanly.
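To illustrate the locking idea only, here is a minimal userspace sketch of
a bit spinlock that protects a total mapcount together with some tracking
data. It is not the prototype mentioned above: every name in it
(model_folio, MODEL_LOCK_BIT, model_adjust_mapcount) is hypothetical, the
"tracking" field is a placeholder rather than the real tracking scheme, and
C11 atomics stand in for the kernel's bit_spin_lock().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_LOCK_BIT 1UL

struct model_folio {
	atomic_ulong flags;	/* bit 0 doubles as the spinlock */
	long total_mapcount;	/* protected by the lock bit */
	unsigned long tracking;	/* placeholder tracking data, same lock */
};

static void model_lock(struct model_folio *f)
{
	/* Spin until we are the caller that flips the bit from 0 to 1. */
	while (atomic_fetch_or_explicit(&f->flags, MODEL_LOCK_BIT,
					memory_order_acquire) & MODEL_LOCK_BIT)
		;
}

static void model_unlock(struct model_folio *f)
{
	atomic_fetch_and_explicit(&f->flags, ~MODEL_LOCK_BIT,
				  memory_order_release);
}

/* Add (delta > 0) or remove (delta < 0) mappings made by process mm_id. */
static void model_adjust_mapcount(struct model_folio *f,
				  unsigned long mm_id, long delta)
{
	model_lock(f);
	/* Both fields change together, so readers never see them disagree. */
	f->total_mapcount += delta;
	f->tracking += mm_id * (unsigned long)delta;	/* placeholder update */
	assert(f->total_mapcount >= 0);
	model_unlock(f);
}
```

Every map and unmap of the folio goes through the same lock bit, which is
where the contention mentioned above would show up. It is also why skipping
the tracking for folios that are effectively always shared could pay off.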
It's a shame we cannot get the subpage mapcount out of the way
immediately, then it wouldn't be "additional tracking" but "different
tracking" :)
Once back from vacation, I'm planning on prioritizing this. Shouldn't
take ages to get it cleaned up. Measurements and optimizations might
take a bit longer.
[...]
>>
>> assignee:
>> Yin, Fengwei <fengwei.yin@intel.com>
>
> As I understand it: initial solution based on folio_estimated_sharers() has gone
> into v6.5. Have a dependency on David's precise shared vs exclusive work for an
> improved solution. And I think you mentioned you are planning to do a change
> that avoids splitting a large folio if it is entirely covered by the range?
Having shared vs. exclusive tracking in place would replace
folio_estimated_sharers() users and most sub-page mapcount users.
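As a rough illustration of why folio_estimated_sharers() is only an
estimate, here is a small userspace model. It assumes the v6.5 behaviour of
looking only at the mapcount of the first subpage. The types and helpers
(sketch_mapping, sketch_estimated_sharers, sketch_mapped_exclusively) are
hypothetical and exist only for this sketch. The "precise" check here walks
every mapping, which is exactly the cost a real tracking scheme is meant to
avoid.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one entry per PTE that maps a subpage of the folio. */
struct sketch_mapping {
	int mm_id;	/* which process owns the mapping */
	int page_idx;	/* which subpage of the folio it maps */
};

/* Models the estimate: count only the mappings of subpage 0. */
static int sketch_estimated_sharers(const struct sketch_mapping *maps, int n)
{
	int count = 0;

	for (int i = 0; i < n; i++)
		if (maps[i].page_idx == 0)
			count++;
	return count;
}

/* Precise answer: true only if every mapping belongs to one process. */
static bool sketch_mapped_exclusively(const struct sketch_mapping *maps, int n)
{
	if (n == 0)
		return false;
	for (int i = 1; i < n; i++)
		if (maps[i].mm_id != maps[0].mm_id)
			return false;
	return true;
}
```

For example, if process 1 maps subpages 0-3 of a four-page folio and
process 2 maps only subpages 1-3, the estimate is 1 ("looks exclusive")
even though the folio is shared.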
[...]
>>
>> - item:
>> numa balancing
>>
>> priority:
>> prerequisite
>>
>> description: >-
>> Large, pte-mapped folios are ignored by numa-balancing code. Commit comment
>> (e81c480): "We're going to have THP mapped with PTEs. It will confuse
>> numabalancing. Let's skip them for now." Likely depends on "shared vs
>> exclusive mappings".
>>
>> links: []
>>
>> location:
>> - do_numa_page()
>>
>> assignee:
>> <none>
>>
>
> Vaguely sounded like David might be planning to tackle this as part of his work
> on "shared vs exclusive mappings" ("NUMA hinting"??). David?
It should be easy to handle it based on that. Similarly, khugepaged IIRC.
--
Cheers,
David / dhildenb