linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
	"Yin, Fengwei" <fengwei.yin@intel.com>, Zi Yan <ziy@nvidia.com>,
	Matthew Wilcox <willy@infradead.org>, Yu Zhao <yuzhao@google.com>,
	David Rientjes <rientjes@google.com>
Cc: Linux-MM <linux-mm@kvack.org>
Subject: Re: Prerequisites for Large Anon Folios
Date: Wed, 30 Aug 2023 18:20:36 +0200
Message-ID: <43736fdb-1a9c-4ab4-bf9c-6e2052c6dfea@redhat.com>
In-Reply-To: <7f66344b-bf63-41e0-ae79-0a0a1d4f2afd@arm.com>

On 30.08.23 12:44, Ryan Roberts wrote:
> Hi All,
> 

Hi Ryan,

I'll be back from vacation next Wednesday.

Note that I asked David R. to have large anon folios as topic for the 
next bi-weekly mm meeting.

There, we should discuss things like
* naming
* accounting (/proc/meminfo)
* required toggles (especially ways to disable it, as we want to
   keep toggles minimal)

David R. raised that there are certainly workloads where the additional 
memory overhead is usually not acceptable. So it will be valuable to get 
input from others.

> 
> I want to get serious about getting large anon folios merged. To do that, there
> are a number of outstanding prerequisites. I'm hoping the respective owners may
> be able to provide an update on progress?

I shared some details in the last meeting when you were on vacation :)

High level update below.

[...]

>>
>> - item:
>>      shared vs exclusive mappings
>>
>>    priority:
>>      prerequisite
>>
>>    description: >-
>>      New mechanism to allow us to easily determine precisely whether a given
>>      folio is mapped exclusively or shared between multiple processes. Required
>>      for (from David H):
>>
>>      (1) Detecting shared folios, to not mess with them while they are shared.
>>      MADV_PAGEOUT, user-triggered page migration, NUMA hinting, khugepaged ...
>>      replace cases where folio_estimated_sharers() == 1 would currently be the
>>      best we can do (and in some cases, page_mapcount() == 1).
>>
>>      (2) COW improvements for PTE-mapped large anon folios after fork(). Before
>>      fork(), PageAnonExclusive would have been reliable, after fork() it's not.
>>
>>      For (1), "MADV_PAGEOUT" maps to the "madvise" item captured in this list. I
>>      *think* "NUMA hinting" maps to "numa balancing" (but need confirmation!).
>>      "user-triggered page migration" and "khugepaged" not yet captured (would
>>      appreciate someone fleshing it out). I previously understood migration to be
>>      working for large folios - is "user-triggered page migration" some specific
>>      aspect that does not work?
>>
>>      For (2), this relates to Large Anon Folio enhancements which I plan to
>>      tackle after we get the basic series merged.
>>
>>    links:
>>      - 'email thread: Mapcount games: "exclusive mapped" vs. "mapped shared"'
>>
>>    location:
>>      - shrink_folio_list()
>>
>>    assignee:
>>      David Hildenbrand <david@redhat.com>
> 
> Any comment on this David? I think the last comment I saw was that you were
> planning to start an implementation a couple of weeks back? Did that get anywhere?

The math should be solid at this point and I had a simple prototype 
running -- including fairly clean COW reuse handling.

I started cleaning it all up before my vacation. I'll first need the 
total mapcount (which I sent), and might have to implement rmap patching 
during THP split (easy), but I first have to do more measurements.

Willy's patches to free up space in the first tail page will be 
required. In addition, my patches to free up ->private in tail pages for 
THP_SWAP are needed. Both are on their way upstream.

Based on that, I need a bit spinlock to protect the total 
mapcount+tracking data. There are things to measure (contention) and 
optimize (why even care about tracking shared vs. exclusive if it's 
pretty guaranteed to always be shared -- for example, shared libraries).
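
To make the locking idea concrete, here is a minimal userspace sketch of a bit spinlock guarding a total mapcount plus some tracking data. It is not the prototype's scheme: the struct layout, the `toy_*` names, the owner ids and the "first owner" heuristic are all assumptions made up for illustration, and C11 atomics stand in for the kernel's bit spinlock helpers. The toy also never returns to "exclusive" until the folio is fully unmapped, which a precise scheme would have to do better.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical layout: bit 0 of lock_word is the lock, nothing else lives there. */
#define TOY_LOCK_BIT 1u
#define TOY_NO_OWNER (-1)

struct toy_folio {
	atomic_uint lock_word;  /* bit 0 acts as a spinlock */
	int total_mapcount;     /* protected by the lock bit */
	int first_owner;        /* id of the first mapper, or TOY_NO_OWNER */
	bool maybe_shared;      /* set once a second, different mapper shows up */
};

static void toy_folio_init(struct toy_folio *f)
{
	atomic_init(&f->lock_word, 0);
	f->total_mapcount = 0;
	f->first_owner = TOY_NO_OWNER;
	f->maybe_shared = false;
}

static void toy_lock(struct toy_folio *f)
{
	/* Spin until this caller is the one that flips bit 0 from 0 to 1. */
	while (atomic_fetch_or_explicit(&f->lock_word, TOY_LOCK_BIT,
					memory_order_acquire) & TOY_LOCK_BIT)
		; /* busy-wait; this is where contention would show up */
}

static void toy_unlock(struct toy_folio *f)
{
	atomic_fetch_and_explicit(&f->lock_word, ~TOY_LOCK_BIT,
				  memory_order_release);
}

/* Account one new mapping created by the (hypothetical) owner id. */
static void toy_map(struct toy_folio *f, int owner_id)
{
	toy_lock(f);
	f->total_mapcount++;
	if (f->first_owner == TOY_NO_OWNER)
		f->first_owner = owner_id;
	else if (f->first_owner != owner_id)
		f->maybe_shared = true;
	toy_unlock(f);
}

/* Drop one mapping; reset the tracking data once nothing maps the folio. */
static void toy_unmap(struct toy_folio *f)
{
	toy_lock(f);
	assert(f->total_mapcount > 0);
	if (--f->total_mapcount == 0) {
		f->first_owner = TOY_NO_OWNER;
		f->maybe_shared = false;
	}
	toy_unlock(f);
}

static bool toy_maybe_shared(struct toy_folio *f)
{
	bool shared;

	toy_lock(f);
	shared = f->maybe_shared;
	toy_unlock(f);
	return shared;
}
```

Every map and unmap takes the lock in this sketch, which is exactly the cost that would need measuring. Skipping the tracking for folios that are practically always shared would avoid taking it at all on those paths.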

So it looks reasonable at this point, but I'll have to look into 
possible contentions and optimizations once I have the basics 
implemented cleanly.

It's a shame we cannot get the subpage mapcount out of the way 
immediately, then it wouldn't be "additional tracking" but "different 
tracking" :)

Once back from vacation, I'm planning on prioritizing this. Shouldn't 
take ages to get it cleaned up. Measurements and optimizations might 
take a bit longer.

[...]


>>
>>    assignee:
>>      Yin, Fengwei <fengwei.yin@intel.com>
> 
> As I understand it: initial solution based on folio_estimated_sharers() has gone
> into v6.5. Have a dependency on David's precise shared vs exclusive work for an

shared vs. exclusive in place would replace folio_estimated_sharers() 
users and most sub-page mapcount users.
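
A small userspace illustration of why the estimate is only an estimate. To my understanding, folio_estimated_sharers() in v6.5 samples the mapcount of the folio's first page only. The toy below models that with a plain array; the `toy_*` names, the 4-page folio and the example mapcounts are assumptions for illustration, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_PAGES 4  /* hypothetical folio size, in pages */

struct toy_large_folio {
	int page_mapcount[TOY_NR_PAGES];  /* per-subpage mapcounts */
};

/* Same idea as the estimate: look at the first page and nothing else. */
static int toy_estimated_sharers(const struct toy_large_folio *f)
{
	return f->page_mapcount[0];
}

/* Highest per-page mapcount; anything above 1 means some page is shared. */
static int toy_max_page_mapcount(const struct toy_large_folio *f)
{
	int max = 0;

	for (int i = 0; i < TOY_NR_PAGES; i++) {
		assert(f->page_mapcount[i] >= 0);
		if (f->page_mapcount[i] > max)
			max = f->page_mapcount[i];
	}
	return max;
}

/*
 * Example state: process A maps all four pages, process B (say, after
 * fork() followed by a partial munmap()) still maps only the last two.
 */
static struct toy_large_folio toy_partially_shared(void)
{
	struct toy_large_folio f = { .page_mapcount = { 1, 1, 2, 2 } };

	return f;
}
```

For the state built by toy_partially_shared(), the first-page estimate returns 1 and the folio looks exclusive, although two of its pages are mapped by both processes. Scanning every subpage catches this in the toy, but that walk is the kind of sub-page mapcount use that precise folio-level tracking is meant to replace.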

> improved solution. And I think you mentioned you are planning to do a change
> that avoids splitting a large folio if it is entirely covered by the range?

[...]
>>
>> - item:
>>      numa balancing
>>
>>    priority:
>>      prerequisite
>>
>>    description: >-
>>      Large, pte-mapped folios are ignored by numa-balancing code. Commit comment
>>      (e81c480): "We're going to have THP mapped with PTEs. It will confuse
>>      numabalancing. Let's skip them for now." Likely depends on "shared vs
>>      exclusive mappings".
>>
>>    links: []
>>
>>    location:
>>      - do_numa_page()
>>
>>    assignee:
>>      <none>
>>
> 
> Vaguely sounded like David might be planning to tackle this as part of his work
> on "shared vs exclusive mappings" ("NUMA hinting"??). David?

It should be easy to handle it based on that. Similarly, khugepaged IIRC.

-- 
Cheers,

David / dhildenb


