From: David Hildenbrand <david@redhat.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Yin Fengwei <fengwei.yin@intel.com>, Yu Zhao <yuzhao@google.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Yang Shi <shy828301@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 0/5] variable-order, large folios for anonymous memory
Date: Fri, 7 Jul 2023 13:40:53 +0200	[thread overview]
Message-ID: <d9cb4563-c622-9660-287b-a2f35121aec7@redhat.com> (raw)
In-Reply-To: <4d4c45a2-0037-71de-b182-f516fee07e67@arm.com>

On 06.07.23 10:02, Ryan Roberts wrote:
> On 05/07/2023 20:38, David Hildenbrand wrote:
>> On 03.07.23 15:53, Ryan Roberts wrote:
>>> Hi All,
>>>
>>> This is v2 of a series to implement variable order, large folios for anonymous
>>> memory. The objective of this is to improve performance by allocating larger
>>> chunks of memory during anonymous page faults. See [1] for background.
>>>
> 
> [...]
> 
>>> Thanks,
>>> Ryan
>>
>> Hi Ryan,
>>
>> is page migration already working as expected (what about page compaction?), and
>> do we handle migration -ENOMEM when allocating a target page: do we split and
>> fall back to 4k page migration?
>>
> 
> Hi David, All,

Hi Ryan,

thanks a lot for the list.

But can you comment on the page migration part (IOW did you try it already)?

For example, memory hotunplug, CMA, MCE handling and compaction all rely
on page migration actually working for anything that was allocated with
GFP_MOVABLE.

Compaction seems to skip any higher-order folios, but the question is
whether the underlying migration itself works.

If it already works: great! If not, this really has to be tackled early, 
because otherwise we'll be breaking the GFP_MOVABLE semantics.

> 
> This series aims to be the bare minimum to demonstrate allocation of large anon
> folios. As such, there is a laundry list of things that need to be done for this
> feature to play nicely with other features. My preferred route is to merge this
> with its Kconfig defaulted to disabled, and its Kconfig description clearly
> shouting that it's EXPERIMENTAL with an explanation of why (similar to
> READ_ONLY_THP_FOR_FS).

As long as we are not sure about the user space control, and as long as
basic functionality (for example, page migration) is not working, I would
tend not to merge this early just for the sake of it.

But yes, something like mlock can eventually be tackled later: as long 
as there is a runtime interface to disable it ;)

> 
> That said, I've put together a table of the items that I'm aware of that need
> attention. It would be great if people can review and add any missing items.
> Then we can hopefully parallelize the implementation work. David, I don't think
> the items you raised are covered - would you mind providing a bit more detail so
> I can add them to the list? (or just add them to the list yourself, if you prefer).
> 
> ---
> 
> - item:
>      mlock
> 
>    description: >-
>      Large, pte-mapped folios are ignored when mlock is requested. Code comment
>      for mlock_vma_folio() says "...filter out pte mappings of THPs, which
>      cannot be consistently counted: a pte mapping of the THP head cannot be
>      distinguished by the page alone."
> 
>    location:
>      - mlock_pte_range()
>      - mlock_vma_folio()
> 
>    assignee:
>      Yin, Fengwei
> 
> 
> - item:
>      numa balancing
> 
>    description: >-
>      Large, pte-mapped folios are ignored by numa-balancing code. Commit
>      comment (e81c480): "We're going to have THP mapped with PTEs. It will
>      confuse numabalancing. Let's skip them for now."
> 
>    location:
>      - do_numa_page()
> 
>    assignee:
>      <none>
> 
> 
> - item:
>      madvise
> 
>    description: >-
>      MADV_COLD, MADV_PAGEOUT, MADV_FREE: For large folios, code assumes
>      exclusive only if mapcount==1, else skips remainder of operation. For
>      large, pte-mapped folios, exclusive folios can have mapcount up to nr_pages
>      and still be exclusive. Even better: don't split the folio if it fits
>      entirely within the range? Discussion at
> 
> https://lore.kernel.org/linux-mm/6cec6f68-248e-63b4-5615-9e0f3f819a0a@redhat.com/
>      talks about changing folio mapcounting - may help determine if exclusive
>      without pgtable scan?
> 
>    location:
>      - madvise_cold_or_pageout_pte_range()
>      - madvise_free_pte_range()
> 
>    assignee:
>      <none>
> 
> 
> - item:
>      shrink_folio_list
> 
>    description: >-
>      Raised by Yu Zhao; I can't see the problem in the code - need
>      clarification
> 
>    location:
>      - shrink_folio_list()
> 
>    assignee:
>      <none>
> 
> 
> - item:
>      compaction
> 
>    description: >-
>      Raised at LSFMM: Compaction skips non-order-0 pages. Already a problem for
>      page-cache pages today. Is my understanding correct?
> 
>    location:
>      - <where?>
> 
>    assignee:
>      <none>

I'm still thinking about the whole mapcount thingy (and I burned way too 
much time on that yesterday), which is a big item for such a list and 
affects some of these items.

The cost of a page-table scan is pretty much irrelevant for order-2
folios. But once we're talking about higher orders, we really don't want
to do that.

I'm preparing a writeup with users and challenges.


Is swapping working as expected? zswap?

-- 
Cheers,

David / dhildenb




Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d9cb4563-c622-9660-287b-a2f35121aec7@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=fengwei.yin@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ryan.roberts@arm.com \
    --cc=shy828301@gmail.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).