From: Yu Zhao <yuzhao@google.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: "Yin, Fengwei" <fengwei.yin@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	David Hildenbrand <david@redhat.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Yang Shi <shy828301@gmail.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 3/5] mm: Default implementation of arch_wants_pte_order()
Date: Tue, 4 Jul 2023 19:40:02 -0600
Message-ID: <CAOUHufYRBQv2WZ-RcF5qDm7Y6yLxmzoYzpfUh_CZ5dV=S5L4FA@mail.gmail.com>
In-Reply-To: <dd9ea461-df2d-afe1-a67c-c73ac1cb96b4@arm.com>

On Tue, Jul 4, 2023 at 7:23 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 04/07/2023 13:36, Ryan Roberts wrote:
> > On 04/07/2023 04:59, Yu Zhao wrote:
> >> On Mon, Jul 3, 2023 at 9:02 PM Yu Zhao <yuzhao@google.com> wrote:
> >>>
> >>> On Mon, Jul 3, 2023 at 8:23 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 7/3/2023 9:53 PM, Ryan Roberts wrote:
> >>>>> arch_wants_pte_order() can be overridden by the arch to return the
> >>>>> preferred folio order for pte-mapped memory. This is useful as some
> >>>>> architectures (e.g. arm64) can coalesce TLB entries when the physical
> >>>>> memory is suitably contiguous.
> >>>>>
> >>>>> The first user for this hint will be FLEXIBLE_THP, which aims to
> >>>>> allocate large folios for anonymous memory to reduce page faults and
> >>>>> other per-page operation costs.
> >>>>>
> >>>>> Here we add the default implementation of the function, used when the
> >>>>> architecture does not define it, which returns the order corresponding
> >>>>> to 64K.
> >>>>>
> >>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> >>>>> ---
> >>>>>  include/linux/pgtable.h | 13 +++++++++++++
> >>>>>  1 file changed, 13 insertions(+)
> >>>>>
> >>>>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> >>>>> index a661a17173fa..f7e38598f20b 100644
> >>>>> --- a/include/linux/pgtable.h
> >>>>> +++ b/include/linux/pgtable.h
> >>>>> @@ -13,6 +13,7 @@
> >>>>>  #include <linux/errno.h>
> >>>>>  #include <asm-generic/pgtable_uffd.h>
> >>>>>  #include <linux/page_table_check.h>
> >>>>> +#include <linux/sizes.h>
> >>>>>
> >>>>>  #if 5 - defined(__PAGETABLE_P4D_FOLDED) - defined(__PAGETABLE_PUD_FOLDED) - \
> >>>>>       defined(__PAGETABLE_PMD_FOLDED) != CONFIG_PGTABLE_LEVELS
> >>>>> @@ -336,6 +337,18 @@ static inline bool arch_has_hw_pte_young(void)
> >>>>>  }
> >>>>>  #endif
> >>>>>
> >>>>> +#ifndef arch_wants_pte_order
> >>>>> +/*
> >>>>> + * Returns preferred folio order for pte-mapped memory. Must be in range [0,
> >>>>> + * PMD_SHIFT-PAGE_SHIFT) and must not be order-1 since THP requires large folios
> >>>>> + * to be at least order-2.
> >>>>> + */
> >>>>> +static inline int arch_wants_pte_order(struct vm_area_struct *vma)
> >>>>> +{
> >>>>> +     return ilog2(SZ_64K >> PAGE_SHIFT);
> >>>> A default value that isn't tied to any particular silicon could be PAGE_ALLOC_COSTLY_ORDER?
> >>>>
> >>>> Also, the current pcp lists cache pages of order 0...PAGE_ALLOC_COSTLY_ORDER, plus order 9.
> >>>> If the pcp lists can cover the allocation, they will reduce pressure on the zone lock.
> >>>
> >>> The value of PAGE_ALLOC_COSTLY_ORDER is reasonable but again it's a
> >>> s/w policy not a h/w preference. Besides, I don't think we can include
> >>> mmzone.h in pgtable.h.
> >>
> >> I think we can make a compromise:
> >> 1. change the default implementation of arch_wants_pte_order() to return 0, and
> >> 2. in memory.c, we can try PAGE_ALLOC_COSTLY_ORDER for archs that
> >> don't override arch_wants_pte_order(), or if its return value is too
> >> large to fit.
> >> This should also take care of the regression, right?
> >
> > I think you are suggesting that we use 0 as a sentinel which we then translate
> > to PAGE_ALLOC_COSTLY_ORDER? I already have a max_anon_folio_order() function in
> > memory.c (actually it is currently a macro defined as arch_wants_pte_order()).
> >
> > So it would become (I'll talk about the vma concern separately in the thread
> > where you raised it):
> >
> > static inline int max_anon_folio_order(struct vm_area_struct *vma)
> > {
> >       int order = arch_wants_pte_order(vma);
> >
> >       return order ? order : PAGE_ALLOC_COSTLY_ORDER;
> > }
> >
> > Correct?
>
> Actually, I'm not sure it's a good idea to default to a fixed order. If running
> on an arch with big base pages (e.g. powerpc with 64K pages?), that will soon
> add up to a big chunk of memory, which could be wasteful?
>
> PAGE_ALLOC_COSTLY_ORDER = 3, so with a 64K base page, that's 512K. Is that a concern?
> Wouldn't it be better to define this as an absolute size? Or even the min of
> PAGE_ALLOC_COSTLY_ORDER and an absolute size?

From my POV, not at all. POWER can use smaller page sizes if they
want to -- I don't think they do: at least the distros I use on my
POWER9 all have THP=always by default (2MB).
