linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yu Zhao <yuzhao@google.com>
To: Ryan Roberts <ryan.roberts@arm.com>, Yin Fengwei <fengwei.yin@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	 "Matthew Wilcox (Oracle)" <willy@infradead.org>,
	 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	David Hildenbrand <david@redhat.com>,
	 Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	 Geert Uytterhoeven <geert@linux-m68k.org>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	 Sven Schnelle <svens@linux.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,  Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	 "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	 linux-alpha@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	 linux-ia64@vger.kernel.org, linux-m68k@lists.linux-m68k.org,
	 linux-s390@vger.kernel.org
Subject: Re: [PATCH v1 00/10] variable-order, large folios for anonymous memory
Date: Wed, 28 Jun 2023 12:22:28 -0600	[thread overview]
Message-ID: <CAOUHufbtNPkdktjt_5qM45GegVO-rCFOMkSh0HQminQ12zsV8Q@mail.gmail.com> (raw)
In-Reply-To: <1fb0c4cb-a709-de20-d643-32ed43550059@arm.com>

On Tue, Jun 27, 2023 at 3:59 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 27/06/2023 08:49, Yu Zhao wrote:
> > On Mon, Jun 26, 2023 at 9:30 PM Yu Zhao <yuzhao@google.com> wrote:
> >>
> >> On Mon, Jun 26, 2023 at 11:14 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>>
> >>> Hi All,
> >>>
> >>> Following on from the previous RFCv2 [1], this series implements variable order,
> >>> large folios for anonymous memory. The objective of this is to improve
> >>> performance by allocating larger chunks of memory during anonymous page faults:
> >>>
> >>>  - Since SW (the kernel) is dealing with larger chunks of memory than base
> >>>    pages, there are efficiency savings to be had; fewer page faults, batched PTE
> >>>    and RMAP manipulation, fewer items on lists, etc. In short, we reduce kernel
> >>>    overhead. This should benefit all architectures.
> >>>  - Since we are now mapping physically contiguous chunks of memory, we can take
> >>>    advantage of HW TLB compression techniques. A reduction in TLB pressure
> >>>    speeds up kernel and user space. arm64 systems have 2 mechanisms to coalesce
> >>>    TLB entries; "the contiguous bit" (architectural) and HPA (uarch).
> >>>
> >>> This patch set deals with the SW side of things only and based on feedback from
> >>> the RFC, aims to be the most minimal initial change, upon which future
> >>> incremental changes can be added. For this reason, the new behaviour is hidden
> >>> behind a new Kconfig switch, CONFIG_LARGE_ANON_FOLIO, which is disabled by
> >>> default. Although the code has been refactored to parameterize the desired order
> >>> of the allocation, when the feature is disabled (by forcing the order to be
> >>> always 0) my performance tests measure no regression. So I'm hoping this will be
> >>> a suitable mechanism to allow incremental submissions to the kernel without
> >>> affecting the rest of the world.
> >>>
> >>> The patches are based on top of v6.4 plus Matthew Wilcox's set_ptes() series
> >>> [2], which is a hard dependency. I'm not sure of Matthew's exact plans for
> >>> getting that series into the kernel, but I'm hoping we can start the review
> >>> process on this patch set independently. I have a branch at [3].
> >>>
> >>> I've posted a separate series concerning the HW part (contpte mapping) for arm64
> >>> at [4].
> >>>
> >>>
> >>> Performance
> >>> -----------
> >>>
> >>> Below results show 2 benchmarks; kernel compilation and speedometer 2.0 (a
> >>> javascript benchmark running in Chromium). Both cases are running on Ampere
> >>> Altra with 1 NUMA node enabled, Ubuntu 22.04 and XFS filesystem. Each benchmark
> >>> is repeated 15 times over 5 reboots and averaged.
> >>>
> >>> All improvements are relative to baseline-4k. 'anonfolio-basic' is this series.
> >>> 'anonfolio' is the full patch set similar to the RFC with the additional changes
> >>> to the extra 3 fault paths. The rest of the configs are described at [4].
> >>>
> >>> Kernel Compilation (smaller is better):
> >>>
> >>> | kernel          |   real-time |   kern-time |   user-time |
> >>> |:----------------|------------:|------------:|------------:|
> >>> | baseline-4k     |        0.0% |        0.0% |        0.0% |
> >>> | anonfolio-basic |       -5.3% |      -42.9% |       -0.6% |
> >>> | anonfolio       |       -5.4% |      -46.0% |       -0.3% |
> >>> | contpte         |       -6.8% |      -45.7% |       -2.1% |
> >>> | exefolio        |       -8.4% |      -46.4% |       -3.7% |
> >>> | baseline-16k    |       -8.7% |      -49.2% |       -3.7% |
> >>> | baseline-64k    |      -10.5% |      -66.0% |       -3.5% |
> >>>
> >>> Speedometer 2.0 (bigger is better):
> >>>
> >>> | kernel          |   runs_per_min |
> >>> |:----------------|---------------:|
> >>> | baseline-4k     |           0.0% |
> >>> | anonfolio-basic |           0.7% |
> >>> | anonfolio       |           1.2% |
> >>> | contpte         |           3.1% |
> >>> | exefolio        |           4.2% |
> >>> | baseline-16k    |           5.3% |
> >>
> >> Thanks for pushing this forward!
> >>
> >>> Changes since RFCv2
> >>> -------------------
> >>>
> >>>   - Simplified series to bare minimum (on David Hildenbrand's advice)
> >>
> >> My impression is that this series still includes many pieces that can
> >> be split out and discussed separately with followup series.
> >>
> >> (I skipped 04/10 and will look at it tomorrow.)
> >
> > I went through the series twice. Here what I think a bare minimum
> > series (easier to review/debug/land) would look like:

===

> > 1. a new arch specific function providing a prefered order within (0,
> > PMD_ORDER).
> > 2. an extended anon folio alloc API taking that order (02/10, partially).
> > 3. an updated folio_add_new_anon_rmap() covering the large() &&
> > !pmd_mappable() case (similar to 04/10).
> > 4. s/folio_test_pmd_mappable/folio_test_large/ in page_remove_rmap()
> > (06/10, reviewed-by provided).
> > 5. finally, use the extended anon folio alloc API with the arch
> > preferred order in do_anonymous_page() (10/10, partially).

===

> > The rest can be split out into separate series and move forward in
> > parallel with probably a long list of things we need/want to do.
>
> Thanks for the fadt review - I really appreciate it!
>
> I've responded to many of your comments. I'd appreciate if we can close those
> points then I will work up a v2.

Thanks!

Based on the latest discussion here [1], my original list above can be
optionally reduced to 4 patches: item 2 can be quashed into item 5.

Also please make sure we have only one global (apply to all archs)
Kconfig option, and it should be added in item 5:

  if TRANSPARENT_HUGEPAGE
    config FLEXIBLE/VARIABLE_THP # or whatever name you see fit
  end if

(How many new Kconfig options added within arch/arm64/ is not a concern of MM.)

And please make sure it's disabled by default, because we are still
missing many important functions, e.g., I don't think we can mlock()
when large() && !pmd_mappable(), see mlock_pte_range() and
mlock_vma_folio(). We can fix it along with many things later, but we
need to present a plan and a schedule now. Otherwise, there would be
pushback if we try to land the series without supporting mlock().

Do you or Fengwei plan to take on it? (I personally don't.) If not,
I'll try to find someone from our team to look at it. (It'd be more
scalable if we have a coordinated group of people individually solving
different problems.)

[1] https://lore.kernel.org/r/b2c81404-67df-f841-ef02-919e841f49f2@arm.com/


  reply	other threads:[~2023-06-28 18:23 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-26 17:14 [PATCH v1 00/10] variable-order, large folios for anonymous memory Ryan Roberts
2023-06-26 17:14 ` [PATCH v1 01/10] mm: Expose clear_huge_page() unconditionally Ryan Roberts
2023-06-27  1:55   ` Yu Zhao
2023-06-27  7:21     ` Ryan Roberts
2023-06-27  8:29       ` Yu Zhao
2023-06-27  9:41         ` Ryan Roberts
2023-06-27 18:26           ` Yu Zhao
2023-06-28 10:56             ` Ryan Roberts
2023-06-26 17:14 ` [PATCH v1 02/10] mm: pass gfp flags and order to vma_alloc_zeroed_movable_folio() Ryan Roberts
2023-06-27  2:27   ` Yu Zhao
2023-06-27  7:27     ` Ryan Roberts
2023-06-26 17:14 ` [PATCH v1 03/10] mm: Introduce try_vma_alloc_movable_folio() Ryan Roberts
2023-06-27  2:34   ` Yu Zhao
2023-06-27  5:29     ` Yu Zhao
2023-06-27  7:56       ` Ryan Roberts
2023-06-28  2:32         ` Yin Fengwei
2023-06-28 11:06           ` Ryan Roberts
2023-06-26 17:14 ` [PATCH v1 04/10] mm: Implement folio_add_new_anon_rmap_range() Ryan Roberts
2023-06-27  7:08   ` Yu Zhao
2023-06-27  8:09     ` Ryan Roberts
2023-06-28  2:20       ` Yin Fengwei
2023-06-28 11:09         ` Ryan Roberts
2023-06-28  2:17     ` Yin Fengwei
2023-06-26 17:14 ` [PATCH v1 05/10] mm: Implement folio_remove_rmap_range() Ryan Roberts
2023-06-27  3:06   ` Yu Zhao
2023-06-26 17:14 ` [PATCH v1 06/10] mm: Allow deferred splitting of arbitrary large anon folios Ryan Roberts
2023-06-27  2:54   ` Yu Zhao
2023-06-28  2:43   ` Yin Fengwei
2023-06-26 17:14 ` [PATCH v1 07/10] mm: Batch-zap large anonymous folio PTE mappings Ryan Roberts
2023-06-27  3:04   ` Yu Zhao
2023-06-27  9:46     ` Ryan Roberts
2023-06-26 17:14 ` [PATCH v1 08/10] mm: Kconfig hooks to determine max anon folio allocation order Ryan Roberts
2023-06-27  2:47   ` Yu Zhao
2023-06-27  9:54     ` Ryan Roberts
2023-06-29  1:38   ` Yang Shi
2023-06-29 11:31     ` Ryan Roberts
2023-06-26 17:14 ` [PATCH v1 09/10] arm64: mm: Declare support for large anonymous folios Ryan Roberts
2023-06-27  2:53   ` Yu Zhao
2023-06-26 17:14 ` [PATCH v1 10/10] mm: Allocate large folios for anonymous memory Ryan Roberts
2023-06-27  3:01   ` Yu Zhao
2023-06-27  9:57     ` Ryan Roberts
2023-06-27 18:33       ` Yu Zhao
2023-06-29  2:13   ` Yang Shi
2023-06-29 11:30     ` Ryan Roberts
2023-06-29 17:05       ` Yang Shi
2023-06-27  3:30 ` [PATCH v1 00/10] variable-order, " Yu Zhao
2023-06-27  7:49   ` Yu Zhao
2023-06-27  9:59     ` Ryan Roberts
2023-06-28 18:22       ` Yu Zhao [this message]
2023-06-28 23:59         ` Yin Fengwei
2023-06-29  0:27           ` Yu Zhao
2023-06-29  0:31             ` Yin Fengwei
2023-06-29 15:28         ` Ryan Roberts
2023-06-29  2:21     ` Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOUHufbtNPkdktjt_5qM45GegVO-rCFOMkSh0HQminQ12zsV8Q@mail.gmail.com \
    --to=yuzhao@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=borntraeger@linux.ibm.com \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=fengwei.yin@intel.com \
    --cc=geert@linux-m68k.org \
    --cc=hpa@zytor.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-alpha@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-m68k@lists.linux-m68k.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=ryan.roberts@arm.com \
    --cc=svens@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).