From: Mateusz Guzik <mjguzik@gmail.com>
To: Ankur Arora <ankur.a.arora@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
	akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	willy@infradead.org, mgorman@suse.de, peterz@infradead.org,
	rostedt@goodmis.org, tglx@linutronix.de, jon.grimm@amd.com,
	bharata@amd.com, raghavendra.kt@amd.com,
	boris.ostrovsky@oracle.com, konrad.wilk@oracle.com
Subject: Re: [PATCH v2 0/9] x86/clear_huge_page: multi-page clearing
Date: Sun, 3 Sep 2023 10:14:04 +0200
Message-ID: <20230903081404.hmkhnrk243h2nuoa@f>
In-Reply-To: <20230830184958.2333078-1-ankur.a.arora@oracle.com>

On Wed, Aug 30, 2023 at 11:49:49AM -0700, Ankur Arora wrote:
> This series adds a multi-page clearing primitive, clear_pages(),
> which enables more effective use of x86 string instructions by
> advertising the real region-size to be cleared. 
> 
> Region-size can be used as a hint by uarchs to optimize the
> clearing.
> 
> Also add allow_resched() which marks a code-section as allowing
> rescheduling in the irqentry_exit path. This allows clear_pages()
> to get by without having to call cond_resched() periodically.
> (preempt_model_full() already handles this via
> irqentry_exit_cond_resched(), so we handle this similarly for
> preempt_model_none() and preempt_model_voluntary().)
> 
> Performance
> ==
> 
> With this demand fault performance gets a decent increase:
> 
>   *Milan*     mm/clear_huge_page   x86/clear_huge_page   change    
>                           (GB/s)                (GB/s)             
>                                                                    
>   pg-sz=2MB                14.55                 19.29    +32.5%
>   pg-sz=1GB                19.34                 49.60   +156.4%  
> 
> Milan (and some other AMD Zen uarchs tested) take advantage of the
> hint to elide cacheline allocation for pg-sz=1GB. The cut-off for
> this optimization seems to be at around region-size > LLC-size so
> the pg-sz=2MB load still allocates cachelines.
> 

Have you benchmarked clzero? It is an AMD-specific instruction issuing
non-temporal stores. It is definitely something to try out for 1G pages.

One would think rep stosq should be no worse, since the CPU is told
exactly what to do and is free to optimize it however it sees fit,
but the rep prefix has a long history of underperforming.

I'm not saying it is going to be better, only that it should be tested,
though one can easily argue this can be done at a later date.

I would do it myself but my access to AMD CPUs is limited.
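
For anyone who wants to poke at this from userspace first, below is a
minimal sketch of the two rep stosq shapes the series compares: one
invocation per 4K page versus a single invocation over the whole extent.
The function names and SKETCH_PAGE_SIZE are made up for illustration and
are not taken from the patches; the non-x86 branch falls back to memset()
purely so the file builds elsewhere. clzero is deliberately left out,
since it needs an AMD CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4K base pages */

/* Zero len bytes at dst; len must be a multiple of 8. */
static void zero_stosq(void *dst, size_t len)
{
	assert(len % 8 == 0);
#if defined(__x86_64__)
	size_t qwords = len / 8;

	/* rep stosq stores RAX to [RDI], RCX times; the ABI guarantees DF=0. */
	__asm__ volatile("rep stosq"
			 : "+D"(dst), "+c"(qwords)
			 : "a"(0UL)
			 : "memory");
#else
	memset(dst, 0, len);	/* non-x86 fallback so the sketch still builds */
#endif
}

/* One rep stosq per 4K page: the CPU never sees more than a page at once. */
static void clear_page_at_a_time(void *dst, size_t npages)
{
	char *p = dst;

	for (size_t i = 0; i < npages; i++)
		zero_stosq(p + i * SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
}

/* One rep stosq for the whole extent: the full size is visible in RCX. */
static void clear_whole_region(void *dst, size_t npages)
{
	zero_stosq(dst, npages * SKETCH_PAGE_SIZE);
}
```

Timing these two over 2MB and 1GB buffers would be the obvious next
step, though userspace numbers should not be expected to match the
kernel's fault path.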

> 
>   *Icelakex*  mm/clear_huge_page   x86/clear_huge_page   change   
>                           (GB/s)                (GB/s)            
>                                                                   
>   pg-sz=2MB                 9.19                 12.94   +40.8%  
>   pg-sz=1GB                 9.36                 12.97   +38.5%  
> 
> Icelakex sees a decent improvement in performance but for both
> region-sizes does continue to allocate cachelines.
> 
> 
> Negative: there is a downside to clearing in larger chunks: the
> current approach clears page-at-a-time, narrowing towards
> the faulting subpage. This has better cache characteristics for
> some sequential access workloads where subpages near the faulting
> page have a greater likelihood of access.
> 
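
To make the ordering concrete, here is a rough userspace model of the
page-at-a-time order described above: subpages far from the faulting one
are cleared first, then the remainder in a left-right pattern closing in
on the target, so the target is cleared last and is the most likely to
still be in cache. It is modelled from memory on process_huge_page() in
mm/memory.c; clear_order() and its parameters are hypothetical names, and
it only records indices rather than clearing anything.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill order[] with the sequence in which subpages would be cleared for a
 * huge page of npages subpages when the fault hit subpage 'target'.
 * Returns the number of entries written (always npages).
 */
static size_t clear_order(size_t npages, size_t target, size_t *order)
{
	size_t k = 0, base, l;

	assert(target < npages);

	if (2 * target <= npages) {
		/* Target in the first half: do the far tail first, back to front. */
		base = 0;
		l = target;
		for (size_t i = npages; i > 2 * target; i--)
			order[k++] = i - 1;
	} else {
		/* Target in the second half: do the far head first, front to back. */
		base = npages - 2 * (npages - target);
		l = npages - target;
		for (size_t i = 0; i < base; i++)
			order[k++] = i;
	}

	/* Remaining 2*l subpages: alternate left/right, converging on target. */
	for (size_t i = 0; i < l; i++) {
		order[k++] = base + i;
		order[k++] = base + 2 * l - 1 - i;
	}

	return k;
}
```

With 8 subpages and a fault on subpage 2 this yields 7 6 5 4 0 3 1 2.
A single multi-page clear simply walks 0..7 instead, which is where the
cache-locality loss for the faulting subpage comes from.
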
> I'm not sure if there are real cases which care about this workload
> but one example is the vm-scalability/case-anon-w-seq-hugetlb test.
> This test starts a process for each online CPU, with each process
> writing sequentially to its set of hugepages.
> 
> The bottleneck here is the memory pipe and so the improvement in
> stime is limited, and because the clearing is less cache-optimal 
> now, utime suffers from worse user cache misses.
> 
>   *Icelakex*               mm/clear_huge_page  x86/clear_huge_page  change
>   (tasks=128, mem=4GB/task)
> 
>   stime                        286.8 +- 3.6%      243.9 +- 4.1%     -14.9%
>   utime                        497.7 +- 4.1%      553.5 +- 2.0%     +11.2%
>   wall-clock                     6.9 +- 2.8%        7.0 +- 1.4%     + 1.4%
> 
> 
>   *Milan*                  mm/clear_huge_page  x86/clear_huge_page  change
>   (tasks=512, mem=1GB/task)
> 
>   stime                        501.3 +- 1.4%      498.0 +- 0.9%      -0.5%
>   utime                        298.7 +- 1.1%      335.0 +- 2.2%     +12.1%
>   wall-clock                     3.5 +- 2.8%        3.8 +- 2.6%      +8.5%
> 
> The same test performs better if we have a smaller number of processes,
> since there is more backend BW available, and thus the improved stime
> compensates for the worse utime.
> 
> This could be improved by using more circuitous chunking (somewhat
> like this:
> https://lore.kernel.org/lkml/20220606203725.1313715-1-ankur.a.arora@oracle.com/).
> But I'm not sure if it is worth doing. Opinions?
> 
> Patches
> ==
> 
> Patches 1, 2, 3:
>   "mm/clear_huge_page: allow arch override for clear_huge_page()",
>   "mm/huge_page: separate clear_huge_page() and copy_huge_page()",
>   "mm/huge_page: cleanup clear_/copy_subpage()"
> are minor. The first one allows clear_huge_page() to have an
> arch specific version and the other two are mechanical cleanup
> patches.
> 
> Patches 4, 5, 6:
>   "x86/clear_page: extend clear_page*() for multi-page clearing",
>   "x86/clear_page: add clear_pages()",
>   "x86/clear_huge_page: multi-page clearing"
> define the x86 specific clear_pages() and clear_huge_page().
> 
> Patches 7, 8:
>   "sched: define TIF_ALLOW_RESCHED"
>   "irqentry: define irqentry_exit_allow_resched()"
> define allow_resched() to demarcate preemptible sections.
> 
> This gets used in patch 9:
>   "x86/clear_huge_page: make clear_contig_region() preemptible".
> 
> Changelog:
> 
> v2:
>   - Addressed review comments from peterz, tglx.
>   - Removed clear_user_pages(), and CONFIG_X86_32:clear_pages()
>   - General code cleanup
> 
> Also at:
>   github.com/terminus/linux clear-pages.v2
> 
> Comments appreciated!
> 
> Ankur Arora (9):
>   mm/clear_huge_page: allow arch override for clear_huge_page()
>   mm/huge_page: separate clear_huge_page() and copy_huge_page()
>   mm/huge_page: cleanup clear_/copy_subpage()
>   x86/clear_page: extend clear_page*() for multi-page clearing
>   x86/clear_page: add clear_pages()
>   x86/clear_huge_page: multi-page clearing
>   sched: define TIF_ALLOW_RESCHED
>   irqentry: define irqentry_exit_allow_resched()
>   x86/clear_huge_page: make clear_contig_region() preemptible
> 
>  arch/x86/include/asm/page_64.h     |  27 +++--
>  arch/x86/include/asm/thread_info.h |   2 +
>  arch/x86/lib/clear_page_64.S       |  52 ++++++---
>  arch/x86/mm/hugetlbpage.c          |  59 ++++++++++
>  include/linux/entry-common.h       |  13 +++
>  include/linux/sched.h              |  30 +++++
>  kernel/entry/common.c              |  13 ++-
>  kernel/sched/core.c                |  32 ++---
>  mm/memory.c                        | 181 +++++++++++++++++------------
>  9 files changed, 297 insertions(+), 112 deletions(-)
> 
> -- 
> 2.31.1
> 
> 
