From: Andrew Morton <akpm@linux-foundation.org>
To: Cannon Matthews <cannonmatthews@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
Matthew Wilcox <willy@infradead.org>,
Michal Hocko <mhocko@kernel.org>,
David Rientjes <rientjes@google.com>,
Greg Thelen <gthelen@google.com>, Salman Qazi <sqazi@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH] mm: clear 1G pages with streaming stores on x86
Date: Sat, 7 Mar 2020 14:06:16 -0800
Message-ID: <20200307140616.034b3e537ffd46eb757dec73@linux-foundation.org>
In-Reply-To: <20200307010353.172991-1-cannonmatthews@google.com>
On Fri, 6 Mar 2020 17:03:53 -0800 Cannon Matthews <cannonmatthews@google.com> wrote:
> Reimplement clear_gigantic_page() to clear gigantic pages using the
> non-temporal streaming store instructions that bypass the cache
> (movnti), since an entire 1GiB region will not fit in the cache anyway.
>
> Doing an mlock() on a 512GiB 1G-hugetlb region previously would take on
> average 134 seconds, about 260ms/GiB which is quite slow. Using `movnti`
> and optimizing the control flow over the constituent small pages, this
> can be improved roughly by a factor of 3-4x, with the 512GiB mlock()
> taking only 34 seconds on average, or 67ms/GiB.
>
> The assembly code for the __clear_page_nt routine is more or less
> taken directly from the output of gcc with -O3 for this function with
> some tweaks to support arbitrary sizes and moving memory barriers:
>
> #include <immintrin.h>
>
> #define GiB (1ULL << 30)
>
> void clear_page_nt_64i (void *page)
> {
>     for (int i = 0; i < GiB / sizeof(long long int); ++i)
>     {
>         _mm_stream_si64 (((long long int *)page) + i, 0);
>     }
>     _mm_sfence ();
> }
>
> Tested:
> Time to `mlock()` a 512GiB region on broadwell CPU
>                    AVG time (s)   % imp.   ms/page
> clear_page_erms    133.584        -        261
> clear_page_nt       34.154        74.43%    67
>
> An earlier version of this code was sent as an RFC patch ~July 2018
> https://patchwork.kernel.org/patch/10543193/ but never merged.
>
> ...
>
> MAINTAINERS | 1 +
> arch/x86/Kconfig | 4 ++++
> arch/x86/include/asm/page_64.h | 1 +
> arch/x86/lib/Makefile | 2 +-
> arch/x86/lib/clear_gigantic_page.c | 28 ++++++++++++++++++++++++++++
> arch/x86/lib/clear_page_64.S | 19 +++++++++++++++++++
> include/linux/mm.h | 2 ++
> mm/memory.c | 2 ++
Please cc the x86 maintainers on such things.
>
> ...
>
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -70,6 +70,7 @@ config X86
> select ARCH_HAS_KCOV if X86_64
> select ARCH_HAS_MEM_ENCRYPT
> select ARCH_HAS_MEMBARRIER_SYNC_CORE
> + select ARCH_HAS_CLEAR_GIGANTIC_PAGE if X86_64
> select ARCH_HAS_PMEM_API if X86_64
> select ARCH_HAS_PTE_DEVMAP if X86_64
> select ARCH_HAS_PTE_SPECIAL
> @@ -290,6 +291,9 @@ config ARCH_MAY_HAVE_PC_FDC
> config GENERIC_CALIBRATE_DELAY
> def_bool y
>
> +config ARCH_HAS_CLEAR_GIGANTIC_PAGE
> + bool
Opinions might differ, but I believe the Linus-approved way of doing
this sort of thing is

	#ifndef clear_gigantic_page
	extern void clear_gigantic_page(...);
	#define clear_gigantic_page clear_gigantic_page
	#endif

alternatively, __weak works nicely.
>
> ...
>
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2856,6 +2856,8 @@ enum mf_action_page_type {
> };
>
> #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
> +extern void clear_gigantic_page(struct page *page, unsigned long addr,
> + unsigned int pages);
Shouldn't this be inside #ifdef CONFIG_ARCH_HAS_CLEAR_GIGANTIC_PAGE?
Also, the third argument here is called "pages" but here:
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4706,6 +4706,7 @@ static inline void process_huge_page(
> }
> }
>
> +#ifndef CONFIG_ARCH_HAS_CLEAR_GIGANTIC_PAGE
> static void clear_gigantic_page(struct page *page,
> unsigned long addr,
> unsigned int pages_per_huge_page)
it is called "pages_per_huge_page".
> @@ -4720,6 +4721,7 @@ static void clear_gigantic_page(struct page *page,
> clear_user_highpage(p, addr + i * PAGE_SIZE);
> }
> }
> +#endif /* CONFIG_ARCH_HAS_CLEAR_GIGANTIC_PAGE */
>
> ...
Thread overview: 23+ messages
2020-03-07 1:03 [PATCH] mm: clear 1G pages with streaming stores on x86 Cannon Matthews
2020-03-07 22:06 ` Andrew Morton [this message]
2020-03-09 0:08 ` Kirill A. Shutemov
2020-03-09 9:06 ` Michal Hocko
2020-03-09 9:35 ` Kirill A. Shutemov
2020-03-09 11:36 ` Kirill A. Shutemov
2020-03-09 12:26 ` Michal Hocko
2020-03-09 18:01 ` Mike Kravetz
2020-03-09 15:38 ` Andi Kleen
2020-03-09 18:37 ` Matthew Wilcox
2020-03-11 0:21 ` Cannon Matthews
2020-03-11 0:54 ` Kirill A. Shutemov
2020-03-11 3:35 ` Arvind Sankar
2020-03-11 8:16 ` Kirill A. Shutemov
2020-03-11 18:32 ` Arvind Sankar
2020-03-11 20:32 ` Arvind Sankar
2020-03-12 0:52 ` Kirill A. Shutemov
2020-03-31 0:40 ` Elliott, Robert (Servers)
2020-03-16 10:18 ` Michal Hocko
2020-03-16 12:19 ` Kirill A. Shutemov
2020-03-26 19:46 ` Matthew Wilcox
2020-03-11 15:07 ` David Laight
2020-03-09 15:33 ` Andi Kleen