linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ankur Arora <ankur.a.arora@oracle.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org
Cc: mingo@kernel.org, bp@alien8.de, luto@kernel.org,
	akpm@linux-foundation.org, mike.kravetz@oracle.com,
	jon.grimm@amd.com, kvm@vger.kernel.org, konrad.wilk@oracle.com,
	boris.ostrovsky@oracle.com,
	Ankur Arora <ankur.a.arora@oracle.com>
Subject: [PATCH v2 04/14] x86/asm: add clzero based page clearing
Date: Wed, 20 Oct 2021 10:02:55 -0700	[thread overview]
Message-ID: <20211020170305.376118-5-ankur.a.arora@oracle.com> (raw)
In-Reply-To: <20211020170305.376118-1-ankur.a.arora@oracle.com>

Add clear_page_clzero(), which uses CLZERO as the underlying primitive.
CLZERO skips the memory hierarchy, so this provides a non-polluting
implementation of clear_page(). Available if X86_FEATURE_CLZERO is set.

CLZERO, from the AMD architecture guide (Vol 3, Rev 3.30):
 "Clears the cache line specified by the logical address in rAX by
  writing a zero to every byte in the line. The instruction uses an
  implied non temporal memory type, similar to a streaming store, and
  uses the write combining protocol to minimize cache pollution.

  CLZERO is weakly-ordered with respect to other instructions that
  operate on memory. Software should use an SFENCE or stronger to
  enforce memory ordering of CLZERO with respect to other store
  instructions.

  The CLZERO instruction executes at any privilege level. CLZERO
  performs all the segmentation and paging checks that a store of
  the specified cache line would perform."

The use-case is similar to clear_page_movnt(), except that
clear_page_clzero() is expected to be more performant.

Cc: jon.grimm@amd.com
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 arch/x86/include/asm/page_64.h |  1 +
 arch/x86/lib/clear_page_64.S   | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index cfb95069cf9e..3c53f8ef8818 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -44,6 +44,7 @@ void clear_page_orig(void *page);
 void clear_page_rep(void *page);
 void clear_page_erms(void *page);
 void clear_page_movnt(void *page);
+void clear_page_clzero(void *page);
 
 static inline void clear_page(void *page)
 {
diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index 578f40db0716..1cb29a4454e1 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -76,3 +76,22 @@ SYM_FUNC_START(clear_page_movnt)
         ja      .Lstart
 	ret
 SYM_FUNC_END(clear_page_movnt)
+
+/*
+ * Zero a page using clzero (on AMD.)
+ * %rdi - page
+ *
+ * Caller needs to issue a sfence at the end.
+ */
+SYM_FUNC_START(clear_page_clzero)
+	movl	$4096,%ecx
+	movq	%rdi,%rax
+
+	.p2align 4
+.Liter:
+        clzero
+        addq    $0x40, %rax
+        sub     $0x40, %ecx
+        ja      .Liter
+	ret
+SYM_FUNC_END(clear_page_clzero)
-- 
2.29.2


  parent reply	other threads:[~2021-10-20 17:05 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-20 17:02 [PATCH v2 00/14] Use uncached stores while clearing huge pages Ankur Arora
2021-10-20 17:02 ` [PATCH v2 01/14] x86/asm: add memset_movnti() Ankur Arora
2021-10-20 17:02 ` [PATCH v2 02/14] perf bench: " Ankur Arora
2021-10-20 17:02 ` [PATCH v2 03/14] x86/asm: add uncached page clearing Ankur Arora
2021-10-20 17:02 ` Ankur Arora [this message]
2021-10-20 17:02 ` [PATCH v2 05/14] x86/cpuid: add X86_FEATURE_MOVNT_SLOW Ankur Arora
2021-10-20 17:02 ` [PATCH v2 06/14] sparse: add address_space __incoherent Ankur Arora
2021-10-20 17:02 ` [PATCH v2 07/14] x86/clear_page: add clear_page_uncached() Ankur Arora
2021-12-08  8:58   ` Yu Xu
2021-12-10  4:37     ` Ankur Arora
2021-12-10  4:48       ` Yu Xu
2021-12-10 14:26         ` Ankur Arora
2021-10-20 17:02 ` [PATCH v2 08/14] mm/clear_page: add clear_page_uncached_threshold() Ankur Arora
2021-10-20 17:03 ` [PATCH v2 09/14] x86/clear_page: add arch_clear_page_uncached_threshold() Ankur Arora
2021-10-20 17:03 ` [PATCH v2 10/14] clear_huge_page: use uncached path Ankur Arora
2021-10-20 17:03 ` [PATCH v2 11/14] gup: add FOLL_HINT_BULK, FAULT_FLAG_UNCACHED Ankur Arora
2021-10-20 17:03 ` [PATCH v2 12/14] gup: use uncached path when clearing large regions Ankur Arora
2021-10-20 18:52 ` [PATCH v2 13/14] vfio_iommu_type1: specify FOLL_HINT_BULK to pin_user_pages() Ankur Arora
2021-10-20 18:52 ` [PATCH v2 14/14] x86/cpu/intel: set X86_FEATURE_MOVNT_SLOW for Skylake Ankur Arora

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211020170305.376118-5-ankur.a.arora@oracle.com \
    --to=ankur.a.arora@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=jon.grimm@amd.com \
    --cc=konrad.wilk@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=mingo@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).