All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Andi Kleen" <ak@linux.intel.com>,
	"Aneesh Kumar" <aneesh.kumar@linux.ibm.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Hillf Danton" <hdanton@sina.com>, "Jens Axboe" <axboe@kernel.dk>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Mel Gorman" <mgorman@suse.de>,
	"Michael Larabel" <Michael@michaellarabel.com>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Tejun Heo" <tj@kernel.org>, "Vlastimil Babka" <vbabka@suse.cz>,
	"Will Deacon" <will@kernel.org>,
	linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
	page-reclaim@google.com, "Yu Zhao" <yuzhao@google.com>,
	"Barry Song" <baohua@kernel.org>,
	"Brian Geffon" <bgeffon@google.com>,
	"Jan Alexander Steffens" <heftig@archlinux.org>,
	"Oleksandr Natalenko" <oleksandr@natalenko.name>,
	"Steven Barrett" <steven@liquorix.net>,
	"Suleiman Souhlal" <suleiman@google.com>,
	"Daniel Byrne" <djbyrne@mtu.edu>,
	"Donald Carr" <d@chaos-reins.com>,
	"Holger Hoffstätte" <holger@applied-asynchrony.com>,
	"Konstantin Kharlamov" <Hi-Angel@yandex.ru>,
	"Shuang Zhai" <szhai2@cs.rochester.edu>,
	"Sofia Trinh" <sofia.trinh@edi.works>,
	"Vaibhav Jain" <vaibhav@linux.ibm.com>
Subject: [PATCH v13 01/14] mm: x86, arm64: add arch_has_hw_pte_young()
Date: Wed,  6 Jul 2022 16:00:10 -0600	[thread overview]
Message-ID: <20220706220022.968789-2-yuzhao@google.com> (raw)
In-Reply-To: <20220706220022.968789-1-yuzhao@google.com>

Some architectures automatically set the accessed bit in PTEs, e.g.,
x86 and arm64 v8.2. On architectures that do not have this capability,
clearing the accessed bit in a PTE usually triggers a page fault
following the TLB miss of this PTE (to emulate the accessed bit).

Being aware of this capability can help make better decisions, e.g.,
whether to spread the work out over a period of time to reduce bursty
page faults when trying to clear the accessed bit in many PTEs.

Note that theoretically this capability can be unreliable, e.g.,
hotplugged CPUs might be different from builtin ones. Therefore it
should not be used in architecture-independent code that involves
correctness, e.g., to determine whether TLB flushes are required (in
combination with the accessed bit).

Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Acked-by: Will Deacon <will@kernel.org>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
 arch/arm64/include/asm/pgtable.h | 15 ++-------------
 arch/x86/include/asm/pgtable.h   |  6 +++---
 include/linux/pgtable.h          | 13 +++++++++++++
 mm/memory.c                      | 14 +-------------
 4 files changed, 19 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 0b6632f18364..c46399c0500c 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1066,24 +1066,13 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
  * page after fork() + CoW for pfn mappings. We don't always have a
  * hardware-managed access flag on arm64.
  */
-static inline bool arch_faults_on_old_pte(void)
-{
-	/* The register read below requires a stable CPU to make any sense */
-	cant_migrate();
-
-	return !cpu_has_hw_af();
-}
-#define arch_faults_on_old_pte		arch_faults_on_old_pte
+#define arch_has_hw_pte_young		cpu_has_hw_af
 
 /*
  * Experimentally, it's cheap to set the access flag in hardware and we
  * benefit from prefaulting mappings as 'old' to start with.
  */
-static inline bool arch_wants_old_prefaulted_pte(void)
-{
-	return !arch_faults_on_old_pte();
-}
-#define arch_wants_old_prefaulted_pte	arch_wants_old_prefaulted_pte
+#define arch_wants_old_prefaulted_pte	cpu_has_hw_af
 
 static inline bool pud_sect_supported(void)
 {
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 44e2d6f1dbaa..dc5f7d8ef68a 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1431,10 +1431,10 @@ static inline bool arch_has_pfn_modify_check(void)
 	return boot_cpu_has_bug(X86_BUG_L1TF);
 }
 
-#define arch_faults_on_old_pte arch_faults_on_old_pte
-static inline bool arch_faults_on_old_pte(void)
+#define arch_has_hw_pte_young arch_has_hw_pte_young
+static inline bool arch_has_hw_pte_young(void)
 {
-	return false;
+	return true;
 }
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 3cdc16cfd867..8eee31bc9bde 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -260,6 +260,19 @@ static inline int pmdp_clear_flush_young(struct vm_area_struct *vma,
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif
 
+#ifndef arch_has_hw_pte_young
+/*
+ * Return whether the accessed bit is supported on the local CPU.
+ *
+ * This stub assumes accessing through an old PTE triggers a page fault.
+ * Architectures that automatically set the access bit should overwrite it.
+ */
+static inline bool arch_has_hw_pte_young(void)
+{
+	return false;
+}
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 				       unsigned long address,
diff --git a/mm/memory.c b/mm/memory.c
index 7a089145cad4..49500390b91b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -125,18 +125,6 @@ int randomize_va_space __read_mostly =
 					2;
 #endif
 
-#ifndef arch_faults_on_old_pte
-static inline bool arch_faults_on_old_pte(void)
-{
-	/*
-	 * Those arches which don't have hw access flag feature need to
-	 * implement their own helper. By default, "true" means pagefault
-	 * will be hit on old pte.
-	 */
-	return true;
-}
-#endif
-
 #ifndef arch_wants_old_prefaulted_pte
 static inline bool arch_wants_old_prefaulted_pte(void)
 {
@@ -2862,7 +2850,7 @@ static inline bool __wp_page_copy_user(struct page *dst, struct page *src,
 	 * On architectures with software "accessed" bits, we would
 	 * take a double page fault, so mark it accessed here.
 	 */
-	if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
+	if (!arch_has_hw_pte_young() && !pte_young(vmf->orig_pte)) {
 		pte_t entry;
 
 		vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
-- 
2.37.0.rc0.161.g10f37bed90-goog


WARNING: multiple messages have this Message-ID (diff)
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Andi Kleen" <ak@linux.intel.com>,
	"Aneesh Kumar" <aneesh.kumar@linux.ibm.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Hillf Danton" <hdanton@sina.com>, "Jens Axboe" <axboe@kernel.dk>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Mel Gorman" <mgorman@suse.de>,
	"Michael Larabel" <Michael@michaellarabel.com>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Tejun Heo" <tj@kernel.org>, "Vlastimil Babka" <vbabka@suse.cz>,
	"Will Deacon" <will@kernel.org>,
	linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
	page-reclaim@google.com, "Yu Zhao" <yuzhao@google.com>,
	"Barry Song" <baohua@kernel.org>,
	"Brian Geffon" <bgeffon@google.com>,
	"Jan Alexander Steffens" <heftig@archlinux.org>,
	"Oleksandr Natalenko" <oleksandr@natalenko.name>,
	"Steven Barrett" <steven@liquorix.net>,
	"Suleiman Souhlal" <suleiman@google.com>,
	"Daniel Byrne" <djbyrne@mtu.edu>,
	"Donald Carr" <d@chaos-reins.com>,
	"Holger Hoffstätte" <holger@applied-asynchrony.com>,
	"Konstantin Kharlamov" <Hi-Angel@yandex.ru>,
	"Shuang Zhai" <szhai2@cs.rochester.edu>,
	"Sofia Trinh" <sofia.trinh@edi.works>,
	"Vaibhav Jain" <vaibhav@linux.ibm.com>
Subject: [PATCH v13 01/14] mm: x86, arm64: add arch_has_hw_pte_young()
Date: Wed,  6 Jul 2022 16:00:10 -0600	[thread overview]
Message-ID: <20220706220022.968789-2-yuzhao@google.com> (raw)
In-Reply-To: <20220706220022.968789-1-yuzhao@google.com>

Some architectures automatically set the accessed bit in PTEs, e.g.,
x86 and arm64 v8.2. On architectures that do not have this capability,
clearing the accessed bit in a PTE usually triggers a page fault
following the TLB miss of this PTE (to emulate the accessed bit).

Being aware of this capability can help make better decisions, e.g.,
whether to spread the work out over a period of time to reduce bursty
page faults when trying to clear the accessed bit in many PTEs.

Note that theoretically this capability can be unreliable, e.g.,
hotplugged CPUs might be different from builtin ones. Therefore it
should not be used in architecture-independent code that involves
correctness, e.g., to determine whether TLB flushes are required (in
combination with the accessed bit).

Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Acked-by: Will Deacon <will@kernel.org>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
 arch/arm64/include/asm/pgtable.h | 15 ++-------------
 arch/x86/include/asm/pgtable.h   |  6 +++---
 include/linux/pgtable.h          | 13 +++++++++++++
 mm/memory.c                      | 14 +-------------
 4 files changed, 19 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 0b6632f18364..c46399c0500c 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1066,24 +1066,13 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
  * page after fork() + CoW for pfn mappings. We don't always have a
  * hardware-managed access flag on arm64.
  */
-static inline bool arch_faults_on_old_pte(void)
-{
-	/* The register read below requires a stable CPU to make any sense */
-	cant_migrate();
-
-	return !cpu_has_hw_af();
-}
-#define arch_faults_on_old_pte		arch_faults_on_old_pte
+#define arch_has_hw_pte_young		cpu_has_hw_af
 
 /*
  * Experimentally, it's cheap to set the access flag in hardware and we
  * benefit from prefaulting mappings as 'old' to start with.
  */
-static inline bool arch_wants_old_prefaulted_pte(void)
-{
-	return !arch_faults_on_old_pte();
-}
-#define arch_wants_old_prefaulted_pte	arch_wants_old_prefaulted_pte
+#define arch_wants_old_prefaulted_pte	cpu_has_hw_af
 
 static inline bool pud_sect_supported(void)
 {
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 44e2d6f1dbaa..dc5f7d8ef68a 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1431,10 +1431,10 @@ static inline bool arch_has_pfn_modify_check(void)
 	return boot_cpu_has_bug(X86_BUG_L1TF);
 }
 
-#define arch_faults_on_old_pte arch_faults_on_old_pte
-static inline bool arch_faults_on_old_pte(void)
+#define arch_has_hw_pte_young arch_has_hw_pte_young
+static inline bool arch_has_hw_pte_young(void)
 {
-	return false;
+	return true;
 }
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 3cdc16cfd867..8eee31bc9bde 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -260,6 +260,19 @@ static inline int pmdp_clear_flush_young(struct vm_area_struct *vma,
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif
 
+#ifndef arch_has_hw_pte_young
+/*
+ * Return whether the accessed bit is supported on the local CPU.
+ *
+ * This stub assumes accessing through an old PTE triggers a page fault.
+ * Architectures that automatically set the access bit should overwrite it.
+ */
+static inline bool arch_has_hw_pte_young(void)
+{
+	return false;
+}
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 				       unsigned long address,
diff --git a/mm/memory.c b/mm/memory.c
index 7a089145cad4..49500390b91b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -125,18 +125,6 @@ int randomize_va_space __read_mostly =
 					2;
 #endif
 
-#ifndef arch_faults_on_old_pte
-static inline bool arch_faults_on_old_pte(void)
-{
-	/*
-	 * Those arches which don't have hw access flag feature need to
-	 * implement their own helper. By default, "true" means pagefault
-	 * will be hit on old pte.
-	 */
-	return true;
-}
-#endif
-
 #ifndef arch_wants_old_prefaulted_pte
 static inline bool arch_wants_old_prefaulted_pte(void)
 {
@@ -2862,7 +2850,7 @@ static inline bool __wp_page_copy_user(struct page *dst, struct page *src,
 	 * On architectures with software "accessed" bits, we would
 	 * take a double page fault, so mark it accessed here.
 	 */
-	if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
+	if (!arch_has_hw_pte_young() && !pte_young(vmf->orig_pte)) {
 		pte_t entry;
 
 		vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
-- 
2.37.0.rc0.161.g10f37bed90-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2022-07-06 22:01 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-06 22:00 [PATCH v13 00/14] Multi-Gen LRU Framework Yu Zhao
2022-07-06 22:00 ` Yu Zhao
2022-07-06 22:00 ` Yu Zhao [this message]
2022-07-06 22:00   ` [PATCH v13 01/14] mm: x86, arm64: add arch_has_hw_pte_young() Yu Zhao
2022-07-06 22:00 ` [PATCH v13 02/14] mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG Yu Zhao
2022-07-06 22:00   ` Yu Zhao
2022-07-06 22:00 ` [PATCH v13 03/14] mm/vmscan.c: refactor shrink_node() Yu Zhao
2022-07-06 22:00   ` Yu Zhao
2022-07-06 22:00 ` [PATCH v13 04/14] Revert "include/linux/mm_inline.h: fold __update_lru_size() into its sole caller" Yu Zhao
2022-07-06 22:00   ` Yu Zhao
2022-07-06 22:00 ` [PATCH v13 05/14] mm: multi-gen LRU: groundwork Yu Zhao
2022-07-06 22:00   ` Yu Zhao
2022-07-06 22:00 ` [PATCH v13 06/14] mm: multi-gen LRU: minimal implementation Yu Zhao
2022-07-06 22:00   ` Yu Zhao
2022-07-06 22:00 ` [PATCH v13 07/14] mm: multi-gen LRU: exploit locality in rmap Yu Zhao
2022-07-06 22:00   ` Yu Zhao
2022-07-06 22:00 ` [PATCH v13 08/14] mm: multi-gen LRU: support page table walks Yu Zhao
2022-07-06 22:00   ` Yu Zhao
2022-07-06 22:26   ` Yu Zhao
2022-07-06 22:26     ` Yu Zhao
2022-07-07  1:23     ` Liam Howlett
2022-07-07  1:23       ` Liam Howlett
2022-07-06 22:00 ` [PATCH v13 09/14] mm: multi-gen LRU: optimize multiple memcgs Yu Zhao
2022-07-06 22:00   ` Yu Zhao
2022-07-06 22:00 ` [PATCH v13 10/14] mm: multi-gen LRU: kill switch Yu Zhao
2022-07-06 22:00   ` Yu Zhao
2022-07-06 22:00 ` [PATCH v13 11/14] mm: multi-gen LRU: thrashing prevention Yu Zhao
2022-07-06 22:00   ` Yu Zhao
2022-07-06 22:00 ` [PATCH v13 12/14] mm: multi-gen LRU: debugfs interface Yu Zhao
2022-07-06 22:00   ` Yu Zhao
2022-07-06 22:00 ` [PATCH v13 13/14] mm: multi-gen LRU: admin guide Yu Zhao
2022-07-06 22:00   ` Yu Zhao
2022-07-10  2:20   ` Bagas Sanjaya
2022-07-10  2:20     ` Bagas Sanjaya
2022-07-06 22:00 ` [PATCH v13 14/14] mm: multi-gen LRU: design doc Yu Zhao
2022-07-06 22:00   ` Yu Zhao
2022-07-10  2:21   ` Bagas Sanjaya
2022-07-10  2:21     ` Bagas Sanjaya
2022-07-07  3:03 ` [PATCH v13 00/14] Multi-Gen LRU Framework Bagas Sanjaya
2022-07-07  3:03   ` Bagas Sanjaya
2022-07-09 19:31   ` Yu Zhao
2022-07-09 19:31     ` Yu Zhao
2022-07-10  1:09     ` Bagas Sanjaya
2022-07-10  1:09       ` Bagas Sanjaya

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220706220022.968789-2-yuzhao@google.com \
    --to=yuzhao@google.com \
    --cc=Hi-Angel@yandex.ru \
    --cc=Michael@michaellarabel.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=axboe@kernel.dk \
    --cc=baohua@kernel.org \
    --cc=bgeffon@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=d@chaos-reins.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=djbyrne@mtu.edu \
    --cc=hannes@cmpxchg.org \
    --cc=hdanton@sina.com \
    --cc=heftig@archlinux.org \
    --cc=holger@applied-asynchrony.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=oleksandr@natalenko.name \
    --cc=page-reclaim@google.com \
    --cc=peterz@infradead.org \
    --cc=rppt@kernel.org \
    --cc=sofia.trinh@edi.works \
    --cc=steven@liquorix.net \
    --cc=suleiman@google.com \
    --cc=szhai2@cs.rochester.edu \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vaibhav@linux.ibm.com \
    --cc=vbabka@suse.cz \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.