linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH V3 0/3] Numabalancing preserve write fix
@ 2017-02-19 10:03 Aneesh Kumar K.V
From: Aneesh Kumar K.V @ 2017-02-19 10:03 UTC
  To: akpm, Rik van Riel, Mel Gorman, paulus, benh
  Cc: linux-mm, linux-kernel, linuxppc-dev, Aneesh Kumar K.V

This patch series addresses an issue w.r.t. THP migration and the autonuma
preserve-write feature. migrate_misplaced_transhuge_page() cannot deal with
concurrent modification of the page. It does a page copy without
following the migration pte sequence. IIUC, this was done to keep the
migration simpler, and at the time of implementation we didn't have THP
page cache, which would have required a more elaborate migration scheme.
That means THP autonuma migration expects the protnone pte with saved write
to be set up such that neither the kernel nor the user can update
the page content. This patch series enables archs like ppc64 to do that.
The current code is already fine in hash translation mode, because we
never create a hardware page table entry for a protnone pte.
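To make the constraint above concrete, here is a toy model in C. It is only a sketch under stated assumptions: the TOY_* bit values are hypothetical (not the real book3s64 values, and the big-endian storage of real ptes is ignored), and it assumes, as on radix, that the hardware walks the Linux pte directly, that a set write bit allows kernel writes, and that user writes additionally need the privileged bit to be clear. Under those assumptions, stashing the saved write in the write bit itself (roughly what pte_mkwrite() does) leaves a protnone pte the kernel could still write to, while stashing it by clearing the privileged bit (the approach of patch 3/3) leaves the page unwritable for both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; not the kernel's
 * definitions, and the big-endian storage of real ptes is ignored. */
#define TOY_PAGE_WRITE      0x2ULL
#define TOY_PAGE_PRIVILEGED 0x8ULL
#define TOY_PAGE_PTE        (1ULL << 62)
#define TOY_PAGE_PRESENT    (1ULL << 63)

/* A protnone entry: present leaf pte, no R/W/X, privileged by default. */
static const uint64_t toy_protnone =
	TOY_PAGE_PRESENT | TOY_PAGE_PTE | TOY_PAGE_PRIVILEGED;

/* Assumption: the kernel may write whenever the write bit is set. */
static bool toy_kernel_can_write(uint64_t pte)
{
	return (pte & TOY_PAGE_PRESENT) && (pte & TOY_PAGE_WRITE);
}

/* Assumption: user writes also need the privileged bit to be clear. */
static bool toy_user_can_write(uint64_t pte)
{
	return toy_kernel_can_write(pte) && !(pte & TOY_PAGE_PRIVILEGED);
}

/* Stashing the saved write in the write bit itself (generic default). */
static uint64_t toy_mk_savedwrite_in_write_bit(uint64_t pte)
{
	return pte | TOY_PAGE_WRITE;
}

/* Stashing it by clearing the privileged bit (approach of patch 3/3). */
static uint64_t toy_mk_savedwrite_in_priv_bit(uint64_t pte)
{
	return pte & ~TOY_PAGE_PRIVILEGED;
}
```

In this model the first variant blocks user writes but not kernel writes, which is exactly what the THP migration copy cannot tolerate.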

Changes from V2:
* Fix KVM crashes caused by KSM not clearing the savedwrite bit.

Changes from V1:
* Update the patch so that it applies cleanly to upstream.
* Add Acked-by from Michael Neuling

Aneesh Kumar K.V (3):
  mm/autonuma: Let architecture override how the write bit should be
    stashed in a protnone pte.
  mm/ksm: Handle protnone saved writes when making page write protect
  powerpc/mm/autonuma: Switch ppc64 to its own implementation of saved
    write

 arch/powerpc/include/asm/book3s/64/pgtable.h | 52 ++++++++++++++++++++++++----
 include/asm-generic/pgtable.h                | 24 +++++++++++++
 mm/huge_memory.c                             |  6 ++--
 mm/ksm.c                                     |  9 +++--
 mm/memory.c                                  |  2 +-
 mm/mprotect.c                                |  4 +--
 6 files changed, 82 insertions(+), 15 deletions(-)

-- 
2.7.4

* [PATCH V3 1/3] mm/autonuma: Let architecture override how the write bit should be stashed in a protnone pte.
@ 2017-02-19 10:03 Aneesh Kumar K.V
From: Aneesh Kumar K.V @ 2017-02-19 10:03 UTC
  To: akpm, Rik van Riel, Mel Gorman, paulus, benh
  Cc: linux-mm, linux-kernel, linuxppc-dev, Aneesh Kumar K.V

Autonuma preserves the write permission across a NUMA fault to avoid taking
a write fault after the NUMA fault (commit b191f9b106ea ("mm: numa: preserve
PTE write permissions across a NUMA hinting fault")). Architectures can
implement protnone in different ways, and some may choose to implement it by
clearing the Read/Write/Exec bits of the pte. Setting the write bit on such a
pte can result in wrong behaviour. Fix this by allowing the architecture to
override how the write bit is saved on a protnone pte.
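A minimal sketch of the override idiom this patch relies on, assuming toy pte_t/pmd_t types and hypothetical TOY_WRITE/TOY_PRIV bits in place of the real kernel definitions. An "arch" section defines pte_savedwrite and pte_mk_savedwrite with the self-referencing #define that the ppc64 patch uses, so the generic #ifndef fallbacks only take effect for the pmd helpers, which keep using the plain write bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy page-table entry types; the bit values are hypothetical. */
typedef struct { uint64_t val; } pte_t;
typedef struct { uint64_t val; } pmd_t;

#define TOY_WRITE 0x2ULL
#define TOY_PRIV  0x8ULL

static inline bool pte_write(pte_t pte) { return pte.val & TOY_WRITE; }
static inline pte_t pte_mkwrite(pte_t pte) { pte.val |= TOY_WRITE; return pte; }
static inline bool pmd_write(pmd_t pmd) { return pmd.val & TOY_WRITE; }
static inline pmd_t pmd_mkwrite(pmd_t pmd) { pmd.val |= TOY_WRITE; return pmd; }

/* "Arch" header: override only the pte helpers, using the
 * self-referencing #define so the generic header can detect it. */
#define pte_savedwrite pte_savedwrite
static inline bool pte_savedwrite(pte_t pte) { return !(pte.val & TOY_PRIV); }

#define pte_mk_savedwrite pte_mk_savedwrite
static inline pte_t pte_mk_savedwrite(pte_t pte) { pte.val &= ~TOY_PRIV; return pte; }

/* "Generic" header: fall back to the plain write bit only when the
 * architecture did not provide its own helper. */
#ifndef pte_savedwrite
#define pte_savedwrite pte_write
#endif
#ifndef pte_mk_savedwrite
#define pte_mk_savedwrite pte_mkwrite
#endif
#ifndef pmd_savedwrite
#define pmd_savedwrite pmd_write
#endif
#ifndef pmd_mk_savedwrite
#define pmd_mk_savedwrite pmd_mkwrite
#endif
```

Because the architecture's #define is seen before the generic block, pte_savedwrite is never redirected to pte_write; an architecture that defines nothing keeps the previous behaviour unchanged.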

Acked-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/asm-generic/pgtable.h | 16 ++++++++++++++++
 mm/huge_memory.c              |  6 +++---
 mm/memory.c                   |  2 +-
 mm/mprotect.c                 |  4 ++--
 4 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 18af2bcefe6a..b6f3a8a4b738 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -192,6 +192,22 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres
 }
 #endif
 
+#ifndef pte_savedwrite
+#define pte_savedwrite pte_write
+#endif
+
+#ifndef pte_mk_savedwrite
+#define pte_mk_savedwrite pte_mkwrite
+#endif
+
+#ifndef pmd_savedwrite
+#define pmd_savedwrite pmd_write
+#endif
+
+#ifndef pmd_mk_savedwrite
+#define pmd_mk_savedwrite pmd_mkwrite
+#endif
+
 #ifndef __HAVE_ARCH_PMDP_SET_WRPROTECT
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8f1d93257fb9..e6de801fa477 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1253,7 +1253,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 	}
 
 	/* See similar comment in do_numa_page for explanation */
-	if (!pmd_write(pmd))
+	if (!pmd_savedwrite(pmd))
 		flags |= TNF_NO_GROUP;
 
 	/*
@@ -1316,7 +1316,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 	goto out;
 clear_pmdnuma:
 	BUG_ON(!PageLocked(page));
-	was_writable = pmd_write(pmd);
+	was_writable = pmd_savedwrite(pmd);
 	pmd = pmd_modify(pmd, vma->vm_page_prot);
 	pmd = pmd_mkyoung(pmd);
 	if (was_writable)
@@ -1571,7 +1571,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
 			entry = pmd_modify(entry, newprot);
 			if (preserve_write)
-				entry = pmd_mkwrite(entry);
+				entry = pmd_mk_savedwrite(entry);
 			ret = HPAGE_PMD_NR;
 			set_pmd_at(mm, addr, pmd, entry);
 			BUG_ON(vma_is_anonymous(vma) && !preserve_write &&
diff --git a/mm/memory.c b/mm/memory.c
index 6bf2b471e30c..641b83dbff60 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3388,7 +3388,7 @@ static int do_numa_page(struct vm_fault *vmf)
 	int target_nid;
 	bool migrated = false;
 	pte_t pte = vmf->orig_pte;
-	bool was_writable = pte_write(pte);
+	bool was_writable = pte_savedwrite(pte);
 	int flags = 0;
 
 	/*
diff --git a/mm/mprotect.c b/mm/mprotect.c
index f9c07f54dd62..15f5c174a7c1 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -113,13 +113,13 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			ptent = ptep_modify_prot_start(mm, addr, pte);
 			ptent = pte_modify(ptent, newprot);
 			if (preserve_write)
-				ptent = pte_mkwrite(ptent);
+				ptent = pte_mk_savedwrite(ptent);
 
 			/* Avoid taking write faults for known dirty pages */
 			if (dirty_accountable && pte_dirty(ptent) &&
 					(pte_soft_dirty(ptent) ||
 					 !(vma->vm_flags & VM_SOFTDIRTY))) {
-				ptent = pte_mkwrite(ptent);
+				ptent = pte_mk_savedwrite(ptent);
 			}
 			ptep_modify_prot_commit(mm, addr, pte, ptent);
 			pages++;
-- 
2.7.4

* [PATCH V3 2/3] mm/ksm: Handle protnone saved writes when making page write protect
@ 2017-02-19 10:03 Aneesh Kumar K.V
From: Aneesh Kumar K.V @ 2017-02-19 10:03 UTC
  To: akpm, Rik van Riel, Mel Gorman, paulus, benh
  Cc: linux-mm, linux-kernel, linuxppc-dev, Aneesh Kumar K.V

Without this, KSM will consider the page write-protected, but a NUMA fault can
later mark the page writable. This can result in memory corruption.
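The hazard can be shown with a simplified model of the check in write_protect_page(). This is a sketch, not KSM code: the T_* bit values are hypothetical, loosely follow the ppc64 encoding from patch 3/3, and leave out _PAGE_PTE and endianness. A protnone pte with a saved write has neither the write bit nor the dirty bit set, so the old check skips it; the new check catches it, and the write-protect step then clears the saved write instead of the hardware write bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, loosely modelled on the ppc64 scheme. */
#define T_EXEC    0x1ULL
#define T_WRITE   0x2ULL
#define T_READ    0x4ULL
#define T_PRIV    0x8ULL
#define T_DIRTY   0x80ULL
#define T_RWX     (T_READ | T_WRITE | T_EXEC)
#define T_PRESENT (1ULL << 63)

static bool t_write(uint64_t pte)      { return pte & T_WRITE; }
static bool t_dirty(uint64_t pte)      { return pte & T_DIRTY; }
static bool t_protnone(uint64_t pte)   { return (pte & (T_PRESENT | T_RWX)) == T_PRESENT; }
static bool t_savedwrite(uint64_t pte) { return !(pte & (T_RWX | T_PRIV)); }

/* Check before the fix: only the hardware write and dirty bits. */
static bool ksm_needs_wrprotect_old(uint64_t pte)
{
	return t_write(pte) || t_dirty(pte);
}

/* Check after the fix: also catch a protnone pte holding a saved write. */
static bool ksm_needs_wrprotect_new(uint64_t pte)
{
	return t_write(pte) || t_dirty(pte) ||
	       (t_protnone(pte) && t_savedwrite(pte));
}

/* Write-protect step: clear the saved write on protnone ptes and the
 * real write bit otherwise, then mark the entry clean. */
static uint64_t ksm_wrprotect(uint64_t pte)
{
	if (t_protnone(pte))
		pte |= T_PRIV;          /* toy pte_clear_savedwrite() */
	else
		pte &= ~T_WRITE;        /* toy pte_wrprotect() */
	return pte & ~T_DIRTY;          /* toy pte_mkclean() */
}
```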

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/asm-generic/pgtable.h | 8 ++++++++
 mm/ksm.c                      | 9 +++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index b6f3a8a4b738..8c8ba48bef0b 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -200,6 +200,10 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres
 #define pte_mk_savedwrite pte_mkwrite
 #endif
 
+#ifndef pte_clear_savedwrite
+#define pte_clear_savedwrite pte_wrprotect
+#endif
+
 #ifndef pmd_savedwrite
 #define pmd_savedwrite pmd_write
 #endif
@@ -208,6 +212,10 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres
 #define pmd_mk_savedwrite pmd_mkwrite
 #endif
 
+#ifndef pmd_clear_savedwrite
+#define pmd_clear_savedwrite pmd_wrprotect
+#endif
+
 #ifndef __HAVE_ARCH_PMDP_SET_WRPROTECT
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
diff --git a/mm/ksm.c b/mm/ksm.c
index 9ae6011a41f8..768202831578 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -872,7 +872,8 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 	if (!ptep)
 		goto out_mn;
 
-	if (pte_write(*ptep) || pte_dirty(*ptep)) {
+	if (pte_write(*ptep) || pte_dirty(*ptep) ||
+	    (pte_protnone(*ptep) && pte_savedwrite(*ptep))) {
 		pte_t entry;
 
 		swapped = PageSwapCache(page);
@@ -897,7 +898,11 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 		}
 		if (pte_dirty(entry))
 			set_page_dirty(page);
-		entry = pte_mkclean(pte_wrprotect(entry));
+
+		if (pte_protnone(entry))
+			entry = pte_mkclean(pte_clear_savedwrite(entry));
+		else
+			entry = pte_mkclean(pte_wrprotect(entry));
 		set_pte_at_notify(mm, addr, ptep, entry);
 	}
 	*orig_pte = *ptep;
-- 
2.7.4

* [PATCH V3 3/3] powerpc/mm/autonuma: Switch ppc64 to its own implementation of saved write
@ 2017-02-19 10:03 Aneesh Kumar K.V
From: Aneesh Kumar K.V @ 2017-02-19 10:03 UTC
  To: akpm, Rik van Riel, Mel Gorman, paulus, benh
  Cc: linux-mm, linux-kernel, linuxppc-dev, Aneesh Kumar K.V

With this, our protnone pte becomes a present pte with the READ/WRITE/EXEC bits
cleared. By default we also set _PAGE_PRIVILEGED on such a pte. This bit is now
used to identify a protnone pte that has the saved write bit: for such a pte, we
clear _PAGE_PRIVILEGED. The pte still remains inaccessible from both user
and kernel.
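The encoding can be exercised outside the kernel with a toy re-implementation of the three helpers added by this patch. The X_* bit values are illustrative, the cpu_to_be64() byte swapping is left out, and TOY_VM_BUG_ON() is a hypothetical stand-in for VM_BUG_ON() built on assert(). The round trip shows that marking and clearing the saved write only toggles the privileged bit, so the entry never gains an R/W/X permission.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout; not the kernel's actual definitions. */
#define X_EXEC    0x1ULL
#define X_WRITE   0x2ULL
#define X_READ    0x4ULL
#define X_PRIV    0x8ULL
#define X_RWX     (X_READ | X_WRITE | X_EXEC)
#define X_PTE     (1ULL << 62)
#define X_PRESENT (1ULL << 63)

/* Stand-in for VM_BUG_ON(): abort if the condition holds. */
#define TOY_VM_BUG_ON(cond) assert(!(cond))

/* Protnone: present leaf entry with R, W and X all clear. */
static bool x_protnone(uint64_t pte)
{
	return (pte & (X_PRESENT | X_PTE | X_RWX)) == (X_PRESENT | X_PTE);
}

/* Autonuma: remember the write permission by clearing PRIVILEGED. */
static uint64_t x_mk_savedwrite(uint64_t pte)
{
	TOY_VM_BUG_ON((pte & (X_PRESENT | X_RWX | X_PRIV)) != (X_PRESENT | X_PRIV));
	return pte & ~X_PRIV;
}

/* KSM: drop the saved write by setting PRIVILEGED again. */
static uint64_t x_clear_savedwrite(uint64_t pte)
{
	TOY_VM_BUG_ON(!x_protnone(pte));
	return pte | X_PRIV;
}

/* A saved write is a protnone entry with neither RWX nor PRIVILEGED. */
static bool x_savedwrite(uint64_t pte)
{
	TOY_VM_BUG_ON(!x_protnone(pte));
	return !(pte & (X_RWX | X_PRIV));
}
```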

Acked-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 52 ++++++++++++++++++++++++----
 1 file changed, 45 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 6a55bbe91556..d87bee85fc44 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1,6 +1,9 @@
 #ifndef _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
 #define _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
 
+#ifndef __ASSEMBLY__
+#include <linux/mmdebug.h>
+#endif
 /*
  * Common bits between hash and Radix page table
  */
@@ -428,15 +431,47 @@ static inline pte_t pte_clear_soft_dirty(pte_t pte)
 #endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
 
 #ifdef CONFIG_NUMA_BALANCING
-/*
- * These work without NUMA balancing but the kernel does not care. See the
- * comment in include/asm-generic/pgtable.h . On powerpc, this will only
- * work for user pages and always return true for kernel pages.
- */
 static inline int pte_protnone(pte_t pte)
 {
-	return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED)) ==
-		cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED);
+	return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_PTE | _PAGE_RWX)) ==
+		cpu_to_be64(_PAGE_PRESENT | _PAGE_PTE);
+}
+
+#define pte_mk_savedwrite pte_mk_savedwrite
+static inline pte_t pte_mk_savedwrite(pte_t pte)
+{
+	/*
+	 * Used by Autonuma subsystem to preserve the write bit
+	 * while marking the pte PROT_NONE. Only allow this
+	 * on PROT_NONE pte
+	 */
+	VM_BUG_ON((pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_RWX | _PAGE_PRIVILEGED)) !=
+		  cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED));
+	return __pte(pte_val(pte) & ~_PAGE_PRIVILEGED);
+}
+
+#define pte_clear_savedwrite pte_clear_savedwrite
+static inline pte_t pte_clear_savedwrite(pte_t pte)
+{
+	/*
+	 * Used by KSM subsystem to make a protnone pte readonly.
+	 */
+	VM_BUG_ON(!pte_protnone(pte));
+	return __pte(pte_val(pte) | _PAGE_PRIVILEGED);
+}
+
+#define pte_savedwrite pte_savedwrite
+static inline bool pte_savedwrite(pte_t pte)
+{
+	/*
+	 * Saved write ptes are prot none ptes that doesn't have
+	 * privileged bit sit. We mark prot none as one which has
+	 * present and pviliged bit set and RWX cleared. To mark
+	 * protnone which used to have _PAGE_WRITE set we clear
+	 * the privileged bit.
+	 */
+	VM_BUG_ON(!pte_protnone(pte));
+	return !(pte_raw(pte) & cpu_to_be64(_PAGE_RWX | _PAGE_PRIVILEGED));
 }
 #endif /* CONFIG_NUMA_BALANCING */
 
@@ -867,6 +902,8 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 #define pmd_mkclean(pmd)	pte_pmd(pte_mkclean(pmd_pte(pmd)))
 #define pmd_mkyoung(pmd)	pte_pmd(pte_mkyoung(pmd_pte(pmd)))
 #define pmd_mkwrite(pmd)	pte_pmd(pte_mkwrite(pmd_pte(pmd)))
+#define pmd_mk_savedwrite(pmd)	pte_pmd(pte_mk_savedwrite(pmd_pte(pmd)))
+#define pmd_clear_savedwrite(pmd)	pte_pmd(pte_clear_savedwrite(pmd_pte(pmd)))
 
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
 #define pmd_soft_dirty(pmd)    pte_soft_dirty(pmd_pte(pmd))
@@ -883,6 +920,7 @@ static inline int pmd_protnone(pmd_t pmd)
 
 #define __HAVE_ARCH_PMD_WRITE
 #define pmd_write(pmd)		pte_write(pmd_pte(pmd))
+#define pmd_savedwrite(pmd)	pte_savedwrite(pmd_pte(pmd))
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
-- 
2.7.4

* Re: [PATCH V3 0/3] Numabalancing preserve write fix
@ 2017-02-19 10:25 Aneesh Kumar K.V
From: Aneesh Kumar K.V @ 2017-02-19 10:25 UTC
  To: akpm, Rik van Riel, Mel Gorman, paulus, benh
  Cc: linux-mm, linux-kernel, linuxppc-dev


I am not sure whether we want to merge this debug patch. It will help
us identify wrong pte_wrprotect usage in the kernel.

From a0fbbbbb302fd204159a1327b67decb8f14ffa21 Mon Sep 17 00:00:00 2001
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Date: Sat, 18 Feb 2017 10:39:47 +0530
Subject: [PATCH] powerpc/autonuma: Add debug check for wrong writable pte
 check

On ppc64, protnone ptes don't use the _PAGE_WRITE bit for savedwrite. Hence
we need to make sure we don't call the pte_write* functions on protnone ptes.
Add a debug check to catch such wrong usage.

This should only be used for debugging, and it can give wrong results w.r.t.
the change bit on radix. Even on hash with KVM, we will insert the page table
entry in the guest hash page table with the write bit set, even if the pte is
marked protnone.
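A sketch of what the debug check is meant to catch, assuming hypothetical D_* bit values and a TOY_VM_WARN_ON() that counts hits instead of printing a warning. It mirrors the SAVED_WRITE_DEBUG hash path of the patch: the protnone test falls back to present plus privileged, the saved write is kept in the ordinary write bit, and a plain pte_wrprotect()-style call on such an entry is flagged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration. */
#define D_WRITE   0x2ULL
#define D_PRIV    0x8ULL
#define D_PRESENT (1ULL << 63)

/* Stand-in for VM_WARN_ON(): count hits instead of printing a backtrace. */
static unsigned int toy_warnings;
#define TOY_VM_WARN_ON(cond) do { if (cond) toy_warnings++; } while (0)

/* Debug-mode (hash) protnone test: present and privileged. */
static bool d_protnone(uint64_t pte)
{
	return (pte & (D_PRESENT | D_PRIV)) == (D_PRESENT | D_PRIV);
}

/* In debug mode the saved write lives in the ordinary write bit ... */
static uint64_t d_mk_savedwrite(uint64_t pte)
{
	return pte | D_WRITE;
}

/* ... so a plain write-protect on a protnone entry is flagged. */
static uint64_t d_wrprotect(uint64_t pte)
{
	TOY_VM_WARN_ON(d_protnone(pte));
	return pte & ~D_WRITE;
}
```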

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 130 +++++++++++++++++----------
 1 file changed, 85 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index d87bee85fc44..1c99deac3966 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -341,10 +341,36 @@ static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
 	__r;							\
 })
 
+#undef SAVED_WRITE_DEBUG
+#ifdef CONFIG_NUMA_BALANCING
+static inline int pte_protnone(pte_t pte)
+{
+	/*
+	 * We want to catch wrong usage of pte_write w.r.t protnone ptes.
+	 * The way we do that is to make saved write as _PAGE_WRITE for hash
+	 * translation mode. This only will work with hash translation mode.
+	 */
+#ifdef SAVED_WRITE_DEBUG
+	if (!radix_enabled())
+		return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED)) ==
+			cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED);
+#endif
+	return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_PTE | _PAGE_RWX)) ==
+		cpu_to_be64(_PAGE_PRESENT | _PAGE_PTE);
+}
+#endif
+
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
 static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
 				      pte_t *ptep)
 {
+#ifdef SAVED_WRITE_DEBUG
+	/*
+	 * Cannot use this with protnone pte, For protnone, writes
+	 * will be marked via savedwrite bit.
+	 */
+	VM_WARN_ON(pte_protnone(*ptep));
+#endif
 	if ((pte_raw(*ptep) & cpu_to_be64(_PAGE_WRITE)) == 0)
 		return;
 
@@ -430,51 +456,6 @@ static inline pte_t pte_clear_soft_dirty(pte_t pte)
 }
 #endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
 
-#ifdef CONFIG_NUMA_BALANCING
-static inline int pte_protnone(pte_t pte)
-{
-	return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_PTE | _PAGE_RWX)) ==
-		cpu_to_be64(_PAGE_PRESENT | _PAGE_PTE);
-}
-
-#define pte_mk_savedwrite pte_mk_savedwrite
-static inline pte_t pte_mk_savedwrite(pte_t pte)
-{
-	/*
-	 * Used by Autonuma subsystem to preserve the write bit
-	 * while marking the pte PROT_NONE. Only allow this
-	 * on PROT_NONE pte
-	 */
-	VM_BUG_ON((pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_RWX | _PAGE_PRIVILEGED)) !=
-		  cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED));
-	return __pte(pte_val(pte) & ~_PAGE_PRIVILEGED);
-}
-
-#define pte_clear_savedwrite pte_clear_savedwrite
-static inline pte_t pte_clear_savedwrite(pte_t pte)
-{
-	/*
-	 * Used by KSM subsystem to make a protnone pte readonly.
-	 */
-	VM_BUG_ON(!pte_protnone(pte));
-	return __pte(pte_val(pte) | _PAGE_PRIVILEGED);
-}
-
-#define pte_savedwrite pte_savedwrite
-static inline bool pte_savedwrite(pte_t pte)
-{
-	/*
-	 * Saved write ptes are prot none ptes that doesn't have
-	 * privileged bit sit. We mark prot none as one which has
-	 * present and pviliged bit set and RWX cleared. To mark
-	 * protnone which used to have _PAGE_WRITE set we clear
-	 * the privileged bit.
-	 */
-	VM_BUG_ON(!pte_protnone(pte));
-	return !(pte_raw(pte) & cpu_to_be64(_PAGE_RWX | _PAGE_PRIVILEGED));
-}
-#endif /* CONFIG_NUMA_BALANCING */
-
 static inline int pte_present(pte_t pte)
 {
 	return !!(pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT));
@@ -500,6 +481,14 @@ static inline unsigned long pte_pfn(pte_t pte)
 /* Generic modifiers for PTE bits */
 static inline pte_t pte_wrprotect(pte_t pte)
 {
+
+#ifdef SAVED_WRITE_DEBUG
+	/*
+	 * Cannot use this with protnone pte, For protnone, writes
+	 * will be marked via savedwrite bit.
+	 */
+	VM_WARN_ON(pte_protnone(pte));
+#endif
 	return __pte(pte_val(pte) & ~_PAGE_WRITE);
 }
 
@@ -552,6 +541,57 @@ static inline bool pte_user(pte_t pte)
 	return !(pte_raw(pte) & cpu_to_be64(_PAGE_PRIVILEGED));
 }
 
+#ifdef CONFIG_NUMA_BALANCING
+#define pte_mk_savedwrite pte_mk_savedwrite
+static inline pte_t pte_mk_savedwrite(pte_t pte)
+{
+#ifdef SAVED_WRITE_DEBUG
+	if (!radix_enabled())
+		return pte_mkwrite(pte);
+#endif
+	/*
+	 * Used by Autonuma subsystem to preserve the write bit
+	 * while marking the pte PROT_NONE. Only allow this
+	 * on PROT_NONE pte
+	 */
+	VM_BUG_ON((pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_RWX | _PAGE_PRIVILEGED)) !=
+		  cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED));
+	return __pte(pte_val(pte) & ~_PAGE_PRIVILEGED);
+}
+
+#define pte_clear_savedwrite pte_clear_savedwrite
+static inline pte_t pte_clear_savedwrite(pte_t pte)
+{
+	/*
+	 * Used by KSM subsystem to make a protnone pte readonly.
+	 */
+	VM_BUG_ON(!pte_protnone(pte));
+#ifdef SAVED_WRITE_DEBUG
+	if (!radix_enabled())
+		return __pte(pte_val(pte) & ~_PAGE_WRITE);
+#endif
+	return __pte(pte_val(pte) | _PAGE_PRIVILEGED);
+}
+
+#define pte_savedwrite pte_savedwrite
+static inline bool pte_savedwrite(pte_t pte)
+{
+	/*
+	 * Saved write ptes are prot none ptes that doesn't have
+	 * privileged bit sit. We mark prot none as one which has
+	 * present and pviliged bit set and RWX cleared. To mark
+	 * protnone which used to have _PAGE_WRITE set we clear
+	 * the privileged bit.
+	 */
+	VM_BUG_ON(!pte_protnone(pte));
+#ifdef SAVED_WRITE_DEBUG
+	if (!radix_enabled())
+		return pte_write(pte);
+#endif
+	return !(pte_raw(pte) & cpu_to_be64(_PAGE_RWX | _PAGE_PRIVILEGED));
+}
+#endif /* CONFIG_NUMA_BALANCING */
+
 /* Encode and de-code a swap entry */
 #define MAX_SWAPFILES_CHECK() do { \
 	BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS); \
-- 
2.7.4
