linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect
@ 2019-01-16  8:50 Aneesh Kumar K.V
  2019-01-16  8:50 ` [PATCH V5 1/5] mm: Update ptep_modify_prot_start/commit to take vm_area_struct as arg Aneesh Kumar K.V
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Aneesh Kumar K.V @ 2019-01-16  8:50 UTC (permalink / raw)
  To: npiggin, benh, paulus, mpe, akpm, x86
  Cc: linuxppc-dev, linux-mm, Aneesh Kumar K.V

We can upgrade pte access (R -> RW transition) via mprotect. We need
to make sure we follow the recommended pte update sequence as outlined in
commit bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang")
for such updates. This patch series do that.

Changes from V4:
* Drop EXPORT_SYMBOL 

Changes from V3:
* Build fix for x86

Changes from V2:
* Update commit message for patch 4
* use radix tlb flush routines directly.

Changes from V1:
* Restrict ths only for R->RW upgrade. We don't need to do this for Autonuma
* Restrict this only for radix translation mode.


Aneesh Kumar K.V (5):
  mm: Update ptep_modify_prot_start/commit to take vm_area_struct as arg
  mm: update ptep_modify_prot_commit to take old pte value as arg
  arch/powerpc/mm: Nest MMU workaround for mprotect RW upgrade.
  mm/hugetlb: Add prot_modify_start/commit sequence for hugetlb update
  arch/powerpc/mm/hugetlb: NestMMU workaround for hugetlb mprotect RW
    upgrade

 arch/powerpc/include/asm/book3s/64/hugetlb.h | 12 ++++++++++
 arch/powerpc/include/asm/book3s/64/pgtable.h | 18 ++++++++++++++
 arch/powerpc/include/asm/book3s/64/radix.h   |  4 ++++
 arch/powerpc/mm/hugetlbpage-hash64.c         | 25 ++++++++++++++++++++
 arch/powerpc/mm/hugetlbpage-radix.c          | 17 +++++++++++++
 arch/powerpc/mm/pgtable-book3s64.c           | 25 ++++++++++++++++++++
 arch/powerpc/mm/pgtable-radix.c              | 18 ++++++++++++++
 arch/s390/include/asm/pgtable.h              |  5 ++--
 arch/s390/mm/pgtable.c                       |  8 ++++---
 arch/x86/include/asm/paravirt.h              | 13 +++++-----
 arch/x86/include/asm/paravirt_types.h        |  5 ++--
 arch/x86/xen/mmu.h                           |  4 ++--
 arch/x86/xen/mmu_pv.c                        |  8 +++----
 fs/proc/task_mmu.c                           |  8 ++++---
 include/asm-generic/pgtable.h                | 18 +++++++-------
 include/linux/hugetlb.h                      | 20 ++++++++++++++++
 mm/hugetlb.c                                 |  8 ++++---
 mm/memory.c                                  |  8 +++----
 mm/mprotect.c                                |  6 ++---
 19 files changed, 189 insertions(+), 41 deletions(-)

-- 
2.20.1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V5 1/5] mm: Update ptep_modify_prot_start/commit to take vm_area_struct as arg
  2019-01-16  8:50 [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect Aneesh Kumar K.V
@ 2019-01-16  8:50 ` Aneesh Kumar K.V
  2019-01-30 10:33   ` Michael Ellerman
  2019-01-16  8:50 ` [PATCH V5 2/5] mm: update ptep_modify_prot_commit to take old pte value " Aneesh Kumar K.V
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Aneesh Kumar K.V @ 2019-01-16  8:50 UTC (permalink / raw)
  To: npiggin, benh, paulus, mpe, akpm, x86
  Cc: linuxppc-dev, linux-mm, Aneesh Kumar K.V

Some architecture may want to call flush_tlb_range from these helpers.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/s390/include/asm/pgtable.h       |  4 ++--
 arch/s390/mm/pgtable.c                |  6 ++++--
 arch/x86/include/asm/paravirt.h       | 11 ++++++-----
 arch/x86/include/asm/paravirt_types.h |  5 +++--
 arch/x86/xen/mmu.h                    |  4 ++--
 arch/x86/xen/mmu_pv.c                 |  8 ++++----
 fs/proc/task_mmu.c                    |  4 ++--
 include/asm-generic/pgtable.h         | 16 ++++++++--------
 mm/memory.c                           |  4 ++--
 mm/mprotect.c                         |  4 ++--
 10 files changed, 35 insertions(+), 31 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 063732414dfb..5d730199e37b 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1069,8 +1069,8 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 }
 
 #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
-pte_t ptep_modify_prot_start(struct mm_struct *, unsigned long, pte_t *);
-void ptep_modify_prot_commit(struct mm_struct *, unsigned long, pte_t *, pte_t);
+pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *);
+void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long, pte_t *, pte_t);
 
 #define __HAVE_ARCH_PTEP_CLEAR_FLUSH
 static inline pte_t ptep_clear_flush(struct vm_area_struct *vma,
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index f2cc7da473e4..29c0a21cd34a 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -301,12 +301,13 @@ pte_t ptep_xchg_lazy(struct mm_struct *mm, unsigned long addr,
 }
 EXPORT_SYMBOL(ptep_xchg_lazy);
 
-pte_t ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr,
+pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t *ptep)
 {
 	pgste_t pgste;
 	pte_t old;
 	int nodat;
+	struct mm_struct *mm = vma->vm_mm;
 
 	preempt_disable();
 	pgste = ptep_xchg_start(mm, addr, ptep);
@@ -320,10 +321,11 @@ pte_t ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr,
 }
 EXPORT_SYMBOL(ptep_modify_prot_start);
 
-void ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
+void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t *ptep, pte_t pte)
 {
 	pgste_t pgste;
+	struct mm_struct *mm = vma->vm_mm;
 
 	if (!MACHINE_HAS_NX)
 		pte_val(pte) &= ~_PAGE_NOEXEC;
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index a97f28d914d5..c5a7f18cce7e 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -422,25 +422,26 @@ static inline pgdval_t pgd_val(pgd_t pgd)
 }
 
 #define  __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
-static inline pte_t ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr,
+static inline pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
 					   pte_t *ptep)
 {
 	pteval_t ret;
 
-	ret = PVOP_CALL3(pteval_t, mmu.ptep_modify_prot_start, mm, addr, ptep);
+	ret = PVOP_CALL3(pteval_t, mmu.ptep_modify_prot_start, vma, addr, ptep);
 
 	return (pte_t) { .pte = ret };
 }
 
-static inline void ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
+static inline void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
 					   pte_t *ptep, pte_t pte)
 {
+
 	if (sizeof(pteval_t) > sizeof(long))
 		/* 5 arg words */
-		pv_ops.mmu.ptep_modify_prot_commit(mm, addr, ptep, pte);
+		pv_ops.mmu.ptep_modify_prot_commit(vma, addr, ptep, pte);
 	else
 		PVOP_VCALL4(mmu.ptep_modify_prot_commit,
-			    mm, addr, ptep, pte.pte);
+			    vma, addr, ptep, pte.pte);
 }
 
 static inline void set_pte(pte_t *ptep, pte_t pte)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 488c59686a73..2474e434a6f7 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -55,6 +55,7 @@ struct task_struct;
 struct cpumask;
 struct flush_tlb_info;
 struct mmu_gather;
+struct vm_area_struct;
 
 /*
  * Wrapper type for pointers to code which uses the non-standard
@@ -254,9 +255,9 @@ struct pv_mmu_ops {
 			   pte_t *ptep, pte_t pteval);
 	void (*set_pmd)(pmd_t *pmdp, pmd_t pmdval);
 
-	pte_t (*ptep_modify_prot_start)(struct mm_struct *mm, unsigned long addr,
+	pte_t (*ptep_modify_prot_start)(struct vm_area_struct *vma, unsigned long addr,
 					pte_t *ptep);
-	void (*ptep_modify_prot_commit)(struct mm_struct *mm, unsigned long addr,
+	void (*ptep_modify_prot_commit)(struct vm_area_struct *vma, unsigned long addr,
 					pte_t *ptep, pte_t pte);
 
 	struct paravirt_callee_save pte_val;
diff --git a/arch/x86/xen/mmu.h b/arch/x86/xen/mmu.h
index a7e47cf7ec6c..6e4c6bd62203 100644
--- a/arch/x86/xen/mmu.h
+++ b/arch/x86/xen/mmu.h
@@ -17,8 +17,8 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn);
 
 void set_pte_mfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags);
 
-pte_t xen_ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
-void  xen_ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
+pte_t xen_ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep);
+void  xen_ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
 				  pte_t *ptep, pte_t pte);
 
 unsigned long xen_read_cr2_direct(void);
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 0f4fe206dcc2..856a85814f00 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -306,20 +306,20 @@ static void xen_set_pte_at(struct mm_struct *mm, unsigned long addr,
 	__xen_set_pte(ptep, pteval);
 }
 
-pte_t xen_ptep_modify_prot_start(struct mm_struct *mm,
+pte_t xen_ptep_modify_prot_start(struct vm_area_struct *vma,
 				 unsigned long addr, pte_t *ptep)
 {
 	/* Just return the pte as-is.  We preserve the bits on commit */
-	trace_xen_mmu_ptep_modify_prot_start(mm, addr, ptep, *ptep);
+	trace_xen_mmu_ptep_modify_prot_start(vma->vm_mm, addr, ptep, *ptep);
 	return *ptep;
 }
 
-void xen_ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
+void xen_ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
 				 pte_t *ptep, pte_t pte)
 {
 	struct mmu_update u;
 
-	trace_xen_mmu_ptep_modify_prot_commit(mm, addr, ptep, pte);
+	trace_xen_mmu_ptep_modify_prot_commit(vma->vm_mm, addr, ptep, pte);
 	xen_mc_batch();
 
 	u.ptr = virt_to_machine(ptep).maddr | MMU_PT_UPDATE_PRESERVE_AD;
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f0ec9edab2f3..0adcaf338d64 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -942,10 +942,10 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
 	pte_t ptent = *pte;
 
 	if (pte_present(ptent)) {
-		ptent = ptep_modify_prot_start(vma->vm_mm, addr, pte);
+		ptent = ptep_modify_prot_start(vma, addr, pte);
 		ptent = pte_wrprotect(ptent);
 		ptent = pte_clear_soft_dirty(ptent);
-		ptep_modify_prot_commit(vma->vm_mm, addr, pte, ptent);
+		ptep_modify_prot_commit(vma, addr, pte, ptent);
 	} else if (is_swap_pte(ptent)) {
 		ptent = pte_swp_clear_soft_dirty(ptent);
 		set_pte_at(vma->vm_mm, addr, pte, ptent);
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 05e61e6c843f..8b0e933efe26 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -606,7 +606,7 @@ static inline int pmd_none_or_clear_bad(pmd_t *pmd)
 	return 0;
 }
 
-static inline pte_t __ptep_modify_prot_start(struct mm_struct *mm,
+static inline pte_t __ptep_modify_prot_start(struct vm_area_struct *vma,
 					     unsigned long addr,
 					     pte_t *ptep)
 {
@@ -615,10 +615,10 @@ static inline pte_t __ptep_modify_prot_start(struct mm_struct *mm,
 	 * non-present, preventing the hardware from asynchronously
 	 * updating it.
 	 */
-	return ptep_get_and_clear(mm, addr, ptep);
+	return ptep_get_and_clear(vma->vm_mm, addr, ptep);
 }
 
-static inline void __ptep_modify_prot_commit(struct mm_struct *mm,
+static inline void __ptep_modify_prot_commit(struct vm_area_struct *vma,
 					     unsigned long addr,
 					     pte_t *ptep, pte_t pte)
 {
@@ -626,7 +626,7 @@ static inline void __ptep_modify_prot_commit(struct mm_struct *mm,
 	 * The pte is non-present, so there's no hardware state to
 	 * preserve.
 	 */
-	set_pte_at(mm, addr, ptep, pte);
+	set_pte_at(vma->vm_mm, addr, ptep, pte);
 }
 
 #ifndef __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
@@ -644,22 +644,22 @@ static inline void __ptep_modify_prot_commit(struct mm_struct *mm,
  * queue the update to be done at some later time.  The update must be
  * actually committed before the pte lock is released, however.
  */
-static inline pte_t ptep_modify_prot_start(struct mm_struct *mm,
+static inline pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
 					   unsigned long addr,
 					   pte_t *ptep)
 {
-	return __ptep_modify_prot_start(mm, addr, ptep);
+	return __ptep_modify_prot_start(vma, addr, ptep);
 }
 
 /*
  * Commit an update to a pte, leaving any hardware-controlled bits in
  * the PTE unmodified.
  */
-static inline void ptep_modify_prot_commit(struct mm_struct *mm,
+static inline void ptep_modify_prot_commit(struct vm_area_struct *vma,
 					   unsigned long addr,
 					   pte_t *ptep, pte_t pte)
 {
-	__ptep_modify_prot_commit(mm, addr, ptep, pte);
+	__ptep_modify_prot_commit(vma, addr, ptep, pte);
 }
 #endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
 #endif /* CONFIG_MMU */
diff --git a/mm/memory.c b/mm/memory.c
index e11ca9dd823f..fb742f3237ad 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3610,12 +3610,12 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	 * Make it present again, Depending on how arch implementes non
 	 * accessible ptes, some can allow access by kernel mode.
 	 */
-	pte = ptep_modify_prot_start(vma->vm_mm, vmf->address, vmf->pte);
+	pte = ptep_modify_prot_start(vma, vmf->address, vmf->pte);
 	pte = pte_modify(pte, vma->vm_page_prot);
 	pte = pte_mkyoung(pte);
 	if (was_writable)
 		pte = pte_mkwrite(pte);
-	ptep_modify_prot_commit(vma->vm_mm, vmf->address, vmf->pte, pte);
+	ptep_modify_prot_commit(vma, vmf->address, vmf->pte, pte);
 	update_mmu_cache(vma, vmf->address, vmf->pte);
 
 	page = vm_normal_page(vma, vmf->address, pte);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 36cb358db170..c89ce07923c8 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -110,7 +110,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 					continue;
 			}
 
-			ptent = ptep_modify_prot_start(mm, addr, pte);
+			ptent = ptep_modify_prot_start(vma, addr, pte);
 			ptent = pte_modify(ptent, newprot);
 			if (preserve_write)
 				ptent = pte_mk_savedwrite(ptent);
@@ -121,7 +121,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 					 !(vma->vm_flags & VM_SOFTDIRTY))) {
 				ptent = pte_mkwrite(ptent);
 			}
-			ptep_modify_prot_commit(mm, addr, pte, ptent);
+			ptep_modify_prot_commit(vma, addr, pte, ptent);
 			pages++;
 		} else if (IS_ENABLED(CONFIG_MIGRATION)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V5 2/5] mm: update ptep_modify_prot_commit to take old pte value as arg
  2019-01-16  8:50 [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect Aneesh Kumar K.V
  2019-01-16  8:50 ` [PATCH V5 1/5] mm: Update ptep_modify_prot_start/commit to take vm_area_struct as arg Aneesh Kumar K.V
@ 2019-01-16  8:50 ` Aneesh Kumar K.V
  2019-01-30 10:46   ` Michael Ellerman
  2019-01-16  8:50 ` [PATCH V5 3/5] arch/powerpc/mm: Nest MMU workaround for mprotect RW upgrade Aneesh Kumar K.V
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Aneesh Kumar K.V @ 2019-01-16  8:50 UTC (permalink / raw)
  To: npiggin, benh, paulus, mpe, akpm, x86
  Cc: linuxppc-dev, linux-mm, Aneesh Kumar K.V

Architectures like ppc64 require to do a conditional tlb flush based on the old
and new value of pte. Enable that by passing old pte value as the arg.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/s390/include/asm/pgtable.h | 3 ++-
 arch/s390/mm/pgtable.c          | 2 +-
 arch/x86/include/asm/paravirt.h | 2 +-
 fs/proc/task_mmu.c              | 8 +++++---
 include/asm-generic/pgtable.h   | 2 +-
 mm/memory.c                     | 8 ++++----
 mm/mprotect.c                   | 6 +++---
 7 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 5d730199e37b..76dc344edb8c 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1070,7 +1070,8 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 
 #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
 pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *);
-void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long, pte_t *, pte_t);
+void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long,
+			     pte_t *, pte_t, pte_t);
 
 #define __HAVE_ARCH_PTEP_CLEAR_FLUSH
 static inline pte_t ptep_clear_flush(struct vm_area_struct *vma,
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 29c0a21cd34a..b283b92722cc 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -322,7 +322,7 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
 EXPORT_SYMBOL(ptep_modify_prot_start);
 
 void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
-			     pte_t *ptep, pte_t pte)
+			     pte_t *ptep, pte_t old_pte, pte_t pte)
 {
 	pgste_t pgste;
 	struct mm_struct *mm = vma->vm_mm;
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index c5a7f18cce7e..c25c38a05c1c 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -433,7 +433,7 @@ static inline pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned
 }
 
 static inline void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
-					   pte_t *ptep, pte_t pte)
+					   pte_t *ptep, pte_t old_pte, pte_t pte)
 {
 
 	if (sizeof(pteval_t) > sizeof(long))
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 0adcaf338d64..f367121c747e 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -942,10 +942,12 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
 	pte_t ptent = *pte;
 
 	if (pte_present(ptent)) {
-		ptent = ptep_modify_prot_start(vma, addr, pte);
-		ptent = pte_wrprotect(ptent);
+		pte_t old_pte;
+
+		old_pte = ptep_modify_prot_start(vma, addr, pte);
+		ptent = pte_wrprotect(old_pte);
 		ptent = pte_clear_soft_dirty(ptent);
-		ptep_modify_prot_commit(vma, addr, pte, ptent);
+		ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);
 	} else if (is_swap_pte(ptent)) {
 		ptent = pte_swp_clear_soft_dirty(ptent);
 		set_pte_at(vma->vm_mm, addr, pte, ptent);
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 8b0e933efe26..fa782fba51ee 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -657,7 +657,7 @@ static inline pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
  */
 static inline void ptep_modify_prot_commit(struct vm_area_struct *vma,
 					   unsigned long addr,
-					   pte_t *ptep, pte_t pte)
+					   pte_t *ptep, pte_t old_pte, pte_t pte)
 {
 	__ptep_modify_prot_commit(vma, addr, ptep, pte);
 }
diff --git a/mm/memory.c b/mm/memory.c
index fb742f3237ad..b5b77fb5ba2c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3590,7 +3590,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	int last_cpupid;
 	int target_nid;
 	bool migrated = false;
-	pte_t pte;
+	pte_t pte, old_pte;
 	bool was_writable = pte_savedwrite(vmf->orig_pte);
 	int flags = 0;
 
@@ -3610,12 +3610,12 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	 * Make it present again, Depending on how arch implementes non
 	 * accessible ptes, some can allow access by kernel mode.
 	 */
-	pte = ptep_modify_prot_start(vma, vmf->address, vmf->pte);
-	pte = pte_modify(pte, vma->vm_page_prot);
+	old_pte = ptep_modify_prot_start(vma, vmf->address, vmf->pte);
+	pte = pte_modify(old_pte, vma->vm_page_prot);
 	pte = pte_mkyoung(pte);
 	if (was_writable)
 		pte = pte_mkwrite(pte);
-	ptep_modify_prot_commit(vma, vmf->address, vmf->pte, pte);
+	ptep_modify_prot_commit(vma, vmf->address, vmf->pte, old_pte, pte);
 	update_mmu_cache(vma, vmf->address, vmf->pte);
 
 	page = vm_normal_page(vma, vmf->address, pte);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index c89ce07923c8..028c724dcb1a 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -110,8 +110,8 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 					continue;
 			}
 
-			ptent = ptep_modify_prot_start(vma, addr, pte);
-			ptent = pte_modify(ptent, newprot);
+			oldpte = ptep_modify_prot_start(vma, addr, pte);
+			ptent = pte_modify(oldpte, newprot);
 			if (preserve_write)
 				ptent = pte_mk_savedwrite(ptent);
 
@@ -121,7 +121,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 					 !(vma->vm_flags & VM_SOFTDIRTY))) {
 				ptent = pte_mkwrite(ptent);
 			}
-			ptep_modify_prot_commit(vma, addr, pte, ptent);
+			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
 			pages++;
 		} else if (IS_ENABLED(CONFIG_MIGRATION)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V5 3/5] arch/powerpc/mm: Nest MMU workaround for mprotect RW upgrade.
  2019-01-16  8:50 [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect Aneesh Kumar K.V
  2019-01-16  8:50 ` [PATCH V5 1/5] mm: Update ptep_modify_prot_start/commit to take vm_area_struct as arg Aneesh Kumar K.V
  2019-01-16  8:50 ` [PATCH V5 2/5] mm: update ptep_modify_prot_commit to take old pte value " Aneesh Kumar K.V
@ 2019-01-16  8:50 ` Aneesh Kumar K.V
  2019-01-30 10:52   ` Michael Ellerman
  2019-01-16  8:50 ` [PATCH V5 4/5] mm/hugetlb: Add prot_modify_start/commit sequence for hugetlb update Aneesh Kumar K.V
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Aneesh Kumar K.V @ 2019-01-16  8:50 UTC (permalink / raw)
  To: npiggin, benh, paulus, mpe, akpm, x86
  Cc: linuxppc-dev, linux-mm, Aneesh Kumar K.V

NestMMU requires us to mark the pte invalid and flush the tlb when we do a
RW upgrade of pte. We fixed a variant of this in the fault path in commit
Fixes: bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang")

Do the same for mprotect upgrades.

Hugetlb is handled in the next patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 18 ++++++++++++++
 arch/powerpc/include/asm/book3s/64/radix.h   |  4 ++++
 arch/powerpc/mm/pgtable-book3s64.c           | 25 ++++++++++++++++++++
 arch/powerpc/mm/pgtable-radix.c              | 18 ++++++++++++++
 4 files changed, 65 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 2e6ada28da64..92eaea164700 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1314,6 +1314,24 @@ static inline int pud_pfn(pud_t pud)
 	BUILD_BUG();
 	return 0;
 }
+#define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
+pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *);
+void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long,
+			     pte_t *, pte_t, pte_t);
+
+/*
+ * Returns true for a R -> RW upgrade of pte
+ */
+static inline bool is_pte_rw_upgrade(unsigned long old_val, unsigned long new_val)
+{
+	if (!(old_val & _PAGE_READ))
+		return false;
+
+	if ((!(old_val & _PAGE_WRITE)) && (new_val & _PAGE_WRITE))
+		return true;
+
+	return false;
+}
 
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index 7d1a3d1543fc..5ab134eeed20 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -127,6 +127,10 @@ extern void radix__ptep_set_access_flags(struct vm_area_struct *vma, pte_t *ptep
 					 pte_t entry, unsigned long address,
 					 int psize);
 
+extern void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
+					   unsigned long addr, pte_t *ptep,
+					   pte_t old_pte, pte_t pte);
+
 static inline unsigned long __radix_pte_update(pte_t *ptep, unsigned long clr,
 					       unsigned long set)
 {
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index f3c31f5e1026..47c742f002ea 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -400,3 +400,28 @@ void arch_report_meminfo(struct seq_file *m)
 		   atomic_long_read(&direct_pages_count[MMU_PAGE_1G]) << 20);
 }
 #endif /* CONFIG_PROC_FS */
+
+pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
+			     pte_t *ptep)
+{
+	unsigned long pte_val;
+
+	/*
+	 * Clear the _PAGE_PRESENT so that no hardware parallel update is
+	 * possible. Also keep the pte_present true so that we don't take
+	 * wrong fault.
+	 */
+	pte_val = pte_update(vma->vm_mm, addr, ptep, _PAGE_PRESENT, _PAGE_INVALID, 0);
+
+	return __pte(pte_val);
+
+}
+
+void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
+			     pte_t *ptep, pte_t old_pte, pte_t pte)
+{
+	if (radix_enabled())
+		return radix__ptep_modify_prot_commit(vma, addr,
+						      ptep, old_pte, pte);
+	set_pte_at(vma->vm_mm, addr, ptep, pte);
+}
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 931156069a81..dced3cd241c2 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -1063,3 +1063,21 @@ void radix__ptep_set_access_flags(struct vm_area_struct *vma, pte_t *ptep,
 	}
 	/* See ptesync comment in radix__set_pte_at */
 }
+
+void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
+				    unsigned long addr, pte_t *ptep,
+				    pte_t old_pte, pte_t pte)
+{
+	struct mm_struct *mm = vma->vm_mm;
+
+	/*
+	 * To avoid NMMU hang while relaxing access we need to flush the tlb before
+	 * we set the new value. We need to do this only for radix, because hash
+	 * translation does flush when updating the linux pte.
+	 */
+	if (is_pte_rw_upgrade(pte_val(old_pte), pte_val(pte)) &&
+	    (atomic_read(&mm->context.copros) > 0))
+		radix__flush_tlb_page(vma, addr);
+
+	set_pte_at(mm, addr, ptep, pte);
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V5 4/5] mm/hugetlb: Add prot_modify_start/commit sequence for hugetlb update
  2019-01-16  8:50 [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect Aneesh Kumar K.V
                   ` (2 preceding siblings ...)
  2019-01-16  8:50 ` [PATCH V5 3/5] arch/powerpc/mm: Nest MMU workaround for mprotect RW upgrade Aneesh Kumar K.V
@ 2019-01-16  8:50 ` Aneesh Kumar K.V
  2019-01-30 10:54   ` Michael Ellerman
  2019-01-16  8:50 ` [PATCH V5 5/5] arch/powerpc/mm/hugetlb: NestMMU workaround for hugetlb mprotect RW upgrade Aneesh Kumar K.V
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Aneesh Kumar K.V @ 2019-01-16  8:50 UTC (permalink / raw)
  To: npiggin, benh, paulus, mpe, akpm, x86
  Cc: linuxppc-dev, linux-mm, Aneesh Kumar K.V

Architectures like ppc64 require to do a conditional tlb flush based on the old
and new value of pte. Follow the regular pte change protection sequence for
hugetlb too. This allows the architectures to override the update sequence.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 include/linux/hugetlb.h | 20 ++++++++++++++++++++
 mm/hugetlb.c            |  8 +++++---
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 087fd5f48c91..39e78b80375c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -543,6 +543,26 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
 	set_huge_pte_at(mm, addr, ptep, pte);
 }
 #endif
+
+#ifndef huge_ptep_modify_prot_start
+#define huge_ptep_modify_prot_start huge_ptep_modify_prot_start
+static inline pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma,
+						unsigned long addr, pte_t *ptep)
+{
+	return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
+}
+#endif
+
+#ifndef huge_ptep_modify_prot_commit
+#define huge_ptep_modify_prot_commit huge_ptep_modify_prot_commit
+static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
+						unsigned long addr, pte_t *ptep,
+						pte_t old_pte, pte_t pte)
+{
+	set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
+}
+#endif
+
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 #define alloc_huge_page(v, a, r) NULL
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index df2e7dd5ff17..f824d2200ca9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4387,10 +4387,12 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 			continue;
 		}
 		if (!huge_pte_none(pte)) {
-			pte = huge_ptep_get_and_clear(mm, address, ptep);
-			pte = pte_mkhuge(huge_pte_modify(pte, newprot));
+			pte_t old_pte;
+
+			old_pte = huge_ptep_modify_prot_start(vma, address, ptep);
+			pte = pte_mkhuge(huge_pte_modify(old_pte, newprot));
 			pte = arch_make_huge_pte(pte, vma, NULL, 0);
-			set_huge_pte_at(mm, address, ptep, pte);
+			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
 			pages++;
 		}
 		spin_unlock(ptl);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V5 5/5] arch/powerpc/mm/hugetlb: NestMMU workaround for hugetlb mprotect RW upgrade
  2019-01-16  8:50 [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect Aneesh Kumar K.V
                   ` (3 preceding siblings ...)
  2019-01-16  8:50 ` [PATCH V5 4/5] mm/hugetlb: Add prot_modify_start/commit sequence for hugetlb update Aneesh Kumar K.V
@ 2019-01-16  8:50 ` Aneesh Kumar K.V
  2019-01-30 11:01   ` Michael Ellerman
  2019-01-29 10:43 ` [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect Aneesh Kumar K.V
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Aneesh Kumar K.V @ 2019-01-16  8:50 UTC (permalink / raw)
  To: npiggin, benh, paulus, mpe, akpm, x86
  Cc: linuxppc-dev, linux-mm, Aneesh Kumar K.V

NestMMU requires us to mark the pte invalid and flush the tlb when we do a
RW upgrade of pte. We fixed a variant of this in the fault path in commit
Fixes: bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang")

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/hugetlb.h | 12 ++++++++++
 arch/powerpc/mm/hugetlbpage-hash64.c         | 25 ++++++++++++++++++++
 arch/powerpc/mm/hugetlbpage-radix.c          | 17 +++++++++++++
 3 files changed, 54 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h b/arch/powerpc/include/asm/book3s/64/hugetlb.h
index 5b0177733994..66c1e4f88d65 100644
--- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
+++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
@@ -13,6 +13,10 @@ radix__hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 				unsigned long len, unsigned long pgoff,
 				unsigned long flags);
 
+extern void radix__huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
+						unsigned long addr, pte_t *ptep,
+						pte_t old_pte, pte_t pte);
+
 static inline int hstate_get_psize(struct hstate *hstate)
 {
 	unsigned long shift;
@@ -42,4 +46,12 @@ static inline bool gigantic_page_supported(void)
 /* hugepd entry valid bit */
 #define HUGEPD_VAL_BITS		(0x8000000000000000UL)
 
+#define huge_ptep_modify_prot_start huge_ptep_modify_prot_start
+extern pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma,
+					 unsigned long addr, pte_t *ptep);
+
+#define huge_ptep_modify_prot_commit huge_ptep_modify_prot_commit
+extern void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
+					 unsigned long addr, pte_t *ptep,
+					 pte_t old_pte, pte_t new_pte);
 #endif
diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c
index 2e6a8f9345d3..367ce3a4a503 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -121,3 +121,28 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
 	*ptep = __pte(new_pte & ~H_PAGE_BUSY);
 	return 0;
 }
+
+pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma,
+				  unsigned long addr, pte_t *ptep)
+{
+	unsigned long pte_val;
+	/*
+	 * Clear the _PAGE_PRESENT so that no hardware parallel update is
+	 * possible. Also keep the pte_present true so that we don't take
+	 * wrong fault.
+	 */
+	pte_val = pte_update(vma->vm_mm, addr, ptep,
+			     _PAGE_PRESENT, _PAGE_INVALID, 1);
+
+	return __pte(pte_val);
+}
+
+void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
+				  pte_t *ptep, pte_t old_pte, pte_t pte)
+{
+
+	if (radix_enabled())
+		return radix__huge_ptep_modify_prot_commit(vma, addr, ptep,
+							   old_pte, pte);
+	set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
+}
diff --git a/arch/powerpc/mm/hugetlbpage-radix.c b/arch/powerpc/mm/hugetlbpage-radix.c
index 2486bee0f93e..11d9ea28a816 100644
--- a/arch/powerpc/mm/hugetlbpage-radix.c
+++ b/arch/powerpc/mm/hugetlbpage-radix.c
@@ -90,3 +90,20 @@ radix__hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 
 	return vm_unmapped_area(&info);
 }
+
+void radix__huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
+					 unsigned long addr, pte_t *ptep,
+					 pte_t old_pte, pte_t pte)
+{
+	struct mm_struct *mm = vma->vm_mm;
+
+	/*
+	 * To avoid NMMU hang while relaxing access we need to flush the tlb before
+	 * we set the new value.
+	 */
+	if (is_pte_rw_upgrade(pte_val(old_pte), pte_val(pte)) &&
+	    (atomic_read(&mm->context.copros) > 0))
+		radix__flush_hugetlb_page(vma, addr);
+
+	set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect
  2019-01-16  8:50 [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect Aneesh Kumar K.V
                   ` (4 preceding siblings ...)
  2019-01-16  8:50 ` [PATCH V5 5/5] arch/powerpc/mm/hugetlb: NestMMU workaround for hugetlb mprotect RW upgrade Aneesh Kumar K.V
@ 2019-01-29 10:43 ` Aneesh Kumar K.V
  2019-01-29 18:29 ` Andrew Morton
  2019-02-26 23:37 ` Andrew Morton
  7 siblings, 0 replies; 18+ messages in thread
From: Aneesh Kumar K.V @ 2019-01-29 10:43 UTC (permalink / raw)
  To: npiggin, benh, paulus, mpe, akpm, x86; +Cc: linuxppc-dev, linux-mm


Andrew,

How do you want to merge this? Michael Ellerman suggests this should go
via -mm tree.

-aneesh


"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:

> We can upgrade pte access (R -> RW transition) via mprotect. We need
> to make sure we follow the recommended pte update sequence as outlined in
> commit bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang")
> for such updates. This patch series do that.
>
> Changes from V4:
> * Drop EXPORT_SYMBOL 
>
> Changes from V3:
> * Build fix for x86
>
> Changes from V2:
> * Update commit message for patch 4
> * use radix tlb flush routines directly.
>
> Changes from V1:
> * Restrict ths only for R->RW upgrade. We don't need to do this for Autonuma
> * Restrict this only for radix translation mode.
>
>
> Aneesh Kumar K.V (5):
>   mm: Update ptep_modify_prot_start/commit to take vm_area_struct as arg
>   mm: update ptep_modify_prot_commit to take old pte value as arg
>   arch/powerpc/mm: Nest MMU workaround for mprotect RW upgrade.
>   mm/hugetlb: Add prot_modify_start/commit sequence for hugetlb update
>   arch/powerpc/mm/hugetlb: NestMMU workaround for hugetlb mprotect RW
>     upgrade
>
>  arch/powerpc/include/asm/book3s/64/hugetlb.h | 12 ++++++++++
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 18 ++++++++++++++
>  arch/powerpc/include/asm/book3s/64/radix.h   |  4 ++++
>  arch/powerpc/mm/hugetlbpage-hash64.c         | 25 ++++++++++++++++++++
>  arch/powerpc/mm/hugetlbpage-radix.c          | 17 +++++++++++++
>  arch/powerpc/mm/pgtable-book3s64.c           | 25 ++++++++++++++++++++
>  arch/powerpc/mm/pgtable-radix.c              | 18 ++++++++++++++
>  arch/s390/include/asm/pgtable.h              |  5 ++--
>  arch/s390/mm/pgtable.c                       |  8 ++++---
>  arch/x86/include/asm/paravirt.h              | 13 +++++-----
>  arch/x86/include/asm/paravirt_types.h        |  5 ++--
>  arch/x86/xen/mmu.h                           |  4 ++--
>  arch/x86/xen/mmu_pv.c                        |  8 +++----
>  fs/proc/task_mmu.c                           |  8 ++++---
>  include/asm-generic/pgtable.h                | 18 +++++++-------
>  include/linux/hugetlb.h                      | 20 ++++++++++++++++
>  mm/hugetlb.c                                 |  8 ++++---
>  mm/memory.c                                  |  8 +++----
>  mm/mprotect.c                                |  6 ++---
>  19 files changed, 189 insertions(+), 41 deletions(-)
>
> -- 
> 2.20.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect
  2019-01-16  8:50 [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect Aneesh Kumar K.V
                   ` (5 preceding siblings ...)
  2019-01-29 10:43 ` [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect Aneesh Kumar K.V
@ 2019-01-29 18:29 ` Andrew Morton
  2019-02-26 23:37 ` Andrew Morton
  7 siblings, 0 replies; 18+ messages in thread
From: Andrew Morton @ 2019-01-29 18:29 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: npiggin, benh, paulus, mpe, x86, linuxppc-dev, linux-mm

On Wed, 16 Jan 2019 14:20:30 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> wrote:

> We can upgrade pte access (R -> RW transition) via mprotect. We need
> to make sure we follow the recommended pte update sequence as outlined in
> commit bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang")
> for such updates. This patch series do that.

This series is at v5 and has no recorded review activity.  Perhaps you
cold hunt down some people to help out here?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V5 1/5] mm: Update ptep_modify_prot_start/commit to take vm_area_struct as arg
  2019-01-16  8:50 ` [PATCH V5 1/5] mm: Update ptep_modify_prot_start/commit to take vm_area_struct as arg Aneesh Kumar K.V
@ 2019-01-30 10:33   ` Michael Ellerman
  0 siblings, 0 replies; 18+ messages in thread
From: Michael Ellerman @ 2019-01-30 10:33 UTC (permalink / raw)
  To: Aneesh Kumar K.V, npiggin, benh, paulus, akpm, x86
  Cc: linuxppc-dev, linux-mm, Aneesh Kumar K.V

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:

> Some architecture may want to call flush_tlb_range from these helpers.

That's what we want to do, but wouldn't a better description be that
some architectures may need access to the vma for some reason, one of
which might be flushing the TLB.

> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/s390/include/asm/pgtable.h       |  4 ++--
>  arch/s390/mm/pgtable.c                |  6 ++++--
>  arch/x86/include/asm/paravirt.h       | 11 ++++++-----
>  arch/x86/include/asm/paravirt_types.h |  5 +++--
>  arch/x86/xen/mmu.h                    |  4 ++--
>  arch/x86/xen/mmu_pv.c                 |  8 ++++----
>  fs/proc/task_mmu.c                    |  4 ++--
>  include/asm-generic/pgtable.h         | 16 ++++++++--------
>  mm/memory.c                           |  4 ++--
>  mm/mprotect.c                         |  4 ++--
>  10 files changed, 35 insertions(+), 31 deletions(-)
>
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 063732414dfb..5d730199e37b 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -1069,8 +1069,8 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
>  }
>  
>  #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
> -pte_t ptep_modify_prot_start(struct mm_struct *, unsigned long, pte_t *);
> -void ptep_modify_prot_commit(struct mm_struct *, unsigned long, pte_t *, pte_t);
> +pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *);
> +void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long, pte_t *, pte_t);
>  
>  #define __HAVE_ARCH_PTEP_CLEAR_FLUSH
>  static inline pte_t ptep_clear_flush(struct vm_area_struct *vma,
> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
> index f2cc7da473e4..29c0a21cd34a 100644
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -301,12 +301,13 @@ pte_t ptep_xchg_lazy(struct mm_struct *mm, unsigned long addr,
>  }
>  EXPORT_SYMBOL(ptep_xchg_lazy);
>  
> -pte_t ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr,
> +pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
>  			     pte_t *ptep)
>  {
>  	pgste_t pgste;
>  	pte_t old;
>  	int nodat;
> +	struct mm_struct *mm = vma->vm_mm;

If this was my code I'd want the mm as the first variable, to preserve
the Reverse Christmas tree format.

>  	preempt_disable();
>  	pgste = ptep_xchg_start(mm, addr, ptep);
> @@ -320,10 +321,11 @@ pte_t ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr,
>  }
>  EXPORT_SYMBOL(ptep_modify_prot_start);
>  
> -void ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
> +void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
>  			     pte_t *ptep, pte_t pte)
>  {
>  	pgste_t pgste;
> +	struct mm_struct *mm = vma->vm_mm;
  
Ditto.

>  	if (!MACHINE_HAS_NX)
>  		pte_val(pte) &= ~_PAGE_NOEXEC;
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index a97f28d914d5..c5a7f18cce7e 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -422,25 +422,26 @@ static inline pgdval_t pgd_val(pgd_t pgd)
>  }
>  
>  #define  __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
> -static inline pte_t ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr,
> +static inline pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
>  					   pte_t *ptep)
>  {
>  	pteval_t ret;
>  
> -	ret = PVOP_CALL3(pteval_t, mmu.ptep_modify_prot_start, mm, addr, ptep);
> +	ret = PVOP_CALL3(pteval_t, mmu.ptep_modify_prot_start, vma, addr, ptep);
>  
>  	return (pte_t) { .pte = ret };
>  }
>  
> -static inline void ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
> +static inline void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
>  					   pte_t *ptep, pte_t pte)
>  {
> +

Unnecessary blank line here.

>  	if (sizeof(pteval_t) > sizeof(long))
>  		/* 5 arg words */
> -		pv_ops.mmu.ptep_modify_prot_commit(mm, addr, ptep, pte);
> +		pv_ops.mmu.ptep_modify_prot_commit(vma, addr, ptep, pte);
>  	else
>  		PVOP_VCALL4(mmu.ptep_modify_prot_commit,
> -			    mm, addr, ptep, pte.pte);
> +			    vma, addr, ptep, pte.pte);
>  }
>  
>  static inline void set_pte(pte_t *ptep, pte_t pte)

The rest looks good.

cheers


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V5 2/5] mm: update ptep_modify_prot_commit to take old pte value as arg
  2019-01-16  8:50 ` [PATCH V5 2/5] mm: update ptep_modify_prot_commit to take old pte value " Aneesh Kumar K.V
@ 2019-01-30 10:46   ` Michael Ellerman
  2019-01-31  5:03     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Ellerman @ 2019-01-30 10:46 UTC (permalink / raw)
  To: Aneesh Kumar K.V, npiggin, benh, paulus, akpm, x86
  Cc: linuxppc-dev, linux-mm, Aneesh Kumar K.V

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:

> Architectures like ppc64 require to do a conditional tlb flush based on the old
> and new value of pte. Enable that by passing old pte value as the arg.

It's not actually the architecture, it's to work around a specific bug
on Power9.

> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index c89ce07923c8..028c724dcb1a 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -110,8 +110,8 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>  					continue;
>  			}
>  
> -			ptent = ptep_modify_prot_start(vma, addr, pte);
> -			ptent = pte_modify(ptent, newprot);
> +			oldpte = ptep_modify_prot_start(vma, addr, pte);
> +			ptent = pte_modify(oldpte, newprot);
>  			if (preserve_write)
>  				ptent = pte_mk_savedwrite(ptent);

Is it OK to reuse oldpte here?

It was set at the top of the loop with:

		oldpte = *pte;

Is it guaranteed that ptep_modify_prot_start() returns the old value
unmodified, or could an implementation conceivably filter some bits out?

If so then it could be confusing for oldpte to have its value change
half way through the loop.


cheers


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V5 3/5] arch/powerpc/mm: Nest MMU workaround for mprotect RW upgrade.
  2019-01-16  8:50 ` [PATCH V5 3/5] arch/powerpc/mm: Nest MMU workaround for mprotect RW upgrade Aneesh Kumar K.V
@ 2019-01-30 10:52   ` Michael Ellerman
  2019-01-31  5:07     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Ellerman @ 2019-01-30 10:52 UTC (permalink / raw)
  To: Aneesh Kumar K.V, npiggin, benh, paulus, akpm, x86
  Cc: linuxppc-dev, linux-mm, Aneesh Kumar K.V

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> NestMMU requires us to mark the pte invalid and flush the tlb when we do a
> RW upgrade of pte. We fixed a variant of this in the fault path in commit
> Fixes: bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang")

You don't want the "Fixes:" there.

>
> Do the same for mprotect upgrades.
>
> Hugetlb is handled in the next patch.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 18 ++++++++++++++
>  arch/powerpc/include/asm/book3s/64/radix.h   |  4 ++++
>  arch/powerpc/mm/pgtable-book3s64.c           | 25 ++++++++++++++++++++
>  arch/powerpc/mm/pgtable-radix.c              | 18 ++++++++++++++
>  4 files changed, 65 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 2e6ada28da64..92eaea164700 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -1314,6 +1314,24 @@ static inline int pud_pfn(pud_t pud)
>  	BUILD_BUG();
>  	return 0;
>  }

Can we get a blank line here?

> +#define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
> +pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *);
> +void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long,
> +			     pte_t *, pte_t, pte_t);

So these are not inline ...

> +/*
> + * Returns true for a R -> RW upgrade of pte
> + */
> +static inline bool is_pte_rw_upgrade(unsigned long old_val, unsigned long new_val)
> +{
> +	if (!(old_val & _PAGE_READ))
> +		return false;
> +
> +	if ((!(old_val & _PAGE_WRITE)) && (new_val & _PAGE_WRITE))
> +		return true;
> +
> +	return false;
> +}
>  
>  #endif /* __ASSEMBLY__ */
>  #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
> diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
> index f3c31f5e1026..47c742f002ea 100644
> --- a/arch/powerpc/mm/pgtable-book3s64.c
> +++ b/arch/powerpc/mm/pgtable-book3s64.c
> @@ -400,3 +400,28 @@ void arch_report_meminfo(struct seq_file *m)
>  		   atomic_long_read(&direct_pages_count[MMU_PAGE_1G]) << 20);
>  }
>  #endif /* CONFIG_PROC_FS */
> +
> +pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
> +			     pte_t *ptep)
> +{
> +	unsigned long pte_val;
> +
> +	/*
> +	 * Clear the _PAGE_PRESENT so that no hardware parallel update is
> +	 * possible. Also keep the pte_present true so that we don't take
> +	 * wrong fault.
> +	 */
> +	pte_val = pte_update(vma->vm_mm, addr, ptep, _PAGE_PRESENT, _PAGE_INVALID, 0);
> +
> +	return __pte(pte_val);
> +
> +}
> +
> +void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
> +			     pte_t *ptep, pte_t old_pte, pte_t pte)
> +{

Which means we're going to be doing a function call to get to here ...

> +	if (radix_enabled())
> +		return radix__ptep_modify_prot_commit(vma, addr,
> +						      ptep, old_pte, pte);

And then another function call to get to the radix version ...

> +	set_pte_at(vma->vm_mm, addr, ptep, pte);
> +}
> diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> index 931156069a81..dced3cd241c2 100644
> --- a/arch/powerpc/mm/pgtable-radix.c
> +++ b/arch/powerpc/mm/pgtable-radix.c
> @@ -1063,3 +1063,21 @@ void radix__ptep_set_access_flags(struct vm_area_struct *vma, pte_t *ptep,
>  	}
>  	/* See ptesync comment in radix__set_pte_at */
>  }
> +
> +void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
> +				    unsigned long addr, pte_t *ptep,
> +				    pte_t old_pte, pte_t pte)
> +{
> +	struct mm_struct *mm = vma->vm_mm;
> +
> +	/*
> +	 * To avoid NMMU hang while relaxing access we need to flush the tlb before
> +	 * we set the new value. We need to do this only for radix, because hash
> +	 * translation does flush when updating the linux pte.
> +	 */
> +	if (is_pte_rw_upgrade(pte_val(old_pte), pte_val(pte)) &&
> +	    (atomic_read(&mm->context.copros) > 0))
> +		radix__flush_tlb_page(vma, addr);

To finally get here, where we'll realise that 99.99% of processes don't
use copros and so we have nothing to do except set the PTE.

> +
> +	set_pte_at(mm, addr, ptep, pte);
> +}

So can we just make it all inline in the header? Or do we think it's not
a hot enough path to worry about it?

cheers


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V5 4/5] mm/hugetlb: Add prot_modify_start/commit sequence for hugetlb update
  2019-01-16  8:50 ` [PATCH V5 4/5] mm/hugetlb: Add prot_modify_start/commit sequence for hugetlb update Aneesh Kumar K.V
@ 2019-01-30 10:54   ` Michael Ellerman
  0 siblings, 0 replies; 18+ messages in thread
From: Michael Ellerman @ 2019-01-30 10:54 UTC (permalink / raw)
  To: Aneesh Kumar K.V, npiggin, benh, paulus, akpm, x86
  Cc: linuxppc-dev, linux-mm, Aneesh Kumar K.V

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:

> Architectures like ppc64 require to do a conditional tlb flush based on the old
> and new value of pte. Follow the regular pte change protection sequence for
> hugetlb too. This allows the architectures to override the update sequence.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  include/linux/hugetlb.h | 20 ++++++++++++++++++++
>  mm/hugetlb.c            |  8 +++++---
>  2 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 087fd5f48c91..39e78b80375c 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -543,6 +543,26 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
>  	set_huge_pte_at(mm, addr, ptep, pte);
>  }
>  #endif
> +
> +#ifndef huge_ptep_modify_prot_start
> +#define huge_ptep_modify_prot_start huge_ptep_modify_prot_start
> +static inline pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma,
> +						unsigned long addr, pte_t *ptep)
> +{
> +	return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> +}
> +#endif
> +
> +#ifndef huge_ptep_modify_prot_commit
> +#define huge_ptep_modify_prot_commit huge_ptep_modify_prot_commit
> +static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
> +						unsigned long addr, pte_t *ptep,
> +						pte_t old_pte, pte_t pte)
> +{
> +	set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
> +}
> +#endif
> +
>  #else	/* CONFIG_HUGETLB_PAGE */
>  struct hstate {};
>  #define alloc_huge_page(v, a, r) NULL
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index df2e7dd5ff17..f824d2200ca9 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4387,10 +4387,12 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
>  			continue;
>  		}
>  		if (!huge_pte_none(pte)) {
> -			pte = huge_ptep_get_and_clear(mm, address, ptep);
> -			pte = pte_mkhuge(huge_pte_modify(pte, newprot));
> +			pte_t old_pte;
> +
> +			old_pte = huge_ptep_modify_prot_start(vma, address, ptep);
> +			pte = pte_mkhuge(huge_pte_modify(old_pte, newprot));
>  			pte = arch_make_huge_pte(pte, vma, NULL, 0);
> -			set_huge_pte_at(mm, address, ptep, pte);
> +			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
>  			pages++;
>  		}
>  		spin_unlock(ptl);

Looks like a faithful conversion.

Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>

cheers


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V5 5/5] arch/powerpc/mm/hugetlb: NestMMU workaround for hugetlb mprotect RW upgrade
  2019-01-16  8:50 ` [PATCH V5 5/5] arch/powerpc/mm/hugetlb: NestMMU workaround for hugetlb mprotect RW upgrade Aneesh Kumar K.V
@ 2019-01-30 11:01   ` Michael Ellerman
  0 siblings, 0 replies; 18+ messages in thread
From: Michael Ellerman @ 2019-01-30 11:01 UTC (permalink / raw)
  To: Aneesh Kumar K.V, npiggin, benh, paulus, akpm, x86
  Cc: linuxppc-dev, linux-mm, Aneesh Kumar K.V

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:

> NestMMU requires us to mark the pte invalid and flush the tlb when we do a
> RW upgrade of pte. We fixed a variant of this in the fault path in commit
> Fixes: bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang")

Odd "Fixes:" again.

> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/hugetlb.h | 12 ++++++++++
>  arch/powerpc/mm/hugetlbpage-hash64.c         | 25 ++++++++++++++++++++
>  arch/powerpc/mm/hugetlbpage-radix.c          | 17 +++++++++++++
>  3 files changed, 54 insertions(+)

Same comment about inlining.

But otherwise looks good.

Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>

cheers


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V5 2/5] mm: update ptep_modify_prot_commit to take old pte value as arg
  2019-01-30 10:46   ` Michael Ellerman
@ 2019-01-31  5:03     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 18+ messages in thread
From: Aneesh Kumar K.V @ 2019-01-31  5:03 UTC (permalink / raw)
  To: Michael Ellerman, npiggin, benh, paulus, akpm, x86; +Cc: linuxppc-dev, linux-mm

Michael Ellerman <mpe@ellerman.id.au> writes:

> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>
>> Architectures like ppc64 require to do a conditional tlb flush based on the old
>> and new value of pte. Enable that by passing old pte value as the arg.
>
> It's not actually the architecture, it's to work around a specific bug
> on Power9.
>
>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>> index c89ce07923c8..028c724dcb1a 100644
>> --- a/mm/mprotect.c
>> +++ b/mm/mprotect.c
>> @@ -110,8 +110,8 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>>  					continue;
>>  			}
>>  
>> -			ptent = ptep_modify_prot_start(vma, addr, pte);
>> -			ptent = pte_modify(ptent, newprot);
>> +			oldpte = ptep_modify_prot_start(vma, addr, pte);
>> +			ptent = pte_modify(oldpte, newprot);
>>  			if (preserve_write)
>>  				ptent = pte_mk_savedwrite(ptent);
>
> Is it OK to reuse oldpte here?
>
> It was set at the top of the loop with:
>
> 		oldpte = *pte;
>
> Is it guaranteed that ptep_modify_prot_start() returns the old value
> unmodified, or could an implementation conceivably filter some bits out?
>
> If so then it could be confusing for oldpte to have its value change
> half way through the loop.
>

ptep_modify_prot_start and ptep_modify_prot_commit is the sequence that
we can safely use to do read/modify/update of a pte entry. Now w.r.t old
pte, we can't update the pte bits from software because we are holding
the page table lock(ptl). Now we could definitely end up having updated
reference and change bit. But we make sure we don't lose those by using
prot_start and prot_commit sequence.

-aneesh


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V5 3/5] arch/powerpc/mm: Nest MMU workaround for mprotect RW upgrade.
  2019-01-30 10:52   ` Michael Ellerman
@ 2019-01-31  5:07     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 18+ messages in thread
From: Aneesh Kumar K.V @ 2019-01-31  5:07 UTC (permalink / raw)
  To: Michael Ellerman, npiggin, benh, paulus, akpm, x86; +Cc: linuxppc-dev, linux-mm

Michael Ellerman <mpe@ellerman.id.au> writes:

> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>> NestMMU requires us to mark the pte invalid and flush the tlb when we do a
>> RW upgrade of pte. We fixed a variant of this in the fault path in commit
>> Fixes: bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang")
>
> You don't want the "Fixes:" there.
>
>>
>> Do the same for mprotect upgrades.
>>
>> Hugetlb is handled in the next patch.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>  arch/powerpc/include/asm/book3s/64/pgtable.h | 18 ++++++++++++++
>>  arch/powerpc/include/asm/book3s/64/radix.h   |  4 ++++
>>  arch/powerpc/mm/pgtable-book3s64.c           | 25 ++++++++++++++++++++
>>  arch/powerpc/mm/pgtable-radix.c              | 18 ++++++++++++++
>>  4 files changed, 65 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index 2e6ada28da64..92eaea164700 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -1314,6 +1314,24 @@ static inline int pud_pfn(pud_t pud)
>>  	BUILD_BUG();
>>  	return 0;
>>  }
>
> Can we get a blank line here?
>
>> +#define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
>> +pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *);
>> +void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long,
>> +			     pte_t *, pte_t, pte_t);
>
> So these are not inline ...
>
>> +/*
>> + * Returns true for a R -> RW upgrade of pte
>> + */
>> +static inline bool is_pte_rw_upgrade(unsigned long old_val, unsigned long new_val)
>> +{
>> +	if (!(old_val & _PAGE_READ))
>> +		return false;
>> +
>> +	if ((!(old_val & _PAGE_WRITE)) && (new_val & _PAGE_WRITE))
>> +		return true;
>> +
>> +	return false;
>> +}
>>  
>>  #endif /* __ASSEMBLY__ */
>>  #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
>> diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
>> index f3c31f5e1026..47c742f002ea 100644
>> --- a/arch/powerpc/mm/pgtable-book3s64.c
>> +++ b/arch/powerpc/mm/pgtable-book3s64.c
>> @@ -400,3 +400,28 @@ void arch_report_meminfo(struct seq_file *m)
>>  		   atomic_long_read(&direct_pages_count[MMU_PAGE_1G]) << 20);
>>  }
>>  #endif /* CONFIG_PROC_FS */
>> +
>> +pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
>> +			     pte_t *ptep)
>> +{
>> +	unsigned long pte_val;
>> +
>> +	/*
>> +	 * Clear the _PAGE_PRESENT so that no hardware parallel update is
>> +	 * possible. Also keep the pte_present true so that we don't take
>> +	 * wrong fault.
>> +	 */
>> +	pte_val = pte_update(vma->vm_mm, addr, ptep, _PAGE_PRESENT, _PAGE_INVALID, 0);
>> +
>> +	return __pte(pte_val);
>> +
>> +}
>> +
>> +void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
>> +			     pte_t *ptep, pte_t old_pte, pte_t pte)
>> +{
>
> Which means we're going to be doing a function call to get to here ...
>
>> +	if (radix_enabled())
>> +		return radix__ptep_modify_prot_commit(vma, addr,
>> +						      ptep, old_pte, pte);
>
> And then another function call to get to the radix version ...
>
>> +	set_pte_at(vma->vm_mm, addr, ptep, pte);
>> +}
>> diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
>> index 931156069a81..dced3cd241c2 100644
>> --- a/arch/powerpc/mm/pgtable-radix.c
>> +++ b/arch/powerpc/mm/pgtable-radix.c
>> @@ -1063,3 +1063,21 @@ void radix__ptep_set_access_flags(struct vm_area_struct *vma, pte_t *ptep,
>>  	}
>>  	/* See ptesync comment in radix__set_pte_at */
>>  }
>> +
>> +void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
>> +				    unsigned long addr, pte_t *ptep,
>> +				    pte_t old_pte, pte_t pte)
>> +{
>> +	struct mm_struct *mm = vma->vm_mm;
>> +
>> +	/*
>> +	 * To avoid NMMU hang while relaxing access we need to flush the tlb before
>> +	 * we set the new value. We need to do this only for radix, because hash
>> +	 * translation does flush when updating the linux pte.
>> +	 */
>> +	if (is_pte_rw_upgrade(pte_val(old_pte), pte_val(pte)) &&
>> +	    (atomic_read(&mm->context.copros) > 0))
>> +		radix__flush_tlb_page(vma, addr);
>
> To finally get here, where we'll realise that 99.99% of processes don't
> use copros and so we have nothing to do except set the PTE.
>
>> +
>> +	set_pte_at(mm, addr, ptep, pte);
>> +}
>
> So can we just make it all inline in the header? Or do we think it's not
> a hot enough path to worry about it?
>

I did try that earlier, But IIRC that didn't work due to header
inclusion issue. I can try that again in an addon patch. That would
require moving things around so that we find different struct
definitions correctly.

-aneesh


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect
  2019-01-16  8:50 [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect Aneesh Kumar K.V
                   ` (6 preceding siblings ...)
  2019-01-29 18:29 ` Andrew Morton
@ 2019-02-26 23:37 ` Andrew Morton
  2019-02-27  8:58   ` Aneesh Kumar K.V
  7 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2019-02-26 23:37 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: npiggin, benh, paulus, mpe, x86, linuxppc-dev, linux-mm

[patch 1/5]: unreviewed and has unaddressed comments from mpe.
[patch 2/5]: ditto
[patch 3/5]: ditto
[patch 4/5]: seems ready
[patch 5/5]: reviewed by mpe, but appears to need more work


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect
  2019-02-26 23:37 ` Andrew Morton
@ 2019-02-27  8:58   ` Aneesh Kumar K.V
  2019-02-28 19:39     ` Andrew Morton
  0 siblings, 1 reply; 18+ messages in thread
From: Aneesh Kumar K.V @ 2019-02-27  8:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: npiggin, benh, paulus, mpe, x86, linuxppc-dev, linux-mm

Andrew Morton <akpm@linux-foundation.org> writes:

> [patch 1/5]: unreviewed and has unaddressed comments from mpe.
> [patch 2/5]: ditto
> [patch 3/5]: ditto
> [patch 4/5]: seems ready
> [patch 5/5]: reviewed by mpe, but appears to need more work

That was mostly variable naming preferences. I like the christmas
tree style not the inverted christmas tree. There is one detail about
commit message, which indicate the change may be required by other
architecture too. Was not sure whether that needed a commit message
update.

I didn't send an updated series because after replying to most of them I
didn't find a strong request to get the required changes in. If you want
me update the series with this variable name ordering and commit message
update I can send a new series today.

-aneesh


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect
  2019-02-27  8:58   ` Aneesh Kumar K.V
@ 2019-02-28 19:39     ` Andrew Morton
  0 siblings, 0 replies; 18+ messages in thread
From: Andrew Morton @ 2019-02-28 19:39 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: npiggin, benh, paulus, mpe, x86, linuxppc-dev, linux-mm

On Wed, 27 Feb 2019 14:28:43 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> wrote:

> Andrew Morton <akpm@linux-foundation.org> writes:
> 
> > [patch 1/5]: unreviewed and has unaddressed comments from mpe.
> > [patch 2/5]: ditto
> > [patch 3/5]: ditto
> > [patch 4/5]: seems ready
> > [patch 5/5]: reviewed by mpe, but appears to need more work
> 
> That was mostly variable naming preferences. I like the christmas
> tree style not the inverted christmas tree. There is one detail about
> commit message, which indicate the change may be required by other
> architecture too. Was not sure whether that needed a commit message
> update.
> 
> I didn't send an updated series because after replying to most of them I
> didn't find a strong request to get the required changes in. If you want
> me update the series with this variable name ordering and commit message
> update I can send a new series today.
> 

OK, minor stuff.

The patches have been in -next for a month, which is good but we really
should get some review of the first three.


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-02-28 19:39 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-16  8:50 [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect Aneesh Kumar K.V
2019-01-16  8:50 ` [PATCH V5 1/5] mm: Update ptep_modify_prot_start/commit to take vm_area_struct as arg Aneesh Kumar K.V
2019-01-30 10:33   ` Michael Ellerman
2019-01-16  8:50 ` [PATCH V5 2/5] mm: update ptep_modify_prot_commit to take old pte value " Aneesh Kumar K.V
2019-01-30 10:46   ` Michael Ellerman
2019-01-31  5:03     ` Aneesh Kumar K.V
2019-01-16  8:50 ` [PATCH V5 3/5] arch/powerpc/mm: Nest MMU workaround for mprotect RW upgrade Aneesh Kumar K.V
2019-01-30 10:52   ` Michael Ellerman
2019-01-31  5:07     ` Aneesh Kumar K.V
2019-01-16  8:50 ` [PATCH V5 4/5] mm/hugetlb: Add prot_modify_start/commit sequence for hugetlb update Aneesh Kumar K.V
2019-01-30 10:54   ` Michael Ellerman
2019-01-16  8:50 ` [PATCH V5 5/5] arch/powerpc/mm/hugetlb: NestMMU workaround for hugetlb mprotect RW upgrade Aneesh Kumar K.V
2019-01-30 11:01   ` Michael Ellerman
2019-01-29 10:43 ` [PATCH V5 0/5] NestMMU pte upgrade workaround for mprotect Aneesh Kumar K.V
2019-01-29 18:29 ` Andrew Morton
2019-02-26 23:37 ` Andrew Morton
2019-02-27  8:58   ` Aneesh Kumar K.V
2019-02-28 19:39     ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).