* [RFC PATCH 1/3] powerpc/mm: update pmdp_invalidate to return old pmd value
@ 2017-07-27  8:37 ` Aneesh Kumar K.V
  0 siblings, 0 replies; 26+ messages in thread
From: Aneesh Kumar K.V @ 2017-07-27  8:37 UTC (permalink / raw)
  To: benh, paulus, mpe, Kirill A. Shutemov
  Cc: linuxppc-dev, linux-mm, Aneesh Kumar K.V

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 4 ++--
 arch/powerpc/mm/pgtable-book3s64.c           | 9 ++++++---
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 41d484ac0822..ece6912fae8e 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1119,8 +1119,8 @@ static inline pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm,
 }
 
 #define __HAVE_ARCH_PMDP_INVALIDATE
-extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
-			    pmd_t *pmdp);
+extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
+			     pmd_t *pmdp);
 
 #define __HAVE_ARCH_PMDP_HUGE_SPLIT_PREPARE
 static inline void pmdp_huge_split_prepare(struct vm_area_struct *vma,
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 3b65917785a5..0bb7f824ecdd 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -90,16 +90,19 @@ void serialize_against_pte_lookup(struct mm_struct *mm)
  * We use this to invalidate a pmdp entry before switching from a
  * hugepte to regular pmd entry.
  */
-void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
-		     pmd_t *pmdp)
+pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
+		      pmd_t *pmdp)
 {
-	pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, 0);
+	unsigned long old_pmd;
+
+	old_pmd = pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, 0);
 	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
 	/*
 	 * This ensures that generic code that rely on IRQ disabling
 	 * to prevent a parallel THP split work as expected.
 	 */
 	serialize_against_pte_lookup(vma->vm_mm);
+	return __pmd(old_pmd);
 }
 
 static pmd_t pmd_set_protbits(pmd_t pmd, pgprot_t pgprot)
-- 
2.13.3
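
For context, a minimal sketch (mirroring the use this series introduces in
mm/huge_memory.c in patch 3) of how a caller can consume the returned value;
names follow that patch:

	pmd_t old_pmd;

	old_pmd = pmdp_invalidate(vma, haddr, pmd);
	/*
	 * Transfer the dirty bit from the atomically captured old value,
	 * so we don't race with a CPU that can set the bit under us.
	 */
	if (pmd_dirty(old_pmd))
		SetPageDirty(page);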

* [RFC PATCH 2/3] powerpc/mm: Implement pmdp_establish for ppc64
  2017-07-27  8:37 ` Aneesh Kumar K.V
@ 2017-07-27  8:37   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 26+ messages in thread
From: Aneesh Kumar K.V @ 2017-07-27  8:37 UTC (permalink / raw)
  To: benh, paulus, mpe, Kirill A. Shutemov
  Cc: linuxppc-dev, linux-mm, Aneesh Kumar K.V

We can now use this to set pmd page table entries to absolute values. THP
needs to ensure that we always update pmd PTE entries such that we never
mark the pmd none. pmdp_establish helps in implementing that.

This doesn't flush the TLB. Based on the old_pmd value returned, the caller
can decide whether to call flush_pmd_tlb_range().
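
For illustration, a minimal sketch of a hypothetical caller (not part of this
patch) acting on the returned value:

	pmd_t old_pmd;

	/* Replace *pmdp atomically; pmdp_establish() does not flush the TLB. */
	old_pmd = pmdp_establish(vma, addr, pmdp, entry);
	/* Hypothetical policy: flush only if the old entry may still be cached. */
	if (pmd_val(old_pmd) != pmd_val(entry))
		flush_pmd_tlb_range(vma, addr, addr + HPAGE_PMD_SIZE);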

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/radix.h |  9 ++++++---
 arch/powerpc/mm/pgtable-book3s64.c         | 10 ++++++++++
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index cd481ab601b6..558fea3b2d22 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -131,7 +131,8 @@ static inline unsigned long __radix_pte_update(pte_t *ptep, unsigned long clr,
 	do {
 		pte = READ_ONCE(*ptep);
 		old_pte = pte_val(pte);
-		new_pte = (old_pte | set) & ~clr;
+		new_pte = old_pte & ~clr;
+		new_pte |= set;
 
 	} while (!pte_xchg(ptep, __pte(old_pte), __pte(new_pte)));
 
@@ -153,9 +154,11 @@ static inline unsigned long radix__pte_update(struct mm_struct *mm,
 
 		old_pte = __radix_pte_update(ptep, ~0ul, 0);
 		/*
-		 * new value of pte
+		 * new value of pte. We clear all the bits in clr mask
+		 * first and set the bits in set mask.
 		 */
-		new_pte = (old_pte | set) & ~clr;
+		new_pte = old_pte & ~clr;
+		new_pte |= set;
 		radix__flush_tlb_pte_p9_dd1(old_pte, mm, addr);
 		if (new_pte)
 			__radix_pte_update(ptep, 0, new_pte);
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 0bb7f824ecdd..7100b0150a2a 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -45,6 +45,16 @@ int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
 	return changed;
 }
 
+pmd_t pmdp_establish(struct vm_area_struct *vma, unsigned long addr,
+		     pmd_t *pmdp, pmd_t entry)
+{
+	long pmdval;
+
+	pmdval = pmd_hugepage_update(vma->vm_mm, addr, pmdp, ~0UL, pmd_val(entry));
+	return __pmd(pmdval);
+}
+
+
 int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 			      unsigned long address, pmd_t *pmdp)
 {
-- 
2.13.3
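
To see why the clear-before-set ordering above matters, consider the
overlapping masks that pmdp_establish() passes down (clr = ~0UL,
set = pmd_val(entry)); a sketch:

	/* Old formula: set first, then clear -- the set bits are wiped out. */
	new_pte = (old_pte | set) & ~clr;	/* == 0 when clr == ~0UL */

	/* New formula: clear first, then set -- the new value survives. */
	new_pte = old_pte & ~clr;		/* == 0 */
	new_pte |= set;				/* == pmd_val(entry) */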

* [RFC PATCH 3/3] mm/hugetlb: Remove pmd_huge_split_prepare
  2017-07-27  8:37 ` Aneesh Kumar K.V
@ 2017-07-27  8:37   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 26+ messages in thread
From: Aneesh Kumar K.V @ 2017-07-27  8:37 UTC (permalink / raw)
  To: benh, paulus, mpe, Kirill A. Shutemov
  Cc: linuxppc-dev, linux-mm, Aneesh Kumar K.V

Instead of marking the pmd ready for split, invalidate the pmd. This should
take care of the powerpc requirement. The only side effect is that we mark
the pmd invalid early, which can result in us blocking access to the page a
bit longer if we race against a THP split.
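
The resulting flow in __split_huge_pmd_locked() is then roughly (a sketch of
the ordering after this patch, with most details elided):

	/* 1. Atomically mark the pmd invalid, flush and serialize. */
	old_pmd = pmdp_invalidate(vma, haddr, pmd);
	/* 2. Read the page and the pmd bits from the captured old value. */
	page = pmd_page(old_pmd);
	write = pmd_write(old_pmd);
	/* 3. Withdraw the deposited pgtable only after the invalidate. */
	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
	pmd_populate(mm, &_pmd, pgtable);
	/* 4. Set up the ptes, then install the non-huge pmd. */
	pmd_populate(mm, pmd, pgtable);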

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  |  2 -
 arch/powerpc/include/asm/book3s/64/hash-64k.h |  2 -
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  9 ----
 arch/powerpc/include/asm/book3s/64/radix.h    |  6 ---
 arch/powerpc/mm/pgtable-hash64.c              | 22 --------
 include/asm-generic/pgtable.h                 |  8 ---
 mm/huge_memory.c                              | 73 +++++++++++++--------------
 7 files changed, 35 insertions(+), 87 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 0c4e470571ca..7d914f4fc534 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -100,8 +100,6 @@ extern pmd_t hash__pmdp_collapse_flush(struct vm_area_struct *vma,
 extern void hash__pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
 					 pgtable_t pgtable);
 extern pgtable_t hash__pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
-extern void hash__pmdp_huge_split_prepare(struct vm_area_struct *vma,
-				      unsigned long address, pmd_t *pmdp);
 extern pmd_t hash__pmdp_huge_get_and_clear(struct mm_struct *mm,
 				       unsigned long addr, pmd_t *pmdp);
 extern int hash__has_transparent_hugepage(void);
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 8c8fb6fbdabe..b856e130c678 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -164,8 +164,6 @@ extern pmd_t hash__pmdp_collapse_flush(struct vm_area_struct *vma,
 extern void hash__pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
 					 pgtable_t pgtable);
 extern pgtable_t hash__pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
-extern void hash__pmdp_huge_split_prepare(struct vm_area_struct *vma,
-				      unsigned long address, pmd_t *pmdp);
 extern pmd_t hash__pmdp_huge_get_and_clear(struct mm_struct *mm,
 				       unsigned long addr, pmd_t *pmdp);
 extern int hash__has_transparent_hugepage(void);
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index ece6912fae8e..557915792214 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1122,15 +1122,6 @@ static inline pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm,
 extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			     pmd_t *pmdp);
 
-#define __HAVE_ARCH_PMDP_HUGE_SPLIT_PREPARE
-static inline void pmdp_huge_split_prepare(struct vm_area_struct *vma,
-					   unsigned long address, pmd_t *pmdp)
-{
-	if (radix_enabled())
-		return radix__pmdp_huge_split_prepare(vma, address, pmdp);
-	return hash__pmdp_huge_split_prepare(vma, address, pmdp);
-}
-
 #define pmd_move_must_withdraw pmd_move_must_withdraw
 struct spinlock;
 static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index 558fea3b2d22..a779a43b643b 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -270,12 +270,6 @@ static inline pmd_t radix__pmd_mkhuge(pmd_t pmd)
 		return __pmd(pmd_val(pmd) | _PAGE_PTE | R_PAGE_LARGE);
 	return __pmd(pmd_val(pmd) | _PAGE_PTE);
 }
-static inline void radix__pmdp_huge_split_prepare(struct vm_area_struct *vma,
-					    unsigned long address, pmd_t *pmdp)
-{
-	/* Nothing to do for radix. */
-	return;
-}
 
 extern unsigned long radix__pmd_hugepage_update(struct mm_struct *mm, unsigned long addr,
 					  pmd_t *pmdp, unsigned long clr,
diff --git a/arch/powerpc/mm/pgtable-hash64.c b/arch/powerpc/mm/pgtable-hash64.c
index c0a7372bdaa6..00aee1485714 100644
--- a/arch/powerpc/mm/pgtable-hash64.c
+++ b/arch/powerpc/mm/pgtable-hash64.c
@@ -296,28 +296,6 @@ pgtable_t hash__pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 	return pgtable;
 }
 
-void hash__pmdp_huge_split_prepare(struct vm_area_struct *vma,
-			       unsigned long address, pmd_t *pmdp)
-{
-	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
-	VM_BUG_ON(REGION_ID(address) != USER_REGION_ID);
-	VM_BUG_ON(pmd_devmap(*pmdp));
-
-	/*
-	 * We can't mark the pmd none here, because that will cause a race
-	 * against exit_mmap. We need to continue mark pmd TRANS HUGE, while
-	 * we spilt, but at the same time we wan't rest of the ppc64 code
-	 * not to insert hash pte on this, because we will be modifying
-	 * the deposited pgtable in the caller of this function. Hence
-	 * clear the _PAGE_USER so that we move the fault handling to
-	 * higher level function and that will serialize against ptl.
-	 * We need to flush existing hash pte entries here even though,
-	 * the translation is still valid, because we will withdraw
-	 * pgtable_t after this.
-	 */
-	pmd_hugepage_update(vma->vm_mm, address, pmdp, 0, _PAGE_PRIVILEGED);
-}
-
 /*
  * A linux hugepage PMD was changed and the corresponding hash table entries
  * neesd to be flushed.
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index ece5e399567a..b934e41277ac 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -313,14 +313,6 @@ extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp);
 #endif
 
-#ifndef __HAVE_ARCH_PMDP_HUGE_SPLIT_PREPARE
-static inline void pmdp_huge_split_prepare(struct vm_area_struct *vma,
-					   unsigned long address, pmd_t *pmdp)
-{
-
-}
-#endif
-
 #ifndef __HAVE_ARCH_PTE_SAME
 static inline int pte_same(pte_t pte_a, pte_t pte_b)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e654592f04e7..0bd9850b1d81 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1923,8 +1923,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	struct mm_struct *mm = vma->vm_mm;
 	struct page *page;
 	pgtable_t pgtable;
-	pmd_t old, _pmd;
-	bool young, write, soft_dirty;
+	pmd_t old_pmd, _pmd;
+	bool young, write, dirty, soft_dirty;
 	unsigned long addr;
 	int i;
 
@@ -1956,14 +1956,39 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		return __split_huge_zero_page_pmd(vma, haddr, pmd);
 	}
 
-	page = pmd_page(*pmd);
+	/*
+	 * Up to this point the pmd is present and huge and userland has the
+	 * whole access to the hugepage during the split (which happens in
+	 * place). If we overwrite the pmd with the not-huge version pointing
+	 * to the pte here (which of course we could if all CPUs were bug
+	 * free), userland could trigger a small page size TLB miss on the
+	 * small sized TLB while the hugepage TLB entry is still established in
+	 * the huge TLB. Some CPU doesn't like that.
+	 * See http://support.amd.com/us/Processor_TechDocs/41322.pdf, Erratum
+	 * 383 on page 93. Intel should be safe but is also warns that it's
+	 * only safe if the permission and cache attributes of the two entries
+	 * loaded in the two TLB is identical (which should be the case here).
+	 * But it is generally safer to never allow small and huge TLB entries
+	 * for the same virtual address to be loaded simultaneously. So instead
+	 * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
+	 * current pmd notpresent (atomically because here the pmd_trans_huge
+	 * and pmd_trans_splitting must remain set at all times on the pmd
+	 * until the split is complete for this pmd), then we flush the SMP TLB
+	 * and finally we write the non-huge version of the pmd entry with
+	 * pmd_populate.
+	 */
+	old_pmd = pmdp_invalidate(vma, haddr, pmd);
+
+	page = pmd_page(old_pmd);
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	page_ref_add(page, HPAGE_PMD_NR - 1);
-	write = pmd_write(*pmd);
-	young = pmd_young(*pmd);
-	soft_dirty = pmd_soft_dirty(*pmd);
-
-	pmdp_huge_split_prepare(vma, haddr, pmd);
+	write = pmd_write(old_pmd);
+	young = pmd_young(old_pmd);
+	dirty = pmd_dirty(*pmd);
+	soft_dirty = pmd_soft_dirty(old_pmd);
+	/*
+	 * withdraw the table only after we mark the pmd entry invalid
+	 */
 	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
 	pmd_populate(mm, &_pmd, pgtable);
 
@@ -1990,6 +2015,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			if (soft_dirty)
 				entry = pte_mksoft_dirty(entry);
 		}
+		if (dirty)
+			SetPageDirty(page + i);
 		pte = pte_offset_map(&_pmd, addr);
 		BUG_ON(!pte_none(*pte));
 		set_pte_at(mm, addr, pte, entry);
@@ -2017,36 +2044,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	}
 
 	smp_wmb(); /* make pte visible before pmd */
-	/*
-	 * Up to this point the pmd is present and huge and userland has the
-	 * whole access to the hugepage during the split (which happens in
-	 * place). If we overwrite the pmd with the not-huge version pointing
-	 * to the pte here (which of course we could if all CPUs were bug
-	 * free), userland could trigger a small page size TLB miss on the
-	 * small sized TLB while the hugepage TLB entry is still established in
-	 * the huge TLB. Some CPU doesn't like that.
-	 * See http://support.amd.com/us/Processor_TechDocs/41322.pdf, Erratum
-	 * 383 on page 93. Intel should be safe but is also warns that it's
-	 * only safe if the permission and cache attributes of the two entries
-	 * loaded in the two TLB is identical (which should be the case here).
-	 * But it is generally safer to never allow small and huge TLB entries
-	 * for the same virtual address to be loaded simultaneously. So instead
-	 * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
-	 * current pmd notpresent (atomically because here the pmd_trans_huge
-	 * and pmd_trans_splitting must remain set at all times on the pmd
-	 * until the split is complete for this pmd), then we flush the SMP TLB
-	 * and finally we write the non-huge version of the pmd entry with
-	 * pmd_populate.
-	 */
-	old = pmdp_invalidate(vma, haddr, pmd);
-
-	/*
-	 * Transfer dirty bit using value returned by pmd_invalidate() to be
-	 * sure we don't race with CPU that can set the bit under us.
-	 */
-	if (pmd_dirty(old))
-		SetPageDirty(page);
-
 	pmd_populate(mm, pmd, pgtable);
 
 	if (freeze) {
-- 
2.13.3

* Re: [RFC PATCH 3/3] mm/hugetlb: Remove pmd_huge_split_prepare
  2017-07-27  8:37   ` Aneesh Kumar K.V
@ 2017-07-27 10:50     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 26+ messages in thread
From: Aneesh Kumar K.V @ 2017-07-27 10:50 UTC (permalink / raw)
  To: benh, paulus, mpe, Kirill A. Shutemov; +Cc: linuxppc-dev, linux-mm



On 07/27/2017 02:07 PM, Aneesh Kumar K.V wrote:

> diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> index 8c8fb6fbdabe..b856e130c678 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> @@ -164,8 +164,6 @@ extern pmd_t hash__pmdp_collapse_flush(struct vm_area_struct *vma,
>   extern void hash__pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
>   					 pgtable_t pgtable);
>   extern pgtable_t hash__pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
> -extern void hash__pmdp_huge_split_prepare(struct vm_area_struct *vma,
> 
> @@ -1956,14 +1956,39 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>   		return __split_huge_zero_page_pmd(vma, haddr, pmd);
>   	}
> 


....

> +	 */
> +	old_pmd = pmdp_invalidate(vma, haddr, pmd);
> +
> +	page = pmd_page(old_pmd);
>   	VM_BUG_ON_PAGE(!page_count(page), page);
>   	page_ref_add(page, HPAGE_PMD_NR - 1);
> -	write = pmd_write(*pmd);
> -	young = pmd_young(*pmd);
> -	soft_dirty = pmd_soft_dirty(*pmd);
> -
> -	pmdp_huge_split_prepare(vma, haddr, pmd);
> +	write = pmd_write(old_pmd);
> +	young = pmd_young(old_pmd);
> +	dirty = pmd_dirty(*pmd);

This should be

	dirty = pmd_dirty(old_pmd);

so that the dirty bit comes from the value atomically returned by
pmdp_invalidate() and we don't race with a CPU that can set the bit
under us.


> +	soft_dirty = pmd_soft_dirty(old_pmd);
> +	/*
> +	 * withdraw the table only after we mark the pmd entry invalid
> +	 */
>   	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
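
With that fix applied, the block would read all four bits from the captured
value, along the lines of:

	write = pmd_write(old_pmd);
	young = pmd_young(old_pmd);
	dirty = pmd_dirty(old_pmd);	/* was pmd_dirty(*pmd), which races */
	soft_dirty = pmd_soft_dirty(old_pmd);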

-aneesh

* Re: [RFC PATCH 1/3] powerpc/mm: update pmdp_invalidate to return old pmd value
  2017-07-27  8:37 ` Aneesh Kumar K.V
@ 2017-07-27 12:54   ` Michal Hocko
  -1 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2017-07-27 12:54 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: benh, paulus, mpe, Kirill A. Shutemov, linuxppc-dev, linux-mm

EMISSING_CHANGELOG

Besides that, no user actually uses the return value. Please fold this
into the patch which uses the new functionality.

On Thu 27-07-17 14:07:54, Aneesh Kumar K.V wrote:
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 4 ++--
>  arch/powerpc/mm/pgtable-book3s64.c           | 9 ++++++---
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 41d484ac0822..ece6912fae8e 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -1119,8 +1119,8 @@ static inline pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm,
>  }
>  
>  #define __HAVE_ARCH_PMDP_INVALIDATE
> -extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
> -			    pmd_t *pmdp);
> +extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
> +			     pmd_t *pmdp);
>  
>  #define __HAVE_ARCH_PMDP_HUGE_SPLIT_PREPARE
>  static inline void pmdp_huge_split_prepare(struct vm_area_struct *vma,
> diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
> index 3b65917785a5..0bb7f824ecdd 100644
> --- a/arch/powerpc/mm/pgtable-book3s64.c
> +++ b/arch/powerpc/mm/pgtable-book3s64.c
> @@ -90,16 +90,19 @@ void serialize_against_pte_lookup(struct mm_struct *mm)
>   * We use this to invalidate a pmdp entry before switching from a
>   * hugepte to regular pmd entry.
>   */
> -void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
> -		     pmd_t *pmdp)
> +pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
> +		      pmd_t *pmdp)
>  {
> -	pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, 0);
> +	unsigned long old_pmd;
> +
> +	old_pmd = pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, 0);
>  	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
>  	/*
>  	 * This ensures that generic code that rely on IRQ disabling
>  	 * to prevent a parallel THP split work as expected.
>  	 */
>  	serialize_against_pte_lookup(vma->vm_mm);
> +	return __pmd(old_pmd);
>  }
>  
>  static pmd_t pmd_set_protbits(pmd_t pmd, pgprot_t pgprot)
> -- 
> 2.13.3
> 

-- 
Michal Hocko
SUSE Labs

* Re: [RFC PATCH 2/3] powerpc/mm: Implement pmdp_establish for ppc64
  2017-07-27  8:37   ` Aneesh Kumar K.V
@ 2017-07-27 12:56     ` Michal Hocko
  -1 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2017-07-27 12:56 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: benh, paulus, mpe, Kirill A. Shutemov, linuxppc-dev, linux-mm

On Thu 27-07-17 14:07:55, Aneesh Kumar K.V wrote:
> We can now use this to set pmd page table entries to absolute values. THP
> needs to ensure that we always update pmd PTE entries such that we never
> mark the pmd none. pmdp_establish helps in implementing that.
> 
> This doesn't flush the TLB. Based on the old_pmd value returned, the caller
> can decide whether to call flush_pmd_tlb_range().

_Why_ do we need this? It doesn't really help that the newly added
function has no user yet, so we cannot check that...

> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/radix.h |  9 ++++++---
>  arch/powerpc/mm/pgtable-book3s64.c         | 10 ++++++++++
>  2 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index cd481ab601b6..558fea3b2d22 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -131,7 +131,8 @@ static inline unsigned long __radix_pte_update(pte_t *ptep, unsigned long clr,
>  	do {
>  		pte = READ_ONCE(*ptep);
>  		old_pte = pte_val(pte);
> -		new_pte = (old_pte | set) & ~clr;
> +		new_pte = old_pte & ~clr;
> +		new_pte |= set;
>  
>  	} while (!pte_xchg(ptep, __pte(old_pte), __pte(new_pte)));
>  
> @@ -153,9 +154,11 @@ static inline unsigned long radix__pte_update(struct mm_struct *mm,
>  
>  		old_pte = __radix_pte_update(ptep, ~0ul, 0);
>  		/*
> -		 * new value of pte
> +		 * new value of pte. We clear all the bits in clr mask
> +		 * first and set the bits in set mask.
>  		 */
> -		new_pte = (old_pte | set) & ~clr;
> +		new_pte = old_pte & ~clr;
> +		new_pte |= set;
>  		radix__flush_tlb_pte_p9_dd1(old_pte, mm, addr);
>  		if (new_pte)
>  			__radix_pte_update(ptep, 0, new_pte);
> diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
> index 0bb7f824ecdd..7100b0150a2a 100644
> --- a/arch/powerpc/mm/pgtable-book3s64.c
> +++ b/arch/powerpc/mm/pgtable-book3s64.c
> @@ -45,6 +45,16 @@ int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
>  	return changed;
>  }
>  
> +pmd_t pmdp_establish(struct vm_area_struct *vma, unsigned long addr,
> +		     pmd_t *pmdp, pmd_t entry)
> +{
> +	long pmdval;
> +
> +	pmdval = pmd_hugepage_update(vma->vm_mm, addr, pmdp, ~0UL, pmd_val(entry));
> +	return __pmd(pmdval);
> +}
> +
> +
>  int pmdp_test_and_clear_young(struct vm_area_struct *vma,
>  			      unsigned long address, pmd_t *pmdp)
>  {
> -- 
> 2.13.3
> 

-- 
Michal Hocko
SUSE Labs

* Re: [RFC PATCH 3/3] mm/hugetlb: Remove pmd_huge_split_prepare
  2017-07-27  8:37   ` Aneesh Kumar K.V
@ 2017-07-27 12:57     ` Michal Hocko
  -1 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2017-07-27 12:57 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: benh, paulus, mpe, Kirill A. Shutemov, linuxppc-dev, linux-mm

On Thu 27-07-17 14:07:56, Aneesh Kumar K.V wrote:
> Instead of marking the pmd ready for split, invalidate the pmd. This should
> take care of the powerpc requirement.

which is?

> The only side effect is that we mark the pmd invalid early, which can
> result in us blocking access to the page a bit longer if we race against
> a THP split.

Again, this doesn't tell me what the problem is or why we care.

> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/hash-4k.h  |  2 -
>  arch/powerpc/include/asm/book3s/64/hash-64k.h |  2 -
>  arch/powerpc/include/asm/book3s/64/pgtable.h  |  9 ----
>  arch/powerpc/include/asm/book3s/64/radix.h    |  6 ---
>  arch/powerpc/mm/pgtable-hash64.c              | 22 --------
>  include/asm-generic/pgtable.h                 |  8 ---
>  mm/huge_memory.c                              | 73 +++++++++++++--------------
>  7 files changed, 35 insertions(+), 87 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> index 0c4e470571ca..7d914f4fc534 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> @@ -100,8 +100,6 @@ extern pmd_t hash__pmdp_collapse_flush(struct vm_area_struct *vma,
>  extern void hash__pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
>  					 pgtable_t pgtable);
>  extern pgtable_t hash__pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
> -extern void hash__pmdp_huge_split_prepare(struct vm_area_struct *vma,
> -				      unsigned long address, pmd_t *pmdp);
>  extern pmd_t hash__pmdp_huge_get_and_clear(struct mm_struct *mm,
>  				       unsigned long addr, pmd_t *pmdp);
>  extern int hash__has_transparent_hugepage(void);
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> index 8c8fb6fbdabe..b856e130c678 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> @@ -164,8 +164,6 @@ extern pmd_t hash__pmdp_collapse_flush(struct vm_area_struct *vma,
>  extern void hash__pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
>  					 pgtable_t pgtable);
>  extern pgtable_t hash__pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
> -extern void hash__pmdp_huge_split_prepare(struct vm_area_struct *vma,
> -				      unsigned long address, pmd_t *pmdp);
>  extern pmd_t hash__pmdp_huge_get_and_clear(struct mm_struct *mm,
>  				       unsigned long addr, pmd_t *pmdp);
>  extern int hash__has_transparent_hugepage(void);
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index ece6912fae8e..557915792214 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -1122,15 +1122,6 @@ static inline pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm,
>  extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>  			     pmd_t *pmdp);
>  
> -#define __HAVE_ARCH_PMDP_HUGE_SPLIT_PREPARE
> -static inline void pmdp_huge_split_prepare(struct vm_area_struct *vma,
> -					   unsigned long address, pmd_t *pmdp)
> -{
> -	if (radix_enabled())
> -		return radix__pmdp_huge_split_prepare(vma, address, pmdp);
> -	return hash__pmdp_huge_split_prepare(vma, address, pmdp);
> -}
> -
>  #define pmd_move_must_withdraw pmd_move_must_withdraw
>  struct spinlock;
>  static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index 558fea3b2d22..a779a43b643b 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -270,12 +270,6 @@ static inline pmd_t radix__pmd_mkhuge(pmd_t pmd)
>  		return __pmd(pmd_val(pmd) | _PAGE_PTE | R_PAGE_LARGE);
>  	return __pmd(pmd_val(pmd) | _PAGE_PTE);
>  }
> -static inline void radix__pmdp_huge_split_prepare(struct vm_area_struct *vma,
> -					    unsigned long address, pmd_t *pmdp)
> -{
> -	/* Nothing to do for radix. */
> -	return;
> -}
>  
>  extern unsigned long radix__pmd_hugepage_update(struct mm_struct *mm, unsigned long addr,
>  					  pmd_t *pmdp, unsigned long clr,
> diff --git a/arch/powerpc/mm/pgtable-hash64.c b/arch/powerpc/mm/pgtable-hash64.c
> index c0a7372bdaa6..00aee1485714 100644
> --- a/arch/powerpc/mm/pgtable-hash64.c
> +++ b/arch/powerpc/mm/pgtable-hash64.c
> @@ -296,28 +296,6 @@ pgtable_t hash__pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
>  	return pgtable;
>  }
>  
> -void hash__pmdp_huge_split_prepare(struct vm_area_struct *vma,
> -			       unsigned long address, pmd_t *pmdp)
> -{
> -	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
> -	VM_BUG_ON(REGION_ID(address) != USER_REGION_ID);
> -	VM_BUG_ON(pmd_devmap(*pmdp));
> -
> -	/*
> -	 * We can't mark the pmd none here, because that will cause a race
> -	 * against exit_mmap. We need to continue mark pmd TRANS HUGE, while
> -	 * we spilt, but at the same time we wan't rest of the ppc64 code
> -	 * not to insert hash pte on this, because we will be modifying
> -	 * the deposited pgtable in the caller of this function. Hence
> -	 * clear the _PAGE_USER so that we move the fault handling to
> -	 * higher level function and that will serialize against ptl.
> -	 * We need to flush existing hash pte entries here even though,
> -	 * the translation is still valid, because we will withdraw
> -	 * pgtable_t after this.
> -	 */
> -	pmd_hugepage_update(vma->vm_mm, address, pmdp, 0, _PAGE_PRIVILEGED);
> -}
> -
>  /*
>   * A linux hugepage PMD was changed and the corresponding hash table entries
>   * neesd to be flushed.
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index ece5e399567a..b934e41277ac 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -313,14 +313,6 @@ extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>  			    pmd_t *pmdp);
>  #endif
>  
> -#ifndef __HAVE_ARCH_PMDP_HUGE_SPLIT_PREPARE
> -static inline void pmdp_huge_split_prepare(struct vm_area_struct *vma,
> -					   unsigned long address, pmd_t *pmdp)
> -{
> -
> -}
> -#endif
> -
>  #ifndef __HAVE_ARCH_PTE_SAME
>  static inline int pte_same(pte_t pte_a, pte_t pte_b)
>  {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index e654592f04e7..0bd9850b1d81 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1923,8 +1923,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  	struct mm_struct *mm = vma->vm_mm;
>  	struct page *page;
>  	pgtable_t pgtable;
> -	pmd_t old, _pmd;
> -	bool young, write, soft_dirty;
> +	pmd_t old_pmd, _pmd;
> +	bool young, write, dirty, soft_dirty;
>  	unsigned long addr;
>  	int i;
>  
> @@ -1956,14 +1956,39 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  		return __split_huge_zero_page_pmd(vma, haddr, pmd);
>  	}
>  
> -	page = pmd_page(*pmd);
> +	/*
> +	 * Up to this point the pmd is present and huge and userland has the
> +	 * whole access to the hugepage during the split (which happens in
> +	 * place). If we overwrite the pmd with the not-huge version pointing
> +	 * to the pte here (which of course we could if all CPUs were bug
> +	 * free), userland could trigger a small page size TLB miss on the
> +	 * small sized TLB while the hugepage TLB entry is still established in
> +	 * the huge TLB. Some CPUs don't like that.
> +	 * See http://support.amd.com/us/Processor_TechDocs/41322.pdf, Erratum
> +	 * 383 on page 93. Intel should be safe, but it also warns that it's
> +	 * only safe if the permission and cache attributes of the two entries
> +	 * loaded in the two TLBs are identical (which should be the case here).
> +	 * But it is generally safer to never allow small and huge TLB entries
> +	 * for the same virtual address to be loaded simultaneously. So instead
> +	 * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
> +	 * current pmd notpresent (atomically because here the pmd_trans_huge
> +	 * and pmd_trans_splitting must remain set at all times on the pmd
> +	 * until the split is complete for this pmd), then we flush the SMP TLB
> +	 * and finally we write the non-huge version of the pmd entry with
> +	 * pmd_populate.
> +	 */
> +	old_pmd = pmdp_invalidate(vma, haddr, pmd);
> +
> +	page = pmd_page(old_pmd);
>  	VM_BUG_ON_PAGE(!page_count(page), page);
>  	page_ref_add(page, HPAGE_PMD_NR - 1);
> -	write = pmd_write(*pmd);
> -	young = pmd_young(*pmd);
> -	soft_dirty = pmd_soft_dirty(*pmd);
> -
> -	pmdp_huge_split_prepare(vma, haddr, pmd);
> +	write = pmd_write(old_pmd);
> +	young = pmd_young(old_pmd);
> +	dirty = pmd_dirty(old_pmd);
> +	soft_dirty = pmd_soft_dirty(old_pmd);
> +	/*
> +	 * withdraw the table only after we mark the pmd entry invalid
> +	 */
>  	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
>  	pmd_populate(mm, &_pmd, pgtable);
>  
> @@ -1990,6 +2015,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  			if (soft_dirty)
>  				entry = pte_mksoft_dirty(entry);
>  		}
> +		if (dirty)
> +			SetPageDirty(page + i);
>  		pte = pte_offset_map(&_pmd, addr);
>  		BUG_ON(!pte_none(*pte));
>  		set_pte_at(mm, addr, pte, entry);
> @@ -2017,36 +2044,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  	}
>  
>  	smp_wmb(); /* make pte visible before pmd */
> -	/*
> -	 * Up to this point the pmd is present and huge and userland has the
> -	 * whole access to the hugepage during the split (which happens in
> -	 * place). If we overwrite the pmd with the not-huge version pointing
> -	 * to the pte here (which of course we could if all CPUs were bug
> -	 * free), userland could trigger a small page size TLB miss on the
> -	 * small sized TLB while the hugepage TLB entry is still established in
> -	 * the huge TLB. Some CPU doesn't like that.
> -	 * See http://support.amd.com/us/Processor_TechDocs/41322.pdf, Erratum
> -	 * 383 on page 93. Intel should be safe but is also warns that it's
> -	 * only safe if the permission and cache attributes of the two entries
> -	 * loaded in the two TLB is identical (which should be the case here).
> -	 * But it is generally safer to never allow small and huge TLB entries
> -	 * for the same virtual address to be loaded simultaneously. So instead
> -	 * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
> -	 * current pmd notpresent (atomically because here the pmd_trans_huge
> -	 * and pmd_trans_splitting must remain set at all times on the pmd
> -	 * until the split is complete for this pmd), then we flush the SMP TLB
> -	 * and finally we write the non-huge version of the pmd entry with
> -	 * pmd_populate.
> -	 */
> -	old = pmdp_invalidate(vma, haddr, pmd);
> -
> -	/*
> -	 * Transfer dirty bit using value returned by pmd_invalidate() to be
> -	 * sure we don't race with CPU that can set the bit under us.
> -	 */
> -	if (pmd_dirty(old))
> -		SetPageDirty(page);
> -
>  	pmd_populate(mm, pmd, pgtable);
>  
>  	if (freeze) {
> -- 
> 2.13.3
> 

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 1/3] powerpc/mm: update pmdp_invalidate to return old pmd value
  2017-07-27 12:54   ` Michal Hocko
@ 2017-07-27 12:58     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 26+ messages in thread
From: Kirill A. Shutemov @ 2017-07-27 12:58 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Aneesh Kumar K.V, benh, paulus, mpe, linuxppc-dev, linux-mm

On Thu, Jul 27, 2017 at 02:54:49PM +0200, Michal Hocko wrote:
> EMISSING_CHANGELOG
> 
> besides that no user actually uses the return value. Please fold this
> into the patch which uses the new functionality.

That's for the patchset I'm working on [1].

[1] http://lkml.kernel.org/r/20170615145224.66200-1-kirill.shutemov@linux.intel.com

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 1/3] powerpc/mm: update pmdp_invalidate to return old pmd value
  2017-07-27 12:54   ` Michal Hocko
@ 2017-07-27 15:48     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 26+ messages in thread
From: Aneesh Kumar K.V @ 2017-07-27 15:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: benh, paulus, mpe, Kirill A . Shutemov, linuxppc-dev, linux-mm



On 07/27/2017 06:24 PM, Michal Hocko wrote:
> EMISSING_CHANGELOG
> 
> besides that no user actually uses the return value. Please fold this
> into the patch which uses the new functionality.


The patch series was supposed to help Kirill make progress with his
series at


https://lkml.kernel.org/r/20170615145224.66200-1-kirill.shutemov@linux.intel.com

It is essentially implementing the pmdp_invalidate update for ppc64; his
series does the same for x86-64.
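
The reason pmdp_invalidate() now returns the old pmd is so that callers
can transfer bits (dirty in particular) without racing against a CPU
that may set them under us. A minimal sketch of the intended usage,
following the mm/huge_memory.c hunk of patch 3/3 (variable names are
illustrative only):

	pmd_t old_pmd;

	old_pmd = pmdp_invalidate(vma, haddr, pmd);
	/*
	 * Read the dirty bit from the value returned by the atomic
	 * invalidate, so a hardware update cannot be lost in between.
	 */
	if (pmd_dirty(old_pmd))
		SetPageDirty(pmd_page(old_pmd));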

-aneesh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/3] powerpc/mm: Implement pmdp_establish for ppc64
  2017-07-27 12:56     ` Michal Hocko
@ 2017-07-27 15:52       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 26+ messages in thread
From: Aneesh Kumar K.V @ 2017-07-27 15:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: benh, paulus, mpe, Kirill A . Shutemov, linuxppc-dev, linux-mm



On 07/27/2017 06:26 PM, Michal Hocko wrote:
> On Thu 27-07-17 14:07:55, Aneesh Kumar K.V wrote:
>> We can now use this to set pmd page table entries to absolute values. THP
>> need to ensure that we always update pmd PTE entries such that we never mark
>> the pmd none. pmdp_establish helps in implementing that.
>>
>> This doesn't flush the tlb. Based on the old_pmd value returned caller can
>> decide to call flush_pmd_tlb_range()
> 
> _Why_ do we need this. It doesn't really help that the newly added
> function is not used so we could check that...


We were looking at having pmdp_establish used by the core code, but I
guess Kirill ended up using pmdp_invalidate. If we don't have any
pmdp_establish usage in core code, we can drop this. This is to help
Kirill make progress with his series at


https://lkml.kernel.org/r/20170615145224.66200-1-kirill.shutemov@linux.intel.com


Also, thinking about the interface further, I guess the pmdp_establish
interface is somewhat confusing, so we may want to rethink it. I know
that I asked for pmdp_establish in an earlier review of Kirill's
patchset, but now looking back I am not sure we can cleanly explain the
exact semantic requirements of pmdp_establish. One thing we may want to
clarify is whether we should retain the reference and change bits from
the old entry when we are doing a pmdp_establish.
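
For discussion, a minimal sketch of the semantics I have in mind
(generic code, not the ppc64 implementation; whether the old
reference/change bits survive is exactly the open question):

	/*
	 * Sketch only: install a new pmd value and return the old one.
	 * The read and the write are not one atomic operation, so a
	 * hardware update of the R/C bits can be lost in between; this
	 * is why the exact semantics need to be pinned down.
	 */
	static inline pmd_t generic_pmdp_establish(struct vm_area_struct *vma,
			unsigned long address, pmd_t *pmdp, pmd_t pmd)
	{
		pmd_t old_pmd = *pmdp;

		set_pmd_at(vma->vm_mm, address, pmdp, pmd);
		return old_pmd;
	}

An architecture where hardware can update the reference/change bits
would want an xchg-style primitive here instead.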

Kirill,

Considering core code is still only using pmdp_invalidate(), we may want 
to drop this interface completely ?

-aneesh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/3] mm/hugetlb: Remove pmd_huge_split_prepare
  2017-07-27 12:57     ` Michal Hocko
@ 2017-07-27 15:57       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 26+ messages in thread
From: Aneesh Kumar K.V @ 2017-07-27 15:57 UTC (permalink / raw)
  To: Michal Hocko
  Cc: benh, paulus, mpe, Kirill A . Shutemov, linuxppc-dev, linux-mm



On 07/27/2017 06:27 PM, Michal Hocko wrote:
> On Thu 27-07-17 14:07:56, Aneesh Kumar K.V wrote:
>> Instead of marking the pmd ready for split, invalidate the pmd. This should
>> take care of powerpc requirement.
> 
> which is?

I can reference the commit which explains the details here, or add more
of the details from that older commit to this changelog:

c777e2a8b65420b31dac28a453e35be984f5808b

powerpc/mm: Fix Multi hit ERAT cause by recent THP update

In short: while we split, no new (hash) translation may be inserted for
the range, because we will be modifying the deposited page table, and we
must never allow a huge and a small translation to be valid for the same
address at the same time. Invalidating the pmd gives us both guarantees.


> 
>> Only side effect is that we mark the pmd
>> invalid early. This can result in us blocking access to the page a bit longer
>> if we race against a thp split.
> 
> Again, this doesn't tell me what is the problem and why do we care.

The primary motivation is code reduction:

   7 files changed, 35 insertions(+), 87 deletions(-)
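
To make the resulting flow concrete, here is a condensed sketch of what
__split_huge_pmd_locked() does after this patch (names as in the hunk
above, details elided):

	old_pmd = pmdp_invalidate(vma, haddr, pmd);	/* invalidate + TLB flush first */

	page  = pmd_page(old_pmd);		/* read state from the old value */
	write = pmd_write(old_pmd);
	young = pmd_young(old_pmd);
	dirty = pmd_dirty(old_pmd);

	/* Withdraw the deposited table only after the pmd is invalid. */
	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
	pmd_populate(mm, &_pmd, pgtable);	/* build the ptes on _pmd */
	...
	smp_wmb();				/* make ptes visible before pmd */
	pmd_populate(mm, pmd, pgtable);		/* publish the pte table */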


-aneesh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 1/3] powerpc/mm: update pmdp_invalidate to return old pmd value
  2017-07-27 15:48     ` Aneesh Kumar K.V
@ 2017-07-28  7:01       ` Michal Hocko
  -1 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2017-07-28  7:01 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: benh, paulus, mpe, Kirill A . Shutemov, linuxppc-dev, linux-mm

On Thu 27-07-17 21:18:35, Aneesh Kumar K.V wrote:
> 
> 
> On 07/27/2017 06:24 PM, Michal Hocko wrote:
> >EMISSING_CHANGELOG
> >
> >besides that no user actually uses the return value. Please fold this
> >into the patch which uses the new functionality.
> 
> 
> The patch series was supposed to help Kirill make progress with his
> series at
> 
> 
> https://lkml.kernel.org/r/20170615145224.66200-1-kirill.shutemov@linux.intel.com
> 
> It is essentially implementing the pmdp_invalidate update for ppc64; his
> series does the same for x86-64.

OK, but that was not clear from the patch. You could either reply to the
original thread or make it explicitly clear in the cover letter.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/3] mm/hugetlb: Remove pmd_huge_split_prepare
  2017-07-27 15:57       ` Aneesh Kumar K.V
@ 2017-07-28  7:04         ` Michal Hocko
  -1 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2017-07-28  7:04 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: benh, paulus, mpe, Kirill A . Shutemov, linuxppc-dev, linux-mm

On Thu 27-07-17 21:27:37, Aneesh Kumar K.V wrote:
> 
> 
> On 07/27/2017 06:27 PM, Michal Hocko wrote:
> >On Thu 27-07-17 14:07:56, Aneesh Kumar K.V wrote:
> >>Instead of marking the pmd ready for split, invalidate the pmd. This should
> >>take care of powerpc requirement.
> >
> >which is?
> 
> I can reference the commit which explains the details here, or add more
> of the details from that older commit to this changelog:
> 
> c777e2a8b65420b31dac28a453e35be984f5808b
> 
> powerpc/mm: Fix Multi hit ERAT cause by recent THP update

Each patch should be self-descriptive. You can reference older commits,
but always make sure that the full context is clear. This will make life
easier for whoever is going to look at it later.
 
> >>Only side effect is that we mark the pmd
> >>invalid early. This can result in us blocking access to the page a bit longer
> >>if we race against a thp split.
> >
> >Again, this doesn't tell me what is the problem and why do we care.
> 
> The primary motivation is code reduction:

Then be explicit about it. This wasn't clear from the above description,
at least not to me.
 
>   7 files changed, 35 insertions(+), 87 deletions(-)
> 
> 
> -aneesh

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2017-07-28  7:04 UTC | newest]

Thread overview: 13 messages -- links below jump to the message on this page --
2017-07-27  8:37 [RFC PATCH 1/3] powerpc/mm: update pmdp_invalidate to return old pmd value Aneesh Kumar K.V
2017-07-27  8:37 ` [RFC PATCH 2/3] powerpc/mm: Implement pmdp_establish for ppc64 Aneesh Kumar K.V
2017-07-27 12:56   ` Michal Hocko
2017-07-27 15:52     ` Aneesh Kumar K.V
2017-07-27  8:37 ` [RFC PATCH 3/3] mm/hugetlb: Remove pmd_huge_split_prepare Aneesh Kumar K.V
2017-07-27 10:50   ` Aneesh Kumar K.V
2017-07-27 12:57   ` Michal Hocko
2017-07-27 15:57     ` Aneesh Kumar K.V
2017-07-28  7:04       ` Michal Hocko
2017-07-27 12:54 ` [RFC PATCH 1/3] powerpc/mm: update pmdp_invalidate to return old pmd value Michal Hocko
2017-07-27 12:58   ` Kirill A. Shutemov
2017-07-27 15:48   ` Aneesh Kumar K.V
2017-07-28  7:01     ` Michal Hocko
