* [PATCHv3 00/11] Do not lose dirty bit on THP pages
@ 2017-09-12 15:39 ` Kirill A. Shutemov
  0 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2017-09-12 15:39 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Vineet Gupta, Russell King,
	Will Deacon, Catalin Marinas, Ralf Baechle, David S. Miller,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
	Andrea Arcangeli
  Cc: linux-arch, linux-mm, linux-kernel, Kirill A. Shutemov

Vlastimil noted that pmdp_invalidate() is not atomic and we can lose
dirty and accessed bits if the CPU sets them after the pmdp dereference,
but before set_pmd_at().
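
To make the window concrete, here is a sketch of the interleaving
against the current generic implementation (a hypothetical timeline for
illustration, not code quoted from any patch):

	/* CPU A, inside the old generic pmdp_invalidate(): */
	pmd_t entry = *pmdp;
	/* ... CPU B's page-table walker sets the hardware dirty bit
	 * in *pmdp right here ... */
	set_pmd_at(vma->vm_mm, address, pmdp, pmd_mknotpresent(entry));
	/* CPU A writes back the stale value: the dirty bit set by
	 * CPU B is lost. */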

The bug can lead to data loss, but the race window is tiny and I haven't
seen any reports suggesting that it happens in practice. So I don't think
it is worth sending to stable.

Unfortunately, there's no way to address the issue in a generic way. We
need to fix all architectures that support THP one by one.

All architectures that support THP have to provide an atomic
pmdp_invalidate() that returns the previous value of the entry.

If the generic implementation of pmdp_invalidate() is used, the
architecture needs to provide an atomic pmdp_establish().

pmdp_establish() is not used outside the generic implementation of
pmdp_invalidate() so far, but this may change in the future.
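
For reference, the calling pattern the series converges on looks roughly
like this (the helper names are the ones used in the series; the snippet
itself is an illustrative sketch, not a quote from any patch):

	pmd_t old, entry = *pmdp;

	/* Atomically make the entry non-present and fetch the old value. */
	old = pmdp_invalidate(vma, addr, pmdp);

	/* Dirty/accessed bits the CPU managed to set before the
	 * invalidation are preserved in 'old' and can be transferred
	 * to the new entry. */
	if (pmd_dirty(old))
		entry = pmd_mkdirty(entry);
	if (pmd_young(old))
		entry = pmd_mkyoung(entry);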

Aneesh Kumar K.V (2):
  powerpc/mm: update pmdp_invalidate to return old pmd value
  sparc64: update pmdp_invalidate to return old pmd value

Catalin Marinas (1):
  arm64: Provide pmdp_establish() helper

Kirill A. Shutemov (7):
  asm-generic: Provide generic_pmdp_establish()
  arc: Use generic_pmdp_establish as pmdp_establish
  arm/mm: Provide pmdp_establish() helper
  mips: Use generic_pmdp_establish as pmdp_establish
  x86/mm: Provide pmdp_establish() helper
  mm: Do not lose dirty and access bits in pmdp_invalidate()
  mm: Use updated pmdp_invalidate() interface to track dirty/accessed
    bits

Martin Schwidefsky (1):
  s390/mm: Modify pmdp_invalidate to return old value.

 arch/arc/include/asm/hugepage.h              |  3 +++
 arch/arm/include/asm/pgtable-3level.h        |  3 +++
 arch/arm64/include/asm/pgtable.h             |  7 ++++++
 arch/mips/include/asm/pgtable.h              |  3 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h |  4 +--
 arch/powerpc/mm/pgtable-book3s64.c           |  7 ++++--
 arch/s390/include/asm/pgtable.h              |  5 ++--
 arch/sparc/include/asm/pgtable_64.h          |  2 +-
 arch/sparc/mm/tlb.c                          | 23 +++++++++++++----
 arch/x86/include/asm/pgtable-3level.h        | 37 +++++++++++++++++++++++++++-
 arch/x86/include/asm/pgtable.h               | 15 +++++++++++
 fs/proc/task_mmu.c                           |  8 +++---
 include/asm-generic/pgtable.h                | 17 ++++++++++++-
 mm/huge_memory.c                             | 29 +++++++++-------------
 mm/pgtable-generic.c                         |  6 ++---
 15 files changed, 131 insertions(+), 38 deletions(-)

-- 
2.14.1

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCHv3 01/11] asm-generic: Provide generic_pmdp_establish()
  2017-09-12 15:39 ` Kirill A. Shutemov
@ 2017-09-12 15:39   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2017-09-12 15:39 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Vineet Gupta, Russell King,
	Will Deacon, Catalin Marinas, Ralf Baechle, David S. Miller,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
	Andrea Arcangeli
  Cc: linux-arch, linux-mm, linux-kernel, Kirill A. Shutemov

This is an implementation of pmdp_establish() that is only suitable for
an architecture that doesn't have hardware dirty/accessed bits. In this
case we can't race with a CPU setting these bits, and a non-atomic
approach is fine.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/asm-generic/pgtable.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 8e0243036564..bf0889eb774d 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -308,6 +308,21 @@ extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
 extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
 #endif
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+/*
+ * This is an implementation of pmdp_establish() that is only suitable for an
+ * architecture that doesn't have hardware dirty/accessed bits. In this case we
+ * can't race with a CPU setting these bits, and a non-atomic approach is fine.
+ */
+static inline pmd_t generic_pmdp_establish(struct vm_area_struct *vma,
+		unsigned long address, pmd_t *pmdp, pmd_t pmd)
+{
+	pmd_t old_pmd = *pmdp;
+	set_pmd_at(vma->vm_mm, address, pmdp, pmd);
+	return old_pmd;
+}
+#endif
+
 #ifndef __HAVE_ARCH_PMDP_INVALIDATE
 extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCHv3 02/11] arc: Use generic_pmdp_establish as pmdp_establish
  2017-09-12 15:39 ` Kirill A. Shutemov
@ 2017-09-12 15:39   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2017-09-12 15:39 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Vineet Gupta, Russell King,
	Will Deacon, Catalin Marinas, Ralf Baechle, David S. Miller,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
	Andrea Arcangeli
  Cc: linux-arch, linux-mm, linux-kernel, Kirill A. Shutemov

ARC doesn't support hardware dirty/accessed bits.
generic_pmdp_establish() is suitable in this case.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/hugepage.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arc/include/asm/hugepage.h b/arch/arc/include/asm/hugepage.h
index b18fcb606908..dc8ee011882f 100644
--- a/arch/arc/include/asm/hugepage.h
+++ b/arch/arc/include/asm/hugepage.h
@@ -74,4 +74,7 @@ extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
 extern void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
 				unsigned long end);
 
+/* We don't have hardware dirty/accessed bits, generic_pmdp_establish is fine. */
+#define pmdp_establish generic_pmdp_establish
+
 #endif
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCHv3 03/11] arm/mm: Provide pmdp_establish() helper
  2017-09-12 15:39 ` Kirill A. Shutemov
@ 2017-09-12 15:39   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2017-09-12 15:39 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Vineet Gupta, Russell King,
	Will Deacon, Catalin Marinas, Ralf Baechle, David S. Miller,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
	Andrea Arcangeli
  Cc: linux-arch, linux-mm, linux-kernel, Kirill A. Shutemov

ARM LPAE doesn't have hardware dirty/accessed bits.

generic_pmdp_establish() is the right implementation of pmdp_establish
for this case.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/include/asm/pgtable-3level.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 2a029bceaf2f..57d57cb8cb9a 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -250,6 +250,9 @@ PMD_BIT_FUNC(mkyoung,   |= PMD_SECT_AF);
 #define pfn_pmd(pfn,prot)	(__pmd(((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)))
 #define mk_pmd(page,prot)	pfn_pmd(page_to_pfn(page),prot)
 
+/* No hardware dirty/accessed bits -- generic_pmdp_establish() fits */
+#define pmdp_establish generic_pmdp_establish
+
 /* represent a notpresent pmd by faulting entry, this is used by pmdp_invalidate */
 static inline pmd_t pmd_mknotpresent(pmd_t pmd)
 {
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCHv3 04/11] arm64: Provide pmdp_establish() helper
  2017-09-12 15:39 ` Kirill A. Shutemov
@ 2017-09-12 15:39   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2017-09-12 15:39 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Vineet Gupta, Russell King,
	Will Deacon, Catalin Marinas, Ralf Baechle, David S. Miller,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
	Andrea Arcangeli
  Cc: linux-arch, linux-mm, linux-kernel, Kirill A . Shutemov

From: Catalin Marinas <catalin.marinas@arm.com>

We need an atomic way to set up a pmd page table entry, avoiding races
with the CPU setting dirty/accessed bits. This is required to implement
a pmdp_invalidate() that doesn't lose these bits.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/arm64/include/asm/pgtable.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index bc4e92337d16..09bb86533d32 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -663,6 +663,13 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 {
 	ptep_set_wrprotect(mm, address, (pte_t *)pmdp);
 }
+
+#define pmdp_establish pmdp_establish
+static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
+		unsigned long address, pmd_t *pmdp, pmd_t pmd)
+{
+	return __pmd(xchg_relaxed(&pmd_val(*pmdp), pmd_val(pmd)));
+}
 #endif
 
 extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCHv3 05/11] mips: Use generic_pmdp_establish as pmdp_establish
  2017-09-12 15:39 ` Kirill A. Shutemov
@ 2017-09-12 15:39   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2017-09-12 15:39 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Vineet Gupta, Russell King,
	Will Deacon, Catalin Marinas, Ralf Baechle, David S. Miller,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
	Andrea Arcangeli
  Cc: linux-arch, linux-mm, linux-kernel, Kirill A. Shutemov,
	David Daney, linux-mips

MIPS doesn't support hardware dirty/accessed bits.
generic_pmdp_establish() is suitable in this case.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
---
 arch/mips/include/asm/pgtable.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 9e9e94415d08..7b3a3139e82d 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -534,6 +534,9 @@ static inline int io_remap_pfn_range(struct vm_area_struct *vma,
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 
+/* We don't have hardware dirty/accessed bits, generic_pmdp_establish is fine. */
+#define pmdp_establish generic_pmdp_establish
+
 #define has_transparent_hugepage has_transparent_hugepage
 extern int has_transparent_hugepage(void);
 
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCHv3 06/11] powerpc/mm: update pmdp_invalidate to return old pmd value
  2017-09-12 15:39 ` Kirill A. Shutemov
@ 2017-09-12 15:39   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2017-09-12 15:39 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Vineet Gupta, Russell King,
	Will Deacon, Catalin Marinas, Ralf Baechle, David S. Miller,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
	Andrea Arcangeli
  Cc: linux-arch, linux-mm, linux-kernel, Kirill A . Shutemov

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

It's required to avoid losing dirty and accessed bits.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 4 ++--
 arch/powerpc/mm/pgtable-book3s64.c           | 7 +++++--
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index b9aff515b4de..aca7cfa349eb 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1137,8 +1137,8 @@ static inline pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm,
 }
 
 #define __HAVE_ARCH_PMDP_INVALIDATE
-extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
-			    pmd_t *pmdp);
+extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
+			     pmd_t *pmdp);
 
 #define __HAVE_ARCH_PMDP_HUGE_SPLIT_PREPARE
 static inline void pmdp_huge_split_prepare(struct vm_area_struct *vma,
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 3b65917785a5..422e80253a33 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -90,16 +90,19 @@ void serialize_against_pte_lookup(struct mm_struct *mm)
  * We use this to invalidate a pmdp entry before switching from a
  * hugepte to regular pmd entry.
  */
-void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
+pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 		     pmd_t *pmdp)
 {
-	pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, 0);
+	unsigned long old_pmd;
+
+	old_pmd = pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, 0);
 	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
 	/*
 	 * This ensures that generic code that rely on IRQ disabling
 	 * to prevent a parallel THP split work as expected.
 	 */
 	serialize_against_pte_lookup(vma->vm_mm);
+	return __pmd(old_pmd);
 }
 
 static pmd_t pmd_set_protbits(pmd_t pmd, pgprot_t pgprot)
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCHv3 07/11] s390/mm: Modify pmdp_invalidate to return old value.
  2017-09-12 15:39 ` Kirill A. Shutemov
@ 2017-09-12 15:39   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2017-09-12 15:39 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Vineet Gupta, Russell King,
	Will Deacon, Catalin Marinas, Ralf Baechle, David S. Miller,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
	Andrea Arcangeli
  Cc: linux-arch, linux-mm, linux-kernel, Kirill A . Shutemov

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

It's required to avoid losing dirty and accessed bits.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/s390/include/asm/pgtable.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index dce708e061ea..d3de8ddc55ec 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1504,10 +1504,11 @@ static inline pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma,
 }
 
 #define __HAVE_ARCH_PMDP_INVALIDATE
-static inline void pmdp_invalidate(struct vm_area_struct *vma,
+static inline pmd_t pmdp_invalidate(struct vm_area_struct *vma,
 				   unsigned long addr, pmd_t *pmdp)
 {
-	pmdp_xchg_direct(vma->vm_mm, addr, pmdp, __pmd(_SEGMENT_ENTRY_EMPTY));
+	return pmdp_xchg_direct(vma->vm_mm, addr, pmdp,
+			__pmd(_SEGMENT_ENTRY_EMPTY));
 }
 
 #define __HAVE_ARCH_PMDP_SET_WRPROTECT
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCHv3 08/11] sparc64: update pmdp_invalidate to return old pmd value
  2017-09-12 15:39 ` Kirill A. Shutemov
@ 2017-09-12 15:39   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2017-09-12 15:39 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Vineet Gupta, Russell King,
	Will Deacon, Catalin Marinas, Ralf Baechle, David S. Miller,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
	Andrea Arcangeli
  Cc: linux-arch, linux-mm, linux-kernel, Nitin Gupta, Kirill A . Shutemov

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

It's required to avoid losing dirty and accessed bits.

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/sparc/include/asm/pgtable_64.h |  2 +-
 arch/sparc/mm/tlb.c                 | 23 ++++++++++++++++++-----
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 4fefe3762083..83b06c98bb94 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -979,7 +979,7 @@ void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
 			  pmd_t *pmd);
 
 #define __HAVE_ARCH_PMDP_INVALIDATE
-extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
+extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp);
 
 #define __HAVE_ARCH_PGTABLE_DEPOSIT
diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index ee8066c3d96c..d36c65fc55cf 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -218,17 +218,28 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
+		unsigned long address, pmd_t *pmdp, pmd_t pmd)
+{
+	pmd_t old;
+
+	do {
+		old = *pmdp;
+	} while (cmpxchg64(&pmdp->pmd, old.pmd, pmd.pmd) != old.pmd);
+
+	return old;
+}
+
 /*
  * This routine is only called when splitting a THP
  */
-void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
+pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 		     pmd_t *pmdp)
 {
-	pmd_t entry = *pmdp;
-
-	pmd_val(entry) &= ~_PAGE_VALID;
+	pmd_t old, entry;
 
-	set_pmd_at(vma->vm_mm, address, pmdp, entry);
+	entry = __pmd(pmd_val(*pmdp) & ~_PAGE_VALID);
+	old = pmdp_establish(vma, address, pmdp, entry);
 	flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
 
 	/*
@@ -239,6 +250,8 @@ void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 	if ((pmd_val(entry) & _PAGE_PMD_HUGE) &&
 	    !is_huge_zero_page(pmd_page(entry)))
 		(vma->vm_mm)->context.thp_pte_count--;
+
+	return old;
 }
 
 void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCHv3 09/11] x86/mm: Provide pmdp_establish() helper
  2017-09-12 15:39 ` Kirill A. Shutemov
@ 2017-09-12 15:39   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2017-09-12 15:39 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Vineet Gupta, Russell King,
	Will Deacon, Catalin Marinas, Ralf Baechle, David S. Miller,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
	Andrea Arcangeli
  Cc: linux-arch, linux-mm, linux-kernel, Kirill A. Shutemov,
	Ingo Molnar, H . Peter Anvin, Thomas Gleixner

We need an atomic way to set up a pmd page table entry, avoiding races
with the CPU setting dirty/accessed bits. This is required to implement
a pmdp_invalidate() that doesn't lose these bits.

On PAE we can avoid the expensive cmpxchg8b when the new page table
entry is not present. If it is present, fall back to a cmpxchg loop.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/pgtable-3level.h | 37 ++++++++++++++++++++++++++++++++++-
 arch/x86/include/asm/pgtable.h        | 15 ++++++++++++++
 2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index c8821bab938f..cd73be22be1d 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -157,7 +157,6 @@ static inline pte_t native_ptep_get_and_clear(pte_t *ptep)
 #define native_ptep_get_and_clear(xp) native_local_ptep_get_and_clear(xp)
 #endif
 
-#ifdef CONFIG_SMP
 union split_pmd {
 	struct {
 		u32 pmd_low;
@@ -165,6 +164,8 @@ union split_pmd {
 	};
 	pmd_t pmd;
 };
+
+#ifdef CONFIG_SMP
 static inline pmd_t native_pmdp_get_and_clear(pmd_t *pmdp)
 {
 	union split_pmd res, *orig = (union split_pmd *)pmdp;
@@ -180,6 +181,40 @@ static inline pmd_t native_pmdp_get_and_clear(pmd_t *pmdp)
 #define native_pmdp_get_and_clear(xp) native_local_pmdp_get_and_clear(xp)
 #endif
 
+#ifndef pmdp_establish
+#define pmdp_establish pmdp_establish
+static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
+		unsigned long address, pmd_t *pmdp, pmd_t pmd)
+{
+	pmd_t old;
+
+	/*
+	 * If pmd has present bit cleared we can get away without expensive
+	 * cmpxchg64: we can update pmdp half-by-half without racing with
+	 * anybody.
+	 */
+	if (!(pmd_val(pmd) & _PAGE_PRESENT)) {
+		union split_pmd old, new, *ptr;
+
+		ptr = (union split_pmd *)pmdp;
+
+		new.pmd = pmd;
+
+		/* xchg acts as a barrier before setting of the high bits */
+		old.pmd_low = xchg(&ptr->pmd_low, new.pmd_low);
+		old.pmd_high = ptr->pmd_high;
+		ptr->pmd_high = new.pmd_high;
+		return old.pmd;
+	}
+
+	do {
+		old = *pmdp;
+	} while (cmpxchg64(&pmdp->pmd, old.pmd, pmd.pmd) != old.pmd);
+
+	return old;
+}
+#endif
+
 #ifdef CONFIG_SMP
 union split_pud {
 	struct {
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5b4c44d419c5..ff19dbd6c93d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1111,6 +1111,21 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
 }
 
+#ifndef pmdp_establish
+#define pmdp_establish pmdp_establish
+static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
+		unsigned long address, pmd_t *pmdp, pmd_t pmd)
+{
+	if (IS_ENABLED(CONFIG_SMP)) {
+		return xchg(pmdp, pmd);
+	} else {
+		pmd_t old = *pmdp;
+		*pmdp = pmd;
+		return old;
+	}
+}
+#endif
+
 /*
  * clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
  *
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCHv3 10/11] mm: Do not lose dirty and access bits in pmdp_invalidate()
  2017-09-12 15:39 ` Kirill A. Shutemov
@ 2017-09-12 15:39   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2017-09-12 15:39 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Vineet Gupta, Russell King,
	Will Deacon, Catalin Marinas, Ralf Baechle, David S. Miller,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
	Andrea Arcangeli
  Cc: linux-arch, linux-mm, linux-kernel, Kirill A. Shutemov, Hugh Dickins

Vlastimil noted that pmdp_invalidate() is not atomic and we can lose
dirty and accessed bits if the CPU sets them after the pmdp dereference,
but before set_pmd_at().

This patch changes pmdp_invalidate() to make the entry non-present
atomically and to return the previous value of the entry. That value can
be used to check whether the CPU set dirty/accessed bits under us.

The race window is very small and I haven't seen any reports that can be
attributed to the bug. For this reason, I don't think backporting to
stable trees is needed.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
---
 include/asm-generic/pgtable.h | 2 +-
 mm/pgtable-generic.c          | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index bf0889eb774d..9df1da175fb0 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -324,7 +324,7 @@ static inline pmd_t generic_pmdp_establish(struct vm_area_struct *vma,
 #endif
 
 #ifndef __HAVE_ARCH_PMDP_INVALIDATE
-extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
+extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp);
 #endif
 
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 1175f6a24fdb..3db8f2f76666 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -180,12 +180,12 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 #endif
 
 #ifndef __HAVE_ARCH_PMDP_INVALIDATE
-void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
+pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 		     pmd_t *pmdp)
 {
-	pmd_t entry = *pmdp;
-	set_pmd_at(vma->vm_mm, address, pmdp, pmd_mknotpresent(entry));
+	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mknotpresent(*pmdp));
 	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+	return old;
 }
 #endif
 
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCHv3 11/11] mm: Use updated pmdp_invalidate() interface to track dirty/accessed bits
  2017-09-12 15:39 ` Kirill A. Shutemov
@ 2017-09-12 15:39   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2017-09-12 15:39 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka, Vineet Gupta, Russell King,
	Will Deacon, Catalin Marinas, Ralf Baechle, David S. Miller,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
	Andrea Arcangeli
  Cc: linux-arch, linux-mm, linux-kernel, Kirill A. Shutemov

This patch uses the modified pmdp_invalidate(), which returns the
previous value of the pmd, to transfer the dirty and accessed bits.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 fs/proc/task_mmu.c |  8 ++++----
 mm/huge_memory.c   | 29 ++++++++++++-----------------
 2 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 7b40e11ede9b..fe5bff79031a 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -979,14 +979,14 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
 static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
 		unsigned long addr, pmd_t *pmdp)
 {
-	pmd_t pmd = *pmdp;
+	pmd_t old, pmd = *pmdp;
 
 	if (pmd_present(pmd)) {
 		/* See comment in change_huge_pmd() */
-		pmdp_invalidate(vma, addr, pmdp);
-		if (pmd_dirty(*pmdp))
+		old = pmdp_invalidate(vma, addr, pmdp);
+		if (pmd_dirty(old))
 			pmd = pmd_mkdirty(pmd);
-		if (pmd_young(*pmdp))
+		if (pmd_young(old))
 			pmd = pmd_mkyoung(pmd);
 
 		pmd = pmd_wrprotect(pmd);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 269b5df58543..c288c3ce9658 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1900,17 +1900,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	 * pmdp_invalidate() is required to make sure we don't miss
 	 * dirty/young flags set by hardware.
 	 */
-	entry = *pmd;
-	pmdp_invalidate(vma, addr, pmd);
-
-	/*
-	 * Recover dirty/young flags.  It relies on pmdp_invalidate to not
-	 * corrupt them.
-	 */
-	if (pmd_dirty(*pmd))
-		entry = pmd_mkdirty(entry);
-	if (pmd_young(*pmd))
-		entry = pmd_mkyoung(entry);
+	entry = pmdp_invalidate(vma, addr, pmd);
 
 	entry = pmd_modify(entry, newprot);
 	if (preserve_write)
@@ -2051,8 +2041,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	struct mm_struct *mm = vma->vm_mm;
 	struct page *page;
 	pgtable_t pgtable;
-	pmd_t _pmd;
-	bool young, write, dirty, soft_dirty, pmd_migration = false;
+	pmd_t old, _pmd;
+	bool young, write, soft_dirty, pmd_migration = false;
 	unsigned long addr;
 	int i;
 
@@ -2099,7 +2089,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	page_ref_add(page, HPAGE_PMD_NR - 1);
 	write = pmd_write(*pmd);
 	young = pmd_young(*pmd);
-	dirty = pmd_dirty(*pmd);
 	soft_dirty = pmd_soft_dirty(*pmd);
 
 	pmdp_huge_split_prepare(vma, haddr, pmd);
@@ -2129,8 +2118,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			if (soft_dirty)
 				entry = pte_mksoft_dirty(entry);
 		}
-		if (dirty)
-			SetPageDirty(page + i);
 		pte = pte_offset_map(&_pmd, addr);
 		BUG_ON(!pte_none(*pte));
 		set_pte_at(mm, addr, pte, entry);
@@ -2179,7 +2166,15 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	 * and finally we write the non-huge version of the pmd entry with
 	 * pmd_populate.
 	 */
-	pmdp_invalidate(vma, haddr, pmd);
+	old = pmdp_invalidate(vma, haddr, pmd);
+
+	/*
+	 * Transfer the dirty bit using the value returned by pmdp_invalidate()
+	 * to be sure we don't race with a CPU that can set the bit under us.
+	 */
+	if (pmd_dirty(old))
+		SetPageDirty(page);
+
 	pmd_populate(mm, pmd, pgtable);
 
 	if (freeze) {
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCHv3 11/11] mm: Use updated pmdp_invalidate() interface to track dirty/accessed bits
  2017-09-12 15:39   ` Kirill A. Shutemov
  (?)
@ 2017-09-13  2:08     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 33+ messages in thread
From: Aneesh Kumar K.V @ 2017-09-13  2:08 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, Vlastimil Babka, Vineet Gupta,
	Russell King, Will Deacon, Catalin Marinas, Ralf Baechle,
	David S. Miller, Martin Schwidefsky, Heiko Carstens,
	Andrea Arcangeli
  Cc: linux-arch, linux-mm, linux-kernel, Kirill A. Shutemov


How about this additional patch? It results in a net code reduction.

From fed62d0541ae78206a1a25caeb46a3ffa7ade9c8 Mon Sep 17 00:00:00 2001
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Date: Thu, 27 Jul 2017 12:21:33 +0530
Subject: [PATCH] mm/thp: Remove pmd_huge_split_prepare

Instead of marking the pmd ready for split, invalidate the pmd. This should
take care of the powerpc requirement. The only side effect is that we mark
the pmd invalid earlier, which can block access to the page a bit longer if
we race against a THP split.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  |  2 -
 arch/powerpc/include/asm/book3s/64/hash-64k.h |  2 -
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  9 ----
 arch/powerpc/include/asm/book3s/64/radix.h    |  6 ---
 arch/powerpc/mm/pgtable-hash64.c              | 22 --------
 include/asm-generic/pgtable.h                 |  8 ---
 mm/huge_memory.c                              | 73 +++++++++++++--------------
 7 files changed, 35 insertions(+), 87 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index d65dcb5826ff..2416edb74d28 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -112,8 +112,6 @@ extern pmd_t hash__pmdp_collapse_flush(struct vm_area_struct *vma,
 extern void hash__pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
 					 pgtable_t pgtable);
 extern pgtable_t hash__pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
-extern void hash__pmdp_huge_split_prepare(struct vm_area_struct *vma,
-				      unsigned long address, pmd_t *pmdp);
 extern pmd_t hash__pmdp_huge_get_and_clear(struct mm_struct *mm,
 				       unsigned long addr, pmd_t *pmdp);
 extern int hash__has_transparent_hugepage(void);
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index ab36323b8a3e..001202cabedf 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -162,8 +162,6 @@ extern pmd_t hash__pmdp_collapse_flush(struct vm_area_struct *vma,
 extern void hash__pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
 					 pgtable_t pgtable);
 extern pgtable_t hash__pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
-extern void hash__pmdp_huge_split_prepare(struct vm_area_struct *vma,
-				      unsigned long address, pmd_t *pmdp);
 extern pmd_t hash__pmdp_huge_get_and_clear(struct mm_struct *mm,
 				       unsigned long addr, pmd_t *pmdp);
 extern int hash__has_transparent_hugepage(void);
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 6cf53dc70efc..fee01ffe3b60 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1114,15 +1114,6 @@ static inline pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm,
 extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			     pmd_t *pmdp);
 
-#define __HAVE_ARCH_PMDP_HUGE_SPLIT_PREPARE
-static inline void pmdp_huge_split_prepare(struct vm_area_struct *vma,
-					   unsigned long address, pmd_t *pmdp)
-{
-	if (radix_enabled())
-		return radix__pmdp_huge_split_prepare(vma, address, pmdp);
-	return hash__pmdp_huge_split_prepare(vma, address, pmdp);
-}
-
 #define pmd_move_must_withdraw pmd_move_must_withdraw
 struct spinlock;
 static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index f5ece365d929..389be8b6c9f7 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -272,12 +272,6 @@ static inline pmd_t radix__pmd_mkhuge(pmd_t pmd)
 		return __pmd(pmd_val(pmd) | _PAGE_PTE | R_PAGE_LARGE);
 	return __pmd(pmd_val(pmd) | _PAGE_PTE);
 }
-static inline void radix__pmdp_huge_split_prepare(struct vm_area_struct *vma,
-					    unsigned long address, pmd_t *pmdp)
-{
-	/* Nothing to do for radix. */
-	return;
-}
 
 extern unsigned long radix__pmd_hugepage_update(struct mm_struct *mm, unsigned long addr,
 					  pmd_t *pmdp, unsigned long clr,
diff --git a/arch/powerpc/mm/pgtable-hash64.c b/arch/powerpc/mm/pgtable-hash64.c
index ec277913e01b..469808e77e58 100644
--- a/arch/powerpc/mm/pgtable-hash64.c
+++ b/arch/powerpc/mm/pgtable-hash64.c
@@ -296,28 +296,6 @@ pgtable_t hash__pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 	return pgtable;
 }
 
-void hash__pmdp_huge_split_prepare(struct vm_area_struct *vma,
-			       unsigned long address, pmd_t *pmdp)
-{
-	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
-	VM_BUG_ON(REGION_ID(address) != USER_REGION_ID);
-	VM_BUG_ON(pmd_devmap(*pmdp));
-
-	/*
-	 * We can't mark the pmd none here, because that will cause a race
-	 * against exit_mmap. We need to continue mark pmd TRANS HUGE, while
-	 * we spilt, but at the same time we wan't rest of the ppc64 code
-	 * not to insert hash pte on this, because we will be modifying
-	 * the deposited pgtable in the caller of this function. Hence
-	 * clear the _PAGE_USER so that we move the fault handling to
-	 * higher level function and that will serialize against ptl.
-	 * We need to flush existing hash pte entries here even though,
-	 * the translation is still valid, because we will withdraw
-	 * pgtable_t after this.
-	 */
-	pmd_hugepage_update(vma->vm_mm, address, pmdp, 0, _PAGE_PRIVILEGED);
-}
-
 /*
  * A linux hugepage PMD was changed and the corresponding hash table entries
  * neesd to be flushed.
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index ece5e399567a..b934e41277ac 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -313,14 +313,6 @@ extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp);
 #endif
 
-#ifndef __HAVE_ARCH_PMDP_HUGE_SPLIT_PREPARE
-static inline void pmdp_huge_split_prepare(struct vm_area_struct *vma,
-					   unsigned long address, pmd_t *pmdp)
-{
-
-}
-#endif
-
 #ifndef __HAVE_ARCH_PTE_SAME
 static inline int pte_same(pte_t pte_a, pte_t pte_b)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d72c2d20e9c6..59ec8c916368 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1944,8 +1944,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	struct mm_struct *mm = vma->vm_mm;
 	struct page *page;
 	pgtable_t pgtable;
-	pmd_t old, _pmd;
-	bool young, write, soft_dirty;
+	pmd_t old_pmd, _pmd;
+	bool young, write, dirty, soft_dirty;
 	unsigned long addr;
 	int i;
 
@@ -1977,14 +1977,39 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		return __split_huge_zero_page_pmd(vma, haddr, pmd);
 	}
 
-	page = pmd_page(*pmd);
+	/*
+	 * Up to this point the pmd is present and huge and userland has the
+	 * whole access to the hugepage during the split (which happens in
+	 * place). If we overwrite the pmd with the not-huge version pointing
+	 * to the pte here (which of course we could if all CPUs were bug
+	 * free), userland could trigger a small page size TLB miss on the
+	 * small sized TLB while the hugepage TLB entry is still established in
+	 * the huge TLB. Some CPUs don't like that.
+	 * See http://support.amd.com/us/Processor_TechDocs/41322.pdf, Erratum
+	 * 383 on page 93. Intel should be safe but also warns that it's
+	 * only safe if the permission and cache attributes of the two entries
+	 * loaded in the two TLBs are identical (which should be the case here).
+	 * But it is generally safer to never allow small and huge TLB entries
+	 * for the same virtual address to be loaded simultaneously. So instead
+	 * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
+	 * current pmd notpresent (atomically because here the pmd_trans_huge
+	 * and pmd_trans_splitting must remain set at all times on the pmd
+	 * until the split is complete for this pmd), then we flush the SMP TLB
+	 * and finally we write the non-huge version of the pmd entry with
+	 * pmd_populate.
+	 */
+	old_pmd = pmdp_invalidate(vma, haddr, pmd);
+
+	page = pmd_page(old_pmd);
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	page_ref_add(page, HPAGE_PMD_NR - 1);
-	write = pmd_write(*pmd);
-	young = pmd_young(*pmd);
-	soft_dirty = pmd_soft_dirty(*pmd);
-
-	pmdp_huge_split_prepare(vma, haddr, pmd);
+	write = pmd_write(old_pmd);
+	young = pmd_young(old_pmd);
+	dirty = pmd_dirty(old_pmd);
+	soft_dirty = pmd_soft_dirty(old_pmd);
+	/*
+	 * Withdraw the deposited page table only after the pmd entry is invalid.
+	 */
 	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
 	pmd_populate(mm, &_pmd, pgtable);
 
@@ -2011,6 +2036,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			if (soft_dirty)
 				entry = pte_mksoft_dirty(entry);
 		}
+		if (dirty)
+			SetPageDirty(page + i);
 		pte = pte_offset_map(&_pmd, addr);
 		BUG_ON(!pte_none(*pte));
 		set_pte_at(mm, addr, pte, entry);
@@ -2038,36 +2065,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	}
 
 	smp_wmb(); /* make pte visible before pmd */
-	/*
-	 * Up to this point the pmd is present and huge and userland has the
-	 * whole access to the hugepage during the split (which happens in
-	 * place). If we overwrite the pmd with the not-huge version pointing
-	 * to the pte here (which of course we could if all CPUs were bug
-	 * free), userland could trigger a small page size TLB miss on the
-	 * small sized TLB while the hugepage TLB entry is still established in
-	 * the huge TLB. Some CPU doesn't like that.
-	 * See http://support.amd.com/us/Processor_TechDocs/41322.pdf, Erratum
-	 * 383 on page 93. Intel should be safe but is also warns that it's
-	 * only safe if the permission and cache attributes of the two entries
-	 * loaded in the two TLB is identical (which should be the case here).
-	 * But it is generally safer to never allow small and huge TLB entries
-	 * for the same virtual address to be loaded simultaneously. So instead
-	 * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
-	 * current pmd notpresent (atomically because here the pmd_trans_huge
-	 * and pmd_trans_splitting must remain set at all times on the pmd
-	 * until the split is complete for this pmd), then we flush the SMP TLB
-	 * and finally we write the non-huge version of the pmd entry with
-	 * pmd_populate.
-	 */
-	old = pmdp_invalidate(vma, haddr, pmd);
-
-	/*
-	 * Transfer the dirty bit using the value returned by pmdp_invalidate(),
-	 * to be sure we don't race with a CPU that can set the bit under us.
-	 */
-	if (pmd_dirty(old))
-		SetPageDirty(page);
-
 	pmd_populate(mm, pmd, pgtable);
 
 	if (freeze) {
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread
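
Condensed, the reordering in __split_huge_pmd_locked() is: invalidate first,
read every flag from the returned value, and only then withdraw the deposited
page table. A fragment-level sketch of the resulting sequence (condensed from
the diff above, not a literal copy):

	old_pmd = pmdp_invalidate(vma, haddr, pmd);	/* atomic; flushes TLB */
	page = pmd_page(old_pmd);
	write = pmd_write(old_pmd);
	young = pmd_young(old_pmd);
	dirty = pmd_dirty(old_pmd);			/* cannot be lost now */
	soft_dirty = pmd_soft_dirty(old_pmd);
	/* Withdraw the deposited table only after the pmd entry is invalid. */
	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
	pmd_populate(mm, &_pmd, pgtable);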

* Re: [PATCHv3 08/11] sparc64: update pmdp_invalidate to return old pmd value
  2017-09-12 15:39   ` Kirill A. Shutemov
  (?)
  (?)
@ 2017-09-13  5:06     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 33+ messages in thread
From: Aneesh Kumar K.V @ 2017-09-13  5:06 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, Vlastimil Babka, Vineet Gupta,
	Russell King, Will Deacon, Catalin Marinas, Ralf Baechle,
	David S. Miller, Martin Schwidefsky, Heiko Carstens,
	Andrea Arcangeli
  Cc: linux-arch, linux-mm, linux-kernel, Nitin Gupta, Kirill A . Shutemov

"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>


You got the author wrong there.

>
> It's required to avoid losing dirty and accessed bits.
>
> Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/sparc/include/asm/pgtable_64.h |  2 +-
>  arch/sparc/mm/tlb.c                 | 23 ++++++++++++++++++-----
>  2 files changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
> index 4fefe3762083..83b06c98bb94 100644
> --- a/arch/sparc/include/asm/pgtable_64.h
> +++ b/arch/sparc/include/asm/pgtable_64.h
> @@ -979,7 +979,7 @@ void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
>  			  pmd_t *pmd);
>
>  #define __HAVE_ARCH_PMDP_INVALIDATE
> -extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
> +extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>  			    pmd_t *pmdp);
>
>  #define __HAVE_ARCH_PGTABLE_DEPOSIT
> diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
> index ee8066c3d96c..d36c65fc55cf 100644
> --- a/arch/sparc/mm/tlb.c
> +++ b/arch/sparc/mm/tlb.c
> @@ -218,17 +218,28 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>  	}
>  }
>
> +static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
> +		unsigned long address, pmd_t *pmdp, pmd_t pmd)
> +{
> +	pmd_t old;
> +
> +	do {
> +		old = *pmdp;
> +	} while (cmpxchg64(&pmdp->pmd, old.pmd, pmd.pmd) != old.pmd);
> +
> +	return old;
> +}
> +
>  /*
>   * This routine is only called when splitting a THP
>   */
> -void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
> +pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>  		     pmd_t *pmdp)
>  {
> -	pmd_t entry = *pmdp;
> -
> -	pmd_val(entry) &= ~_PAGE_VALID;
> +	pmd_t old, entry;
>
> -	set_pmd_at(vma->vm_mm, address, pmdp, entry);
> +	entry = __pmd(pmd_val(*pmdp) & ~_PAGE_VALID);
> +	old = pmdp_establish(vma, address, pmdp, entry);
>  	flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
>
>  	/*
> @@ -239,6 +250,8 @@ void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>  	if ((pmd_val(entry) & _PAGE_PMD_HUGE) &&
>  	    !is_huge_zero_page(pmd_page(entry)))
>  		(vma->vm_mm)->context.thp_pte_count--;
> +
> +	return old;
>  }
>
>  void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
> -- 
> 2.14.1

^ permalink raw reply	[flat|nested] 33+ messages in thread
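
The cmpxchg64() loop is the point of the patch: the pre-patch sequence (the
lines the diff removes) leaves a window between the load and the store in
which another CPU can set the dirty or accessed bits, and the final store
then silently clears them again:

	pmd_t entry = *pmdp;			/* 1: load the pmd            */
	pmd_val(entry) &= ~_PAGE_VALID;		/* 2: another CPU sets dirty  */
						/*    or accessed bits here   */
	set_pmd_at(vma->vm_mm, address, pmdp, entry);	/* 3: store loses them */

Retrying the cmpxchg64() until it observes an unchanged old value closes that
window.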

* Re: [PATCHv3 11/11] mm: Use updated pmdp_invalidate() interface to track dirty/accessed bits
  2017-09-13  2:08     ` Aneesh Kumar K.V
@ 2017-12-13 10:13       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2017-12-13 10:13 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Andrew Morton, Vlastimil Babka, Vineet Gupta, Russell King,
	Will Deacon, Catalin Marinas, Ralf Baechle, David S. Miller,
	Martin Schwidefsky, Heiko Carstens, Andrea Arcangeli, linux-arch,
	linux-mm, linux-kernel

On Wed, Sep 13, 2017 at 02:08:58AM +0000, Aneesh Kumar K.V wrote:
> @@ -2011,6 +2036,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  			if (soft_dirty)
>  				entry = pte_mksoft_dirty(entry);
>  		}
> +		if (dirty)
> +			SetPageDirty(page + i);
>  		pte = pte_offset_map(&_pmd, addr);
>  		BUG_ON(!pte_none(*pte));
>  		set_pte_at(mm, addr, pte, entry);

The patch is fine. But we don't need to mark every 4k subpage dirty: we have
a single dirty bit for the whole THP. I'll change this part and send the
patch as part of the series.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 33+ messages in thread
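
In other words, a THP carries a single dirty bit, so the transfer only needs
to happen once against the head page rather than per 4k subpage. A sketch of
the adjustment described here, assuming old_pmd holds the value returned by
pmdp_invalidate() (a hypothetical final form, not a posted patch):

	/* One dirty bit covers the whole THP: mark the head page once,
	 * instead of SetPageDirty(page + i) for every subpage. */
	if (pmd_dirty(old_pmd))
		SetPageDirty(page);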

end of thread, other threads:[~2017-12-13 10:14 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-12 15:39 [PATCHv3 00/11] Do not loose dirty bit on THP pages Kirill A. Shutemov
2017-09-12 15:39 ` Kirill A. Shutemov
2017-09-12 15:39 ` [PATCHv3 01/11] asm-generic: Provide generic_pmdp_establish() Kirill A. Shutemov
2017-09-12 15:39   ` Kirill A. Shutemov
2017-09-12 15:39 ` [PATCHv3 02/11] arc: Use generic_pmdp_establish as pmdp_establish Kirill A. Shutemov
2017-09-12 15:39   ` Kirill A. Shutemov
2017-09-12 15:39 ` [PATCHv3 03/11] arm/mm: Provide pmdp_establish() helper Kirill A. Shutemov
2017-09-12 15:39   ` Kirill A. Shutemov
2017-09-12 15:39 ` [PATCHv3 04/11] arm64: " Kirill A. Shutemov
2017-09-12 15:39   ` Kirill A. Shutemov
2017-09-12 15:39 ` [PATCHv3 05/11] mips: Use generic_pmdp_establish as pmdp_establish Kirill A. Shutemov
2017-09-12 15:39   ` Kirill A. Shutemov
2017-09-12 15:39 ` [PATCHv3 06/11] powerpc/mm: update pmdp_invalidate to return old pmd value Kirill A. Shutemov
2017-09-12 15:39   ` Kirill A. Shutemov
2017-09-12 15:39 ` [PATCHv3 07/11] s390/mm: Modify pmdp_invalidate to return old value Kirill A. Shutemov
2017-09-12 15:39   ` Kirill A. Shutemov
2017-09-12 15:39 ` [PATCHv3 08/11] sparc64: update pmdp_invalidate to return old pmd value Kirill A. Shutemov
2017-09-12 15:39   ` Kirill A. Shutemov
2017-09-13  5:06   ` Aneesh Kumar K.V
2017-09-13  5:06     ` Aneesh Kumar K.V
2017-09-13  5:06     ` Aneesh Kumar K.V
2017-09-13  5:06     ` Aneesh Kumar K.V
2017-09-12 15:39 ` [PATCHv3 09/11] x86/mm: Provide pmdp_establish() helper Kirill A. Shutemov
2017-09-12 15:39   ` Kirill A. Shutemov
2017-09-12 15:39 ` [PATCHv3 10/11] mm: Do not loose dirty and access bits in pmdp_invalidate() Kirill A. Shutemov
2017-09-12 15:39   ` Kirill A. Shutemov
2017-09-12 15:39 ` [PATCHv3 11/11] mm: Use updated pmdp_invalidate() interface to track dirty/accessed bits Kirill A. Shutemov
2017-09-12 15:39   ` Kirill A. Shutemov
2017-09-13  2:08   ` Aneesh Kumar K.V
2017-09-13  2:08     ` Aneesh Kumar K.V
2017-09-13  2:08     ` Aneesh Kumar K.V
2017-12-13 10:13     ` Kirill A. Shutemov
2017-12-13 10:13       ` Kirill A. Shutemov
