linux-mm.kvack.org archive mirror
* [PATCH 0/2] mm/thp: rework the pmd protect changing flow
@ 2020-01-23  7:55 Xuefeng Wang
  2020-01-23  7:55 ` [PATCH v2 1/2] mm: add helpers pmdp_modify_prot_start/commit Xuefeng Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Xuefeng Wang @ 2020-01-23  7:55 UTC (permalink / raw)
  To: catalin.marinas, will, mark.rutland, arnd, akpm
  Cc: linux-arch, linux-mm, linux-kernel, linux-arm-kernel, chenzhou10,
	Xuefeng Wang

On a KunPeng920 board, when changing the permissions of a large memory
region backed by huge pages, pmdp_invalidate() accounts for about 65% of
the profile of a JIT tool. The kernel flushes the TLB twice: the first
flush happens in pmdp_invalidate() and the second at the end of
change_protection_range(). The first flush is unnecessary when the
hardware supports atomic pmd updates: atomically clearing the pmd to zero
already prevents the hardware from updating the entry asynchronously, and
the second TLB flush is enough to make the new entry visible. So rework
the flow and remove the first pmdp_invalidate().

This series first adds a pmdp_modify_prot transaction abstraction, then
implements pmdp_modify_prot_start() on arm64 using
pmdp_huge_get_and_clear() to atomically fetch the pmd and clear the
entry.
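
With the helpers in place, the protection-change path in change_huge_pmd()
becomes roughly the following (a simplified sketch of the patch 1 hunk;
locking and the prot_numa handling are omitted):

	oldpmd = pmdp_modify_prot_start(vma, addr, pmd);
	pmdnt = pmd_modify(oldpmd, newprot);
	if (preserve_write)
		pmdnt = pmd_mk_savedwrite(pmdnt);
	pmdp_modify_prot_commit(vma, addr, pmd, oldpmd, pmdnt);

The generic pmdp_modify_prot_start() still falls back to pmdp_invalidate();
only the arm64 override in patch 2 replaces it with
pmdp_huge_get_and_clear(). The deferred flush at the end of
change_protection_range() then makes the new entry visible.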

After the rework, mprotect() achieves roughly a 3x to 13x performance
gain for ranges between 64M and 512M on KunPeng920:

4K granule/THP on
memory size(M)	64	128	256	320	448	512
pre-patch	0.77	1.40	2.64	3.23	4.49	5.10
post-patch	0.20	0.23	0.28	0.31	0.37	0.39

Changes:
v2:
 - fix set_pmd_at compile problems

Xuefeng Wang (2):
  mm: add helpers pmdp_modify_prot_start/commit
  arm64: mm: rework the pmd protect changing flow

 arch/arm64/include/asm/pgtable.h | 14 +++++++++++++
 include/asm-generic/pgtable.h    | 35 ++++++++++++++++++++++++++++++++
 mm/huge_memory.c                 | 19 ++++++++---------
 3 files changed, 57 insertions(+), 11 deletions(-)

--
2.17.1


* [PATCH v2 1/2] mm: add helpers pmdp_modify_prot_start/commit
  2020-01-23  7:55 [PATCH 0/2] mm/thp: rework the pmd protect changing flow Xuefeng Wang
@ 2020-01-23  7:55 ` Xuefeng Wang
  2020-01-23  7:55 ` [PATCH v2 2/2] arm64: mm: rework the pmd protect changing flow Xuefeng Wang
  2020-01-23  8:24 ` [PATCH 0/2] mm/thp: " Anshuman Khandual
  2 siblings, 0 replies; 4+ messages in thread
From: Xuefeng Wang @ 2020-01-23  7:55 UTC (permalink / raw)
  To: catalin.marinas, will, mark.rutland, arnd, akpm
  Cc: linux-arch, linux-mm, linux-kernel, linux-arm-kernel, chenzhou10,
	Xuefeng Wang

Introduce the helpers pmdp_modify_prot_start/commit to abstract a
pmdp_modify_prot transaction on huge pmds. The generic implementation
keeps the existing behaviour (pmdp_invalidate() followed by set_pmd_at()),
so there is no functional change.

Signed-off-by: Xuefeng Wang <wxf.wang@hisilicon.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
---
 include/asm-generic/pgtable.h | 35 +++++++++++++++++++++++++++++++++++
 mm/huge_memory.c              | 19 ++++++++-----------
 2 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 798ea36a0549..e81bd58a9170 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -673,6 +673,41 @@ static inline void ptep_modify_prot_commit(struct vm_area_struct *vma,
	__ptep_modify_prot_commit(vma, addr, ptep, pte);
 }
 #endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
+
+static inline pmd_t __pmdp_modify_prot_start(struct vm_area_struct *vma,
+						unsigned long addr,
+						pmd_t *pmdp)
+{
+	return pmdp_invalidate(vma, addr, pmdp);
+}
+
+static inline void __pmdp_modify_prot_commit(struct vm_area_struct *vma,
+						unsigned long addr,
+						pmd_t *pmdp, pmd_t pmd)
+{
+	set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifndef __HAVE_ARCH_PMDP_MODIFY_PROT_TRANSACTION
+static inline pmd_t pmdp_modify_prot_start(struct vm_area_struct *vma,
+						unsigned long addr,
+						pmd_t *pmdp)
+{
+	return __pmdp_modify_prot_start(vma, addr, pmdp);
+}
+#endif /* __HAVE_ARCH_PMDP_MODIFY_PROT_TRANSACTION */
+
+/*
+ * Commit an update to a pmd.
+ */
+static inline void pmdp_modify_prot_commit(struct vm_area_struct *vma,
+						unsigned long addr,
+						pmd_t *pmdp, pmd_t old_pmd, pmd_t pmd)
+{
+	__pmdp_modify_prot_commit(vma, addr, pmdp, pmd);
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif /* CONFIG_MMU */

 /*
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a88093213674..53515a3c91dd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1933,9 +1933,8 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
		unsigned long addr, pgprot_t newprot, int prot_numa)
 {
-	struct mm_struct *mm = vma->vm_mm;
	spinlock_t *ptl;
-	pmd_t entry;
+	pmd_t pmdnt, oldpmd;
	bool preserve_write;
	int ret;

@@ -1961,7 +1960,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
			newpmd = swp_entry_to_pmd(entry);
			if (pmd_swp_soft_dirty(*pmd))
				newpmd = pmd_swp_mksoft_dirty(newpmd);
-			set_pmd_at(mm, addr, pmd, newpmd);
+			set_pmd_at(vma->vm_mm, addr, pmd, newpmd);
		}
		goto unlock;
	}
@@ -1995,18 +1994,16 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
	 *
	 * The race makes MADV_DONTNEED miss the huge pmd and don't clear it
	 * which may break userspace.
-	 *
-	 * pmdp_invalidate() is required to make sure we don't miss
-	 * dirty/young flags set by hardware.
	 */
-	entry = pmdp_invalidate(vma, addr, pmd);

-	entry = pmd_modify(entry, newprot);
+	oldpmd = pmdp_modify_prot_start(vma, addr, pmd);
+	pmdnt = pmd_modify(oldpmd, newprot);
	if (preserve_write)
-		entry = pmd_mk_savedwrite(entry);
+		pmdnt = pmd_mk_savedwrite(pmdnt);
+	pmdp_modify_prot_commit(vma, addr, pmd, oldpmd, pmdnt);
+
	ret = HPAGE_PMD_NR;
-	set_pmd_at(mm, addr, pmd, entry);
-	BUG_ON(vma_is_anonymous(vma) && !preserve_write && pmd_write(entry));
+	BUG_ON(vma_is_anonymous(vma) && !preserve_write && pmd_write(pmdnt));
 unlock:
	spin_unlock(ptl);
	return ret;
--
2.17.1


* [PATCH v2 2/2] arm64: mm: rework the pmd protect changing flow
  2020-01-23  7:55 [PATCH 0/2] mm/thp: rework the pmd protect changing flow Xuefeng Wang
  2020-01-23  7:55 ` [PATCH v2 1/2] mm: add helpers pmdp_modify_prot_start/commit Xuefeng Wang
@ 2020-01-23  7:55 ` Xuefeng Wang
  2020-01-23  8:24 ` [PATCH 0/2] mm/thp: " Anshuman Khandual
  2 siblings, 0 replies; 4+ messages in thread
From: Xuefeng Wang @ 2020-01-23  7:55 UTC (permalink / raw)
  To: catalin.marinas, will, mark.rutland, arnd, akpm
  Cc: linux-arch, linux-mm, linux-kernel, linux-arm-kernel, chenzhou10,
	Xuefeng Wang

On a KunPeng920 board, when changing the permissions of a large memory
region backed by huge pages, pmdp_invalidate() accounts for about 65% of
the profile of a JIT tool. The kernel flushes the TLB twice: the first
flush happens in pmdp_invalidate() and the second at the end of
change_protection_range(). The first flush is unnecessary when the
hardware supports atomic pmd updates: atomically clearing the pmd to zero
already prevents the hardware from updating the entry asynchronously, and
the second TLB flush is enough to make the new entry visible. So rework
the flow and remove the first pmdp_invalidate().

Add pmdp_modify_prot_start() on arm64, which uses
pmdp_huge_get_and_clear() to atomically fetch the pmd and clear the
entry, so that asynchronous hardware updates cannot race with the
protection change.

After the rework, mprotect() achieves roughly a 3x to 13x performance
gain for ranges between 64M and 512M:

4K granule/THP on
memory size(M)	64	128	256	320	448	512
pre-patch	0.77	1.40	2.64	3.23	4.49	5.10
post-patch	0.20	0.23	0.28	0.31	0.37	0.39

Signed-off-by: Xuefeng Wang <wxf.wang@hisilicon.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
---
 arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index cd5de0e40bfa..bccdaa5bd5f2 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -769,6 +769,20 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */

+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define __HAVE_ARCH_PMDP_MODIFY_PROT_TRANSACTION
+static inline pmd_t pmdp_modify_prot_start(struct vm_area_struct *vma,
+						unsigned long addr,
+						pmd_t *pmdp)
+{
+	/*
+	 * Atomically change the pmd to zero, preventing the hardware
+	 * from updating it asynchronously.
+	 */
+	return pmdp_huge_get_and_clear(vma->vm_mm, addr, pmdp);
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
 /*
  * ptep_set_wrprotect - mark read-only while trasferring potential hardware
  * dirty status (PTE_DBM && !PTE_RDONLY) to the software PTE_DIRTY bit.
--
2.17.1


* Re: [PATCH 0/2] mm/thp: rework the pmd protect changing flow
  2020-01-23  7:55 [PATCH 0/2] mm/thp: rework the pmd protect changing flow Xuefeng Wang
  2020-01-23  7:55 ` [PATCH v2 1/2] mm: add helpers pmdp_modify_prot_start/commit Xuefeng Wang
  2020-01-23  7:55 ` [PATCH v2 2/2] arm64: mm: rework the pmd protect changing flow Xuefeng Wang
@ 2020-01-23  8:24 ` Anshuman Khandual
  2 siblings, 0 replies; 4+ messages in thread
From: Anshuman Khandual @ 2020-01-23  8:24 UTC (permalink / raw)
  To: Xuefeng Wang, catalin.marinas, will, mark.rutland, arnd, akpm
  Cc: linux-arch, linux-mm, linux-kernel, linux-arm-kernel, chenzhou10



On 01/23/2020 01:25 PM, Xuefeng Wang wrote:
> On a KunPeng920 board, when changing the permissions of a large memory
> region backed by huge pages, pmdp_invalidate() accounts for about 65% of
> the profile of a JIT tool. The kernel flushes the TLB twice: the first
> flush happens in pmdp_invalidate() and the second at the end of
> change_protection_range(). The first flush is unnecessary when the
> hardware supports atomic pmd updates: atomically clearing the pmd to zero
> already prevents the hardware from updating the entry asynchronously, and
> the second TLB flush is enough to make the new entry visible. So rework
> the flow and remove the first pmdp_invalidate().
> 
> This series first adds a pmdp_modify_prot transaction abstraction, then
> implements pmdp_modify_prot_start() on arm64 using
> pmdp_huge_get_and_clear() to atomically fetch the pmd and clear the
> entry.

There is a comment section in change_huge_pmd() which details how clearing
the PMD entry there (in prot_numa case) can potentially race with another
concurrent madvise(MADV_DONTNEED, ..) call. Here is the comment block for
reference.

        /*
         * In case prot_numa, we are under down_read(mmap_sem). It's critical
         * to not clear pmd intermittently to avoid race with MADV_DONTNEED
         * which is also under down_read(mmap_sem):
         *
         *      CPU0:                           CPU1:
         *                              change_huge_pmd(prot_numa=1)
         *                               pmdp_huge_get_and_clear_notify()
         * madvise_dontneed()
         *  zap_pmd_range()
         *   pmd_trans_huge(*pmd) == 0 (without ptl)
         *   // skip the pmd
         *                               set_pmd_at();
         *                               // pmd is re-established
         *
         * The race makes MADV_DONTNEED miss the huge pmd and don't clear it
         * which may break userspace.
         *
         * pmdp_invalidate() is required to make sure we don't miss
         * dirty/young flags set by hardware.
         */

By defining the new override with pmdp_huge_get_and_clear(), aren't we
now exposed to the above race condition?

- Anshuman

