LinuxPPC-Dev Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v4 0/9] Speedup mremap on ppc64
@ 2021-04-14  8:59 Aneesh Kumar K.V
  2021-04-14  8:59 ` [PATCH v4 1/9] selftest/mremap_test: Update the test to handle pagesize other than 4K Aneesh Kumar K.V
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2021-04-14  8:59 UTC (permalink / raw)
  To: linux-mm, akpm; +Cc: kaleshsingh, npiggin, Aneesh Kumar K.V, joel, linuxppc-dev

Hi,

This patchset enables MOVE_PMD/MOVE_PUD support on power. This requires
the platform to support updating higher-level page tables without
updating page table entries. This also needs to invalidate the Page Walk
Cache on architecture supporting the same.

Changes from v3:
* Fix build error reported by kernel test robot
* Address review feedback.

Changes from v2:
* switch from using mmu_gather to flush_pte_tlb_pwc_range() 

Changes from v1:
* Rebase to recent upstream
* Fix build issues with tlb_gather_mmu changes


Aneesh Kumar K.V (9):
  selftest/mremap_test: Update the test to handle pagesize other than 4K
  selftest/mremap_test: Avoid crash with static build
  mm/mremap: Use pmd/pud_poplulate to update page table entries
  powerpc/mm/book3s64: Fix possible build error
  powerpc/mm/book3s64: Update tlb flush routines to take a page walk
    cache flush argument
  mm/mremap: Use range flush that does TLB and page walk cache flush
  mm/mremap: Move TLB flush outside page table lock
  mm/mremap: Allow arch runtime override
  powerpc/mm: Enable move pmd/pud

 .../include/asm/book3s/64/tlbflush-radix.h    |  19 +--
 arch/powerpc/include/asm/book3s/64/tlbflush.h |  30 ++++-
 arch/powerpc/include/asm/tlb.h                |   6 +
 arch/powerpc/mm/book3s64/radix_hugetlbpage.c  |   4 +-
 arch/powerpc/mm/book3s64/radix_tlb.c          |  55 ++++----
 arch/powerpc/platforms/Kconfig.cputype        |   2 +
 mm/mremap.c                                   |  41 ++++--
 tools/testing/selftests/vm/mremap_test.c      | 118 ++++++++++--------
 8 files changed, 172 insertions(+), 103 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4 1/9] selftest/mremap_test: Update the test to handle pagesize other than 4K
  2021-04-14  8:59 [PATCH v4 0/9] Speedup mremap on ppc64 Aneesh Kumar K.V
@ 2021-04-14  8:59 ` Aneesh Kumar K.V
  2021-04-14  8:59 ` [PATCH v4 2/9] selftest/mremap_test: Avoid crash with static build Aneesh Kumar K.V
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2021-04-14  8:59 UTC (permalink / raw)
  To: linux-mm, akpm; +Cc: kaleshsingh, npiggin, Aneesh Kumar K.V, joel, linuxppc-dev

Instead of hardcoding 4K page size fetch it using sysconf(). For the performance
measurements test still assume 2M and 1G are hugepage sizes.

Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 tools/testing/selftests/vm/mremap_test.c | 113 ++++++++++++-----------
 1 file changed, 61 insertions(+), 52 deletions(-)

diff --git a/tools/testing/selftests/vm/mremap_test.c b/tools/testing/selftests/vm/mremap_test.c
index 9c391d016922..c9a5461eb786 100644
--- a/tools/testing/selftests/vm/mremap_test.c
+++ b/tools/testing/selftests/vm/mremap_test.c
@@ -45,14 +45,15 @@ enum {
 	_4MB = 4ULL << 20,
 	_1GB = 1ULL << 30,
 	_2GB = 2ULL << 30,
-	PTE = _4KB,
 	PMD = _2MB,
 	PUD = _1GB,
 };
 
+#define PTE page_size
+
 #define MAKE_TEST(source_align, destination_align, size,	\
 		  overlaps, should_fail, test_name)		\
-{								\
+(struct test){							\
 	.name = test_name,					\
 	.config = {						\
 		.src_alignment = source_align,			\
@@ -252,12 +253,17 @@ static int parse_args(int argc, char **argv, unsigned int *threshold_mb,
 	return 0;
 }
 
+#define MAX_TEST 13
+#define MAX_PERF_TEST 3
 int main(int argc, char **argv)
 {
 	int failures = 0;
 	int i, run_perf_tests;
 	unsigned int threshold_mb = VALIDATION_DEFAULT_THRESHOLD;
 	unsigned int pattern_seed;
+	struct test test_cases[MAX_TEST];
+	struct test perf_test_cases[MAX_PERF_TEST];
+	int page_size;
 	time_t t;
 
 	pattern_seed = (unsigned int) time(&t);
@@ -268,56 +274,59 @@ int main(int argc, char **argv)
 	ksft_print_msg("Test configs:\n\tthreshold_mb=%u\n\tpattern_seed=%u\n\n",
 		       threshold_mb, pattern_seed);
 
-	struct test test_cases[] = {
-		/* Expected mremap failures */
-		MAKE_TEST(_4KB, _4KB, _4KB, OVERLAPPING, EXPECT_FAILURE,
-		  "mremap - Source and Destination Regions Overlapping"),
-		MAKE_TEST(_4KB, _1KB, _4KB, NON_OVERLAPPING, EXPECT_FAILURE,
-		  "mremap - Destination Address Misaligned (1KB-aligned)"),
-		MAKE_TEST(_1KB, _4KB, _4KB, NON_OVERLAPPING, EXPECT_FAILURE,
-		  "mremap - Source Address Misaligned (1KB-aligned)"),
-
-		/* Src addr PTE aligned */
-		MAKE_TEST(PTE, PTE, _8KB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "8KB mremap - Source PTE-aligned, Destination PTE-aligned"),
-
-		/* Src addr 1MB aligned */
-		MAKE_TEST(_1MB, PTE, _2MB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2MB mremap - Source 1MB-aligned, Destination PTE-aligned"),
-		MAKE_TEST(_1MB, _1MB, _2MB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2MB mremap - Source 1MB-aligned, Destination 1MB-aligned"),
-
-		/* Src addr PMD aligned */
-		MAKE_TEST(PMD, PTE, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "4MB mremap - Source PMD-aligned, Destination PTE-aligned"),
-		MAKE_TEST(PMD, _1MB, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "4MB mremap - Source PMD-aligned, Destination 1MB-aligned"),
-		MAKE_TEST(PMD, PMD, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "4MB mremap - Source PMD-aligned, Destination PMD-aligned"),
-
-		/* Src addr PUD aligned */
-		MAKE_TEST(PUD, PTE, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2GB mremap - Source PUD-aligned, Destination PTE-aligned"),
-		MAKE_TEST(PUD, _1MB, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2GB mremap - Source PUD-aligned, Destination 1MB-aligned"),
-		MAKE_TEST(PUD, PMD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2GB mremap - Source PUD-aligned, Destination PMD-aligned"),
-		MAKE_TEST(PUD, PUD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2GB mremap - Source PUD-aligned, Destination PUD-aligned"),
-	};
-
-	struct test perf_test_cases[] = {
-		/*
-		 * mremap 1GB region - Page table level aligned time
-		 * comparison.
-		 */
-		MAKE_TEST(PTE, PTE, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "1GB mremap - Source PTE-aligned, Destination PTE-aligned"),
-		MAKE_TEST(PMD, PMD, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "1GB mremap - Source PMD-aligned, Destination PMD-aligned"),
-		MAKE_TEST(PUD, PUD, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "1GB mremap - Source PUD-aligned, Destination PUD-aligned"),
-	};
+	page_size = sysconf(_SC_PAGESIZE);
+
+	/* Expected mremap failures */
+	test_cases[0] =	MAKE_TEST(page_size, page_size, page_size,
+				  OVERLAPPING, EXPECT_FAILURE,
+				  "mremap - Source and Destination Regions Overlapping");
+
+	test_cases[1] = MAKE_TEST(page_size, page_size/4, page_size,
+				  NON_OVERLAPPING, EXPECT_FAILURE,
+				  "mremap - Destination Address Misaligned (1KB-aligned)");
+	test_cases[2] = MAKE_TEST(page_size/4, page_size, page_size,
+				  NON_OVERLAPPING, EXPECT_FAILURE,
+				  "mremap - Source Address Misaligned (1KB-aligned)");
+
+	/* Src addr PTE aligned */
+	test_cases[3] = MAKE_TEST(PTE, PTE, PTE * 2,
+				  NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "8KB mremap - Source PTE-aligned, Destination PTE-aligned");
+
+	/* Src addr 1MB aligned */
+	test_cases[4] = MAKE_TEST(_1MB, PTE, _2MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "2MB mremap - Source 1MB-aligned, Destination PTE-aligned");
+	test_cases[5] = MAKE_TEST(_1MB, _1MB, _2MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "2MB mremap - Source 1MB-aligned, Destination 1MB-aligned");
+
+	/* Src addr PMD aligned */
+	test_cases[6] = MAKE_TEST(PMD, PTE, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "4MB mremap - Source PMD-aligned, Destination PTE-aligned");
+	test_cases[7] =	MAKE_TEST(PMD, _1MB, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "4MB mremap - Source PMD-aligned, Destination 1MB-aligned");
+	test_cases[8] = MAKE_TEST(PMD, PMD, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "4MB mremap - Source PMD-aligned, Destination PMD-aligned");
+
+	/* Src addr PUD aligned */
+	test_cases[9] = MAKE_TEST(PUD, PTE, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "2GB mremap - Source PUD-aligned, Destination PTE-aligned");
+	test_cases[10] = MAKE_TEST(PUD, _1MB, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				   "2GB mremap - Source PUD-aligned, Destination 1MB-aligned");
+	test_cases[11] = MAKE_TEST(PUD, PMD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				   "2GB mremap - Source PUD-aligned, Destination PMD-aligned");
+	test_cases[12] = MAKE_TEST(PUD, PUD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				   "2GB mremap - Source PUD-aligned, Destination PUD-aligned");
+
+	perf_test_cases[0] =  MAKE_TEST(page_size, page_size, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+					"1GB mremap - Source PTE-aligned, Destination PTE-aligned");
+	/*
+	 * mremap 1GB region - Page table level aligned time
+	 * comparison.
+	 */
+	perf_test_cases[1] = MAKE_TEST(PMD, PMD, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				       "1GB mremap - Source PMD-aligned, Destination PMD-aligned");
+	perf_test_cases[2] = MAKE_TEST(PUD, PUD, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				       "1GB mremap - Source PUD-aligned, Destination PUD-aligned");
 
 	run_perf_tests =  (threshold_mb == VALIDATION_NO_THRESHOLD) ||
 				(threshold_mb * _1MB >= _1GB);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4 2/9] selftest/mremap_test: Avoid crash with static build
  2021-04-14  8:59 [PATCH v4 0/9] Speedup mremap on ppc64 Aneesh Kumar K.V
  2021-04-14  8:59 ` [PATCH v4 1/9] selftest/mremap_test: Update the test to handle pagesize other than 4K Aneesh Kumar K.V
@ 2021-04-14  8:59 ` Aneesh Kumar K.V
  2021-04-14  8:59 ` [PATCH v4 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries Aneesh Kumar K.V
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2021-04-14  8:59 UTC (permalink / raw)
  To: linux-mm, akpm; +Cc: kaleshsingh, npiggin, Aneesh Kumar K.V, joel, linuxppc-dev

With a large mmap map size, we can overlap with the text area and using
MAP_FIXED results in unmapping that area. Switch to MAP_FIXED_NOREPLACE
and handle the EEXIST error.

Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 tools/testing/selftests/vm/mremap_test.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/mremap_test.c b/tools/testing/selftests/vm/mremap_test.c
index c9a5461eb786..0624d1bd71b5 100644
--- a/tools/testing/selftests/vm/mremap_test.c
+++ b/tools/testing/selftests/vm/mremap_test.c
@@ -75,9 +75,10 @@ static void *get_source_mapping(struct config c)
 retry:
 	addr += c.src_alignment;
 	src_addr = mmap((void *) addr, c.region_size, PROT_READ | PROT_WRITE,
-			MAP_FIXED | MAP_ANONYMOUS | MAP_SHARED, -1, 0);
+			MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED,
+			-1, 0);
 	if (src_addr == MAP_FAILED) {
-		if (errno == EPERM)
+		if (errno == EPERM || errno == EEXIST)
 			goto retry;
 		goto error;
 	}
-- 
2.30.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries
  2021-04-14  8:59 [PATCH v4 0/9] Speedup mremap on ppc64 Aneesh Kumar K.V
  2021-04-14  8:59 ` [PATCH v4 1/9] selftest/mremap_test: Update the test to handle pagesize other than 4K Aneesh Kumar K.V
  2021-04-14  8:59 ` [PATCH v4 2/9] selftest/mremap_test: Avoid crash with static build Aneesh Kumar K.V
@ 2021-04-14  8:59 ` Aneesh Kumar K.V
  2021-04-14  8:59 ` [PATCH v4 4/9] powerpc/mm/book3s64: Fix possible build error Aneesh Kumar K.V
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2021-04-14  8:59 UTC (permalink / raw)
  To: linux-mm, akpm; +Cc: kaleshsingh, npiggin, Aneesh Kumar K.V, joel, linuxppc-dev

pmd/pud_populate is the right interface to be used to set the respective
page table entries. Some architectures like ppc64 do assume that set_pmd/pud_at
can only be used to set a hugepage PTE. Since we are not setting up a hugepage
PTE here, use the pmd/pud_populate interface.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/mremap.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index ec8f840399ed..574287f9bb39 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -26,6 +26,7 @@
 
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
+#include <asm/pgalloc.h>
 
 #include "internal.h"
 
@@ -257,9 +258,8 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 	pmd_clear(old_pmd);
 
 	VM_BUG_ON(!pmd_none(*new_pmd));
+	pmd_populate(mm, new_pmd, (pgtable_t)pmd_page_vaddr(pmd));
 
-	/* Set the new pmd */
-	set_pmd_at(mm, new_addr, new_pmd, pmd);
 	flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
 	if (new_ptl != old_ptl)
 		spin_unlock(new_ptl);
@@ -306,8 +306,7 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
 
 	VM_BUG_ON(!pud_none(*new_pud));
 
-	/* Set the new pud */
-	set_pud_at(mm, new_addr, new_pud, pud);
+	pud_populate(mm, new_pud, (pmd_t *)pud_page_vaddr(pud));
 	flush_tlb_range(vma, old_addr, old_addr + PUD_SIZE);
 	if (new_ptl != old_ptl)
 		spin_unlock(new_ptl);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4 4/9] powerpc/mm/book3s64: Fix possible build error
  2021-04-14  8:59 [PATCH v4 0/9] Speedup mremap on ppc64 Aneesh Kumar K.V
                   ` (2 preceding siblings ...)
  2021-04-14  8:59 ` [PATCH v4 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries Aneesh Kumar K.V
@ 2021-04-14  8:59 ` Aneesh Kumar K.V
  2021-04-20  3:43   ` Michael Ellerman
  2021-04-14  8:59 ` [PATCH v4 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument Aneesh Kumar K.V
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Aneesh Kumar K.V @ 2021-04-14  8:59 UTC (permalink / raw)
  To: linux-mm, akpm; +Cc: kaleshsingh, npiggin, Aneesh Kumar K.V, joel, linuxppc-dev

Update _tlbiel_pid() such that we can avoid build errors like below when
using this function in other places.

arch/powerpc/mm/book3s64/radix_tlb.c: In function ‘__radix__flush_tlb_range_psize’:
arch/powerpc/mm/book3s64/radix_tlb.c:114:2: warning: ‘asm’ operand 3 probably does not match constraints
  114 |  asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
      |  ^~~
arch/powerpc/mm/book3s64/radix_tlb.c:114:2: error: impossible constraint in ‘asm’
make[4]: *** [scripts/Makefile.build:271: arch/powerpc/mm/book3s64/radix_tlb.o] Error 1
m

With this fix, we can also drop the __always_inline in __radix_flush_tlb_range_psize
which was added by commit e12d6d7d46a6 ("powerpc/mm/radix: mark __radix__flush_tlb_range_psize() as __always_inline")

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/book3s64/radix_tlb.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 409e61210789..817a02ef6032 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -291,22 +291,30 @@ static inline void fixup_tlbie_lpid(unsigned long lpid)
 /*
  * We use 128 set in radix mode and 256 set in hpt mode.
  */
-static __always_inline void _tlbiel_pid(unsigned long pid, unsigned long ric)
+static inline void _tlbiel_pid(unsigned long pid, unsigned long ric)
 {
 	int set;
 
 	asm volatile("ptesync": : :"memory");
 
-	/*
-	 * Flush the first set of the TLB, and if we're doing a RIC_FLUSH_ALL,
-	 * also flush the entire Page Walk Cache.
-	 */
-	__tlbiel_pid(pid, 0, ric);
+	switch (ric) {
+	case RIC_FLUSH_PWC:
 
-	/* For PWC, only one flush is needed */
-	if (ric == RIC_FLUSH_PWC) {
+		/* For PWC, only one flush is needed */
+		__tlbiel_pid(pid, 0, RIC_FLUSH_PWC);
 		ppc_after_tlbiel_barrier();
 		return;
+	case RIC_FLUSH_TLB:
+		__tlbiel_pid(pid, 0, RIC_FLUSH_TLB);
+		break;
+	case RIC_FLUSH_ALL:
+	default:
+		/*
+		 * Flush the first set of the TLB, and if
+		 * we're doing a RIC_FLUSH_ALL, also flush
+		 * the entire Page Walk Cache.
+		 */
+		__tlbiel_pid(pid, 0, RIC_FLUSH_ALL);
 	}
 
 	if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
@@ -1176,7 +1184,7 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 	}
 }
 
-static __always_inline void __radix__flush_tlb_range_psize(struct mm_struct *mm,
+static void __radix__flush_tlb_range_psize(struct mm_struct *mm,
 				unsigned long start, unsigned long end,
 				int psize, bool also_pwc)
 {
-- 
2.30.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument
  2021-04-14  8:59 [PATCH v4 0/9] Speedup mremap on ppc64 Aneesh Kumar K.V
                   ` (3 preceding siblings ...)
  2021-04-14  8:59 ` [PATCH v4 4/9] powerpc/mm/book3s64: Fix possible build error Aneesh Kumar K.V
@ 2021-04-14  8:59 ` Aneesh Kumar K.V
  2021-04-14  8:59 ` [PATCH v4 6/9] mm/mremap: Use range flush that does TLB and page walk cache flush Aneesh Kumar K.V
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2021-04-14  8:59 UTC (permalink / raw)
  To: linux-mm, akpm; +Cc: kaleshsingh, npiggin, Aneesh Kumar K.V, joel, linuxppc-dev

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 .../include/asm/book3s/64/tlbflush-radix.h    | 19 +++++++-----
 arch/powerpc/include/asm/book3s/64/tlbflush.h | 23 ++++++++++++---
 arch/powerpc/mm/book3s64/radix_hugetlbpage.c  |  4 +--
 arch/powerpc/mm/book3s64/radix_tlb.c          | 29 +++++++------------
 4 files changed, 42 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 8b33601cdb9d..171441a43b35 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -56,15 +56,18 @@ static inline void radix__flush_all_lpid_guest(unsigned int lpid)
 }
 #endif
 
-extern void radix__flush_hugetlb_tlb_range(struct vm_area_struct *vma,
-					   unsigned long start, unsigned long end);
-extern void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
-					 unsigned long end, int psize);
-extern void radix__flush_pmd_tlb_range(struct vm_area_struct *vma,
-				       unsigned long start, unsigned long end);
-extern void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
+void radix__flush_hugetlb_tlb_range(struct vm_area_struct *vma,
+				    unsigned long start, unsigned long end,
+				    bool flush_pwc);
+void radix__flush_pmd_tlb_range(struct vm_area_struct *vma,
+				unsigned long start, unsigned long end,
+				bool flush_pwc);
+void radix__flush_tlb_pwc_range_psize(struct mm_struct *mm, unsigned long start,
+				      unsigned long end, int psize, bool flush_pwc);
+void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 			    unsigned long end);
-extern void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end);
+void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end);
+
 
 extern void radix__local_flush_tlb_mm(struct mm_struct *mm);
 extern void radix__local_flush_all_mm(struct mm_struct *mm);
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index 215973b4cb26..f9f8a3a264f7 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -45,13 +45,30 @@ static inline void tlbiel_all_lpid(bool radix)
 		hash__tlbiel_all(TLB_INVAL_SCOPE_LPID);
 }
 
+static inline void flush_pmd_tlb_pwc_range(struct vm_area_struct *vma,
+					   unsigned long start,
+					   unsigned long end,
+					   bool flush_pwc)
+{
+	if (radix_enabled())
+		return radix__flush_pmd_tlb_range(vma, start, end, flush_pwc);
+	return hash__flush_tlb_range(vma, start, end);
+}
 
 #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
 static inline void flush_pmd_tlb_range(struct vm_area_struct *vma,
 				       unsigned long start, unsigned long end)
+{
+	return flush_pmd_tlb_pwc_range(vma, start, end, false);
+}
+
+static inline void flush_hugetlb_tlb_pwc_range(struct vm_area_struct *vma,
+					       unsigned long start,
+					       unsigned long end,
+					       bool flush_pwc)
 {
 	if (radix_enabled())
-		return radix__flush_pmd_tlb_range(vma, start, end);
+		return radix__flush_hugetlb_tlb_range(vma, start, end, flush_pwc);
 	return hash__flush_tlb_range(vma, start, end);
 }
 
@@ -60,9 +77,7 @@ static inline void flush_hugetlb_tlb_range(struct vm_area_struct *vma,
 					   unsigned long start,
 					   unsigned long end)
 {
-	if (radix_enabled())
-		return radix__flush_hugetlb_tlb_range(vma, start, end);
-	return hash__flush_tlb_range(vma, start, end);
+	return flush_hugetlb_tlb_pwc_range(vma, start, end, false);
 }
 
 static inline void flush_tlb_range(struct vm_area_struct *vma,
diff --git a/arch/powerpc/mm/book3s64/radix_hugetlbpage.c b/arch/powerpc/mm/book3s64/radix_hugetlbpage.c
index cb91071eef52..e62f5679b119 100644
--- a/arch/powerpc/mm/book3s64/radix_hugetlbpage.c
+++ b/arch/powerpc/mm/book3s64/radix_hugetlbpage.c
@@ -26,13 +26,13 @@ void radix__local_flush_hugetlb_page(struct vm_area_struct *vma, unsigned long v
 }
 
 void radix__flush_hugetlb_tlb_range(struct vm_area_struct *vma, unsigned long start,
-				   unsigned long end)
+				    unsigned long end, bool flush_pwc)
 {
 	int psize;
 	struct hstate *hstate = hstate_file(vma->vm_file);
 
 	psize = hstate_get_psize(hstate);
-	radix__flush_tlb_range_psize(vma->vm_mm, start, end, psize);
+	radix__flush_tlb_pwc_range_psize(vma->vm_mm, start, end, psize, flush_pwc);
 }
 
 /*
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 817a02ef6032..5a59e19f9e53 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -1090,7 +1090,7 @@ void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 {
 #ifdef CONFIG_HUGETLB_PAGE
 	if (is_vm_hugetlb_page(vma))
-		return radix__flush_hugetlb_tlb_range(vma, start, end);
+		return radix__flush_hugetlb_tlb_range(vma, start, end, false);
 #endif
 
 	__radix__flush_tlb_range(vma->vm_mm, start, end);
@@ -1151,9 +1151,6 @@ void radix__flush_all_lpid_guest(unsigned int lpid)
 	_tlbie_lpid_guest(lpid, RIC_FLUSH_ALL);
 }
 
-static void radix__flush_tlb_pwc_range_psize(struct mm_struct *mm, unsigned long start,
-				  unsigned long end, int psize);
-
 void radix__tlb_flush(struct mmu_gather *tlb)
 {
 	int psize = 0;
@@ -1177,10 +1174,8 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 		else
 			radix__flush_all_mm(mm);
 	} else {
-		if (!tlb->freed_tables)
-			radix__flush_tlb_range_psize(mm, start, end, psize);
-		else
-			radix__flush_tlb_pwc_range_psize(mm, start, end, psize);
+		radix__flush_tlb_pwc_range_psize(mm, start,
+						 end, psize, tlb->freed_tables);
 	}
 }
 
@@ -1254,16 +1249,10 @@ static void __radix__flush_tlb_range_psize(struct mm_struct *mm,
 	preempt_enable();
 }
 
-void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
-				  unsigned long end, int psize)
+void radix__flush_tlb_pwc_range_psize(struct mm_struct *mm, unsigned long start,
+				      unsigned long end, int psize, bool flush_pwc)
 {
-	return __radix__flush_tlb_range_psize(mm, start, end, psize, false);
-}
-
-static void radix__flush_tlb_pwc_range_psize(struct mm_struct *mm, unsigned long start,
-				  unsigned long end, int psize)
-{
-	__radix__flush_tlb_range_psize(mm, start, end, psize, true);
+	__radix__flush_tlb_range_psize(mm, start, end, psize, flush_pwc);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -1315,9 +1304,11 @@ void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 void radix__flush_pmd_tlb_range(struct vm_area_struct *vma,
-				unsigned long start, unsigned long end)
+				unsigned long start, unsigned long end,
+				bool flush_pwc)
 {
-	radix__flush_tlb_range_psize(vma->vm_mm, start, end, MMU_PAGE_2M);
+	__radix__flush_tlb_range_psize(vma->vm_mm, start,
+				       end, MMU_PAGE_2M, flush_pwc);
 }
 EXPORT_SYMBOL(radix__flush_pmd_tlb_range);
 
-- 
2.30.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4 6/9] mm/mremap: Use range flush that does TLB and page walk cache flush
  2021-04-14  8:59 [PATCH v4 0/9] Speedup mremap on ppc64 Aneesh Kumar K.V
                   ` (4 preceding siblings ...)
  2021-04-14  8:59 ` [PATCH v4 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument Aneesh Kumar K.V
@ 2021-04-14  8:59 ` Aneesh Kumar K.V
  2021-04-20  3:47   ` Michael Ellerman
  2021-04-14  8:59 ` [PATCH v4 7/9] mm/mremap: Move TLB flush outside page table lock Aneesh Kumar K.V
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Aneesh Kumar K.V @ 2021-04-14  8:59 UTC (permalink / raw)
  To: linux-mm, akpm; +Cc: kaleshsingh, npiggin, Aneesh Kumar K.V, joel, linuxppc-dev

Some architectures do have the concept of page walk cache which need
to be flush when updating higher levels of page tables. A fast mremap
that involves moving page table pages instead of copying pte entries
should flush page walk cache since the old translation cache is no more
valid.

Add new helper flush_pte_tlb_pwc_range() which invalidates both TLB and
page walk cache where TLB entries are mapped with page size PAGE_SIZE.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/tlbflush.h | 11 +++++++++++
 mm/mremap.c                                   | 15 +++++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index f9f8a3a264f7..c236b66f490b 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -80,6 +80,17 @@ static inline void flush_hugetlb_tlb_range(struct vm_area_struct *vma,
 	return flush_hugetlb_tlb_pwc_range(vma, start, end, false);
 }
 
+#define flush_pte_tlb_pwc_range flush_tlb_pwc_range
+static inline void flush_pte_tlb_pwc_range(struct vm_area_struct *vma,
+					   unsigned long start, unsigned long end,
+					   bool also_pwc)
+{
+	if (radix_enabled())
+		return radix__flush_tlb_pwc_range_psize(vma->vm_mm, start,
+							end, mmu_virtual_psize, also_pwc);
+	return hash__flush_tlb_range(vma, start, end);
+}
+
 static inline void flush_tlb_range(struct vm_area_struct *vma,
 				   unsigned long start, unsigned long end)
 {
diff --git a/mm/mremap.c b/mm/mremap.c
index 574287f9bb39..0e7b11daafee 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -210,6 +210,17 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 		drop_rmap_locks(vma);
 }
 
+#ifndef flush_pte_tlb_pwc_range
+#define flush_pte_tlb_pwc_range flush_pte_tlb_pwc_range
+static inline void flush_pte_tlb_pwc_range(struct vm_area_struct *vma,
+					   unsigned long start,
+					   unsigned long end,
+					   bool also_pwc)
+{
+	return flush_tlb_range(vma, start, end);
+}
+#endif
+
 #ifdef CONFIG_HAVE_MOVE_PMD
 static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 		  unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd)
@@ -260,7 +271,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 	VM_BUG_ON(!pmd_none(*new_pmd));
 	pmd_populate(mm, new_pmd, (pgtable_t)pmd_page_vaddr(pmd));
 
-	flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
+	flush_pte_tlb_pwc_range(vma, old_addr, old_addr + PMD_SIZE, true);
 	if (new_ptl != old_ptl)
 		spin_unlock(new_ptl);
 	spin_unlock(old_ptl);
@@ -307,7 +318,7 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
 	VM_BUG_ON(!pud_none(*new_pud));
 
 	pud_populate(mm, new_pud, (pmd_t *)pud_page_vaddr(pud));
-	flush_tlb_range(vma, old_addr, old_addr + PUD_SIZE);
+	flush_pte_tlb_pwc_range(vma, old_addr, old_addr + PUD_SIZE, true);
 	if (new_ptl != old_ptl)
 		spin_unlock(new_ptl);
 	spin_unlock(old_ptl);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4 7/9] mm/mremap: Move TLB flush outside page table lock
  2021-04-14  8:59 [PATCH v4 0/9] Speedup mremap on ppc64 Aneesh Kumar K.V
                   ` (5 preceding siblings ...)
  2021-04-14  8:59 ` [PATCH v4 6/9] mm/mremap: Use range flush that does TLB and page walk cache flush Aneesh Kumar K.V
@ 2021-04-14  8:59 ` Aneesh Kumar K.V
  2021-04-14  8:59 ` [PATCH v4 8/9] mm/mremap: Allow arch runtime override Aneesh Kumar K.V
  2021-04-14  8:59 ` [PATCH v4 9/9] powerpc/mm: Enable move pmd/pud Aneesh Kumar K.V
  8 siblings, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2021-04-14  8:59 UTC (permalink / raw)
  To: linux-mm, akpm; +Cc: kaleshsingh, npiggin, Aneesh Kumar K.V, joel, linuxppc-dev

Move TLB flush outside page table lock so that kernel does
less with page table lock held. Releasing the ptl with old
TLB contents still valid will behave such that such access
happened before the level3 or level2 entry update.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/mremap.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 0e7b11daafee..7ac1df8e6d51 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -259,7 +259,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 	 * We don't have to worry about the ordering of src and dst
 	 * ptlocks because exclusive mmap_lock prevents deadlock.
 	 */
-	old_ptl = pmd_lock(vma->vm_mm, old_pmd);
+	old_ptl = pmd_lock(mm, old_pmd);
 	new_ptl = pmd_lockptr(mm, new_pmd);
 	if (new_ptl != old_ptl)
 		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
@@ -271,11 +271,11 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 	VM_BUG_ON(!pmd_none(*new_pmd));
 	pmd_populate(mm, new_pmd, (pgtable_t)pmd_page_vaddr(pmd));
 
-	flush_pte_tlb_pwc_range(vma, old_addr, old_addr + PMD_SIZE, true);
 	if (new_ptl != old_ptl)
 		spin_unlock(new_ptl);
 	spin_unlock(old_ptl);
 
+	flush_pte_tlb_pwc_range(vma, old_addr, old_addr + PMD_SIZE, true);
 	return true;
 }
 #else
@@ -306,7 +306,7 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
 	 * We don't have to worry about the ordering of src and dst
 	 * ptlocks because exclusive mmap_lock prevents deadlock.
 	 */
-	old_ptl = pud_lock(vma->vm_mm, old_pud);
+	old_ptl = pud_lock(mm, old_pud);
 	new_ptl = pud_lockptr(mm, new_pud);
 	if (new_ptl != old_ptl)
 		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
@@ -318,11 +318,11 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
 	VM_BUG_ON(!pud_none(*new_pud));
 
 	pud_populate(mm, new_pud, (pmd_t *)pud_page_vaddr(pud));
-	flush_pte_tlb_pwc_range(vma, old_addr, old_addr + PUD_SIZE, true);
 	if (new_ptl != old_ptl)
 		spin_unlock(new_ptl);
 	spin_unlock(old_ptl);
 
+	flush_pte_tlb_pwc_range(vma, old_addr, old_addr + PUD_SIZE, true);
 	return true;
 }
 #else
-- 
2.30.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4 8/9] mm/mremap: Allow arch runtime override
  2021-04-14  8:59 [PATCH v4 0/9] Speedup mremap on ppc64 Aneesh Kumar K.V
                   ` (6 preceding siblings ...)
  2021-04-14  8:59 ` [PATCH v4 7/9] mm/mremap: Move TLB flush outside page table lock Aneesh Kumar K.V
@ 2021-04-14  8:59 ` Aneesh Kumar K.V
  2021-04-20  3:52   ` Michael Ellerman
  2021-04-14  8:59 ` [PATCH v4 9/9] powerpc/mm: Enable move pmd/pud Aneesh Kumar K.V
  8 siblings, 1 reply; 15+ messages in thread
From: Aneesh Kumar K.V @ 2021-04-14  8:59 UTC (permalink / raw)
  To: linux-mm, akpm; +Cc: kaleshsingh, npiggin, Aneesh Kumar K.V, joel, linuxppc-dev

Architectures like ppc64 support faster mremap only with radix
translation. Hence allow a runtime check w.r.t support for fast mremap.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/tlb.h |  6 ++++++
 mm/mremap.c                    | 15 ++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index 160422a439aa..058918a7cd3c 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -83,5 +83,11 @@ static inline int mm_is_thread_local(struct mm_struct *mm)
 }
 #endif
 
+#define arch_supports_page_tables_move arch_supports_page_tables_move
+static inline bool arch_supports_page_tables_move(void)
+{
+	return radix_enabled();
+}
+
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_TLB_H */
diff --git a/mm/mremap.c b/mm/mremap.c
index 7ac1df8e6d51..6922e161928c 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -25,7 +25,7 @@
 #include <linux/userfaultfd_k.h>
 
 #include <asm/cacheflush.h>
-#include <asm/tlbflush.h>
+#include <asm/tlb.h>
 #include <asm/pgalloc.h>
 
 #include "internal.h"
@@ -221,6 +221,15 @@ static inline void flush_pte_tlb_pwc_range(struct vm_area_struct *vma,
 }
 #endif
 
+#ifndef arch_supports_page_tables_move
+#define arch_supports_page_tables_move arch_supports_page_tables_move
+static inline bool arch_supports_page_tables_move(void)
+{
+	return IS_ENABLED(CONFIG_HAVE_MOVE_PMD) ||
+		IS_ENABLED(CONFIG_HAVE_MOVE_PUD);
+}
+#endif
+
 #ifdef CONFIG_HAVE_MOVE_PMD
 static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 		  unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd)
@@ -229,6 +238,8 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 	struct mm_struct *mm = vma->vm_mm;
 	pmd_t pmd;
 
+	if (!arch_supports_page_tables_move())
+		return false;
 	/*
 	 * The destination pmd shouldn't be established, free_pgtables()
 	 * should have released it.
@@ -295,6 +306,8 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
 	struct mm_struct *mm = vma->vm_mm;
 	pud_t pud;
 
+	if (!arch_supports_page_tables_move())
+		return false;
 	/*
 	 * The destination pud shouldn't be established, free_pgtables()
 	 * should have released it.
-- 
2.30.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4 9/9] powerpc/mm: Enable move pmd/pud
  2021-04-14  8:59 [PATCH v4 0/9] Speedup mremap on ppc64 Aneesh Kumar K.V
                   ` (7 preceding siblings ...)
  2021-04-14  8:59 ` [PATCH v4 8/9] mm/mremap: Allow arch runtime override Aneesh Kumar K.V
@ 2021-04-14  8:59 ` Aneesh Kumar K.V
  8 siblings, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2021-04-14  8:59 UTC (permalink / raw)
  To: linux-mm, akpm; +Cc: kaleshsingh, npiggin, Aneesh Kumar K.V, joel, linuxppc-dev

mremap HAVE_MOVE_PMD/PUD optimization time comparison for 1GB region:
1GB mremap - Source PTE-aligned, Destination PTE-aligned
  mremap time:      1127034ns
1GB mremap - Source PMD-aligned, Destination PMD-aligned
  mremap time:       508817ns
1GB mremap - Source PUD-aligned, Destination PUD-aligned
  mremap time:        23046ns

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/platforms/Kconfig.cputype | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 3ce907523b1e..2e666e569fdf 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -97,6 +97,8 @@ config PPC_BOOK3S_64
 	select PPC_HAVE_PMU_SUPPORT
 	select SYS_SUPPORTS_HUGETLBFS
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
+	select HAVE_MOVE_PMD
+	select HAVE_MOVE_PUD
 	select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select IRQ_WORK
-- 
2.30.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 4/9] powerpc/mm/book3s64: Fix possible build error
  2021-04-14  8:59 ` [PATCH v4 4/9] powerpc/mm/book3s64: Fix possible build error Aneesh Kumar K.V
@ 2021-04-20  3:43   ` Michael Ellerman
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Ellerman @ 2021-04-20  3:43 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm
  Cc: Aneesh Kumar K.V, npiggin, kaleshsingh, joel, linuxppc-dev

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> Update _tlbiel_pid() such that we can avoid build errors like below when
> using this function in other places.
>
> arch/powerpc/mm/book3s64/radix_tlb.c: In function ‘__radix__flush_tlb_range_psize’:
> arch/powerpc/mm/book3s64/radix_tlb.c:114:2: warning: ‘asm’ operand 3 probably does not match constraints
>   114 |  asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
>       |  ^~~
> arch/powerpc/mm/book3s64/radix_tlb.c:114:2: error: impossible constraint in ‘asm’
> make[4]: *** [scripts/Makefile.build:271: arch/powerpc/mm/book3s64/radix_tlb.o] Error 1
> m
>
> With this fix, we can also drop the __always_inline in __radix_flush_tlb_range_psize
> which was added by commit e12d6d7d46a6 ("powerpc/mm/radix: mark __radix__flush_tlb_range_psize() as __always_inline")
>
> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/mm/book3s64/radix_tlb.c | 26 +++++++++++++++++---------
>  1 file changed, 17 insertions(+), 9 deletions(-)

Acked-by: Michael Ellerman <mpe@ellerman.id.au>

cheers

> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
> index 409e61210789..817a02ef6032 100644
> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
> @@ -291,22 +291,30 @@ static inline void fixup_tlbie_lpid(unsigned long lpid)
>  /*
>   * We use 128 set in radix mode and 256 set in hpt mode.
>   */
> -static __always_inline void _tlbiel_pid(unsigned long pid, unsigned long ric)
> +static inline void _tlbiel_pid(unsigned long pid, unsigned long ric)
>  {
>  	int set;
>  
>  	asm volatile("ptesync": : :"memory");
>  
> -	/*
> -	 * Flush the first set of the TLB, and if we're doing a RIC_FLUSH_ALL,
> -	 * also flush the entire Page Walk Cache.
> -	 */
> -	__tlbiel_pid(pid, 0, ric);
> +	switch (ric) {
> +	case RIC_FLUSH_PWC:
>  
> -	/* For PWC, only one flush is needed */
> -	if (ric == RIC_FLUSH_PWC) {
> +		/* For PWC, only one flush is needed */
> +		__tlbiel_pid(pid, 0, RIC_FLUSH_PWC);
>  		ppc_after_tlbiel_barrier();
>  		return;
> +	case RIC_FLUSH_TLB:
> +		__tlbiel_pid(pid, 0, RIC_FLUSH_TLB);
> +		break;
> +	case RIC_FLUSH_ALL:
> +	default:
> +		/*
> +		 * Flush the first set of the TLB, and if
> +		 * we're doing a RIC_FLUSH_ALL, also flush
> +		 * the entire Page Walk Cache.
> +		 */
> +		__tlbiel_pid(pid, 0, RIC_FLUSH_ALL);
>  	}
>  
>  	if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
> @@ -1176,7 +1184,7 @@ void radix__tlb_flush(struct mmu_gather *tlb)
>  	}
>  }
>  
> -static __always_inline void __radix__flush_tlb_range_psize(struct mm_struct *mm,
> +static void __radix__flush_tlb_range_psize(struct mm_struct *mm,
>  				unsigned long start, unsigned long end,
>  				int psize, bool also_pwc)
>  {
> -- 
> 2.30.2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 6/9] mm/mremap: Use range flush that does TLB and page walk cache flush
  2021-04-14  8:59 ` [PATCH v4 6/9] mm/mremap: Use range flush that does TLB and page walk cache flush Aneesh Kumar K.V
@ 2021-04-20  3:47   ` Michael Ellerman
  2021-04-20  4:17     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Ellerman @ 2021-04-20  3:47 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm
  Cc: joel, Aneesh Kumar K.V, linuxppc-dev, npiggin, kaleshsingh

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> Some architectures do have the concept of page walk cache which need
> to be flush when updating higher levels of page tables. A fast mremap
> that involves moving page table pages instead of copying pte entries
> should flush page walk cache since the old translation cache is no more
> valid.
>
> Add new helper flush_pte_tlb_pwc_range() which invalidates both TLB and
> page walk cache where TLB entries are mapped with page size PAGE_SIZE.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/tlbflush.h | 11 +++++++++++
>  mm/mremap.c                                   | 15 +++++++++++++--
>  2 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> index f9f8a3a264f7..c236b66f490b 100644
> --- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
> +++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> @@ -80,6 +80,17 @@ static inline void flush_hugetlb_tlb_range(struct vm_area_struct *vma,
>  	return flush_hugetlb_tlb_pwc_range(vma, start, end, false);
>  }
>  
> +#define flush_pte_tlb_pwc_range flush_tlb_pwc_range
> +static inline void flush_pte_tlb_pwc_range(struct vm_area_struct *vma,
> +					   unsigned long start, unsigned long end,
> +					   bool also_pwc)

This still uses the also_pwc name, which is a bit inconsistent with the
previous patch.

But, does it even need to be a parameter? AFAICS you always pass true,
and pwc=true is sort of implied by the name isn't it?

cheers

> +{
> +	if (radix_enabled())
> +		return radix__flush_tlb_pwc_range_psize(vma->vm_mm, start,
> +							end, mmu_virtual_psize, also_pwc);
> +	return hash__flush_tlb_range(vma, start, end);
> +}
> +
>  static inline void flush_tlb_range(struct vm_area_struct *vma,
>  				   unsigned long start, unsigned long end)
>  {
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 574287f9bb39..0e7b11daafee 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -210,6 +210,17 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
>  		drop_rmap_locks(vma);
>  }
>  
> +#ifndef flush_pte_tlb_pwc_range
> +#define flush_pte_tlb_pwc_range flush_pte_tlb_pwc_range
> +static inline void flush_pte_tlb_pwc_range(struct vm_area_struct *vma,
> +					   unsigned long start,
> +					   unsigned long end,
> +					   bool also_pwc)
> +{
> +	return flush_tlb_range(vma, start, end);
> +}
> +#endif
> +
>  #ifdef CONFIG_HAVE_MOVE_PMD
>  static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>  		  unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd)
> @@ -260,7 +271,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>  	VM_BUG_ON(!pmd_none(*new_pmd));
>  	pmd_populate(mm, new_pmd, (pgtable_t)pmd_page_vaddr(pmd));
>  
> -	flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
> +	flush_pte_tlb_pwc_range(vma, old_addr, old_addr + PMD_SIZE, true);
>  	if (new_ptl != old_ptl)
>  		spin_unlock(new_ptl);
>  	spin_unlock(old_ptl);
> @@ -307,7 +318,7 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
>  	VM_BUG_ON(!pud_none(*new_pud));
>  
>  	pud_populate(mm, new_pud, (pmd_t *)pud_page_vaddr(pud));
> -	flush_tlb_range(vma, old_addr, old_addr + PUD_SIZE);
> +	flush_pte_tlb_pwc_range(vma, old_addr, old_addr + PUD_SIZE, true);
>  	if (new_ptl != old_ptl)
>  		spin_unlock(new_ptl);
>  	spin_unlock(old_ptl);
> -- 
> 2.30.2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 8/9] mm/mremap: Allow arch runtime override
  2021-04-14  8:59 ` [PATCH v4 8/9] mm/mremap: Allow arch runtime override Aneesh Kumar K.V
@ 2021-04-20  3:52   ` Michael Ellerman
  2021-04-20  4:30     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Ellerman @ 2021-04-20  3:52 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm
  Cc: joel, Aneesh Kumar K.V, linuxppc-dev, npiggin, kaleshsingh

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> Architectures like ppc64 support faster mremap only with radix
> translation. Hence allow a runtime check w.r.t support for fast mremap.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/tlb.h |  6 ++++++
>  mm/mremap.c                    | 15 ++++++++++++++-
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
> index 160422a439aa..058918a7cd3c 100644
> --- a/arch/powerpc/include/asm/tlb.h
> +++ b/arch/powerpc/include/asm/tlb.h
> @@ -83,5 +83,11 @@ static inline int mm_is_thread_local(struct mm_struct *mm)
>  }
>  #endif
>  
> +#define arch_supports_page_tables_move arch_supports_page_tables_move
> +static inline bool arch_supports_page_tables_move(void)
> +{
> +	return radix_enabled();
> +}

Not sure it's worth a respin on its own, but page table*s* move is
slightly strange phrasing.

arch_supports_move_page_tables() or arch_supports_page_table_move()
would be more typical.

cheers

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 6/9] mm/mremap: Use range flush that does TLB and page walk cache flush
  2021-04-20  3:47   ` Michael Ellerman
@ 2021-04-20  4:17     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2021-04-20  4:17 UTC (permalink / raw)
  To: Michael Ellerman, linux-mm, akpm; +Cc: joel, linuxppc-dev, npiggin, kaleshsingh

On 4/20/21 9:17 AM, Michael Ellerman wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>> Some architectures do have the concept of page walk cache which need
>> to be flush when updating higher levels of page tables. A fast mremap
>> that involves moving page table pages instead of copying pte entries
>> should flush page walk cache since the old translation cache is no more
>> valid.
>>
>> Add new helper flush_pte_tlb_pwc_range() which invalidates both TLB and
>> page walk cache where TLB entries are mapped with page size PAGE_SIZE.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>   arch/powerpc/include/asm/book3s/64/tlbflush.h | 11 +++++++++++
>>   mm/mremap.c                                   | 15 +++++++++++++--
>>   2 files changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
>> index f9f8a3a264f7..c236b66f490b 100644
>> --- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
>> +++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
>> @@ -80,6 +80,17 @@ static inline void flush_hugetlb_tlb_range(struct vm_area_struct *vma,
>>   	return flush_hugetlb_tlb_pwc_range(vma, start, end, false);
>>   }
>>   
>> +#define flush_pte_tlb_pwc_range flush_tlb_pwc_range
>> +static inline void flush_pte_tlb_pwc_range(struct vm_area_struct *vma,
>> +					   unsigned long start, unsigned long end,
>> +					   bool also_pwc)
> 
> This still uses the also_pwc name, which is a bit inconsistent with the
> previous patch.
> 

will fix that.

> But, does it even need to be a parameter? AFAICS you always pass true,
> and pwc=true is sort of implied by the name isn't it?
> 

I don't have strong opinion about that. I was wondering having flush_pwc 
explicitly called out is a better indication of we are flushing page 
walk cache. Will drop that in the next update.


-aneesh

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 8/9] mm/mremap: Allow arch runtime override
  2021-04-20  3:52   ` Michael Ellerman
@ 2021-04-20  4:30     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2021-04-20  4:30 UTC (permalink / raw)
  To: Michael Ellerman, linux-mm, akpm; +Cc: joel, linuxppc-dev, npiggin, kaleshsingh

On 4/20/21 9:22 AM, Michael Ellerman wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>> Architectures like ppc64 support faster mremap only with radix
>> translation. Hence allow a runtime check w.r.t support for fast mremap.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>   arch/powerpc/include/asm/tlb.h |  6 ++++++
>>   mm/mremap.c                    | 15 ++++++++++++++-
>>   2 files changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
>> index 160422a439aa..058918a7cd3c 100644
>> --- a/arch/powerpc/include/asm/tlb.h
>> +++ b/arch/powerpc/include/asm/tlb.h
>> @@ -83,5 +83,11 @@ static inline int mm_is_thread_local(struct mm_struct *mm)
>>   }
>>   #endif
>>   
>> +#define arch_supports_page_tables_move arch_supports_page_tables_move
>> +static inline bool arch_supports_page_tables_move(void)
>> +{
>> +	return radix_enabled();
>> +}
> 
> Not sure it's worth a respin on its own, but page table*s* move is
> slightly strange phrasing.
> 
> arch_supports_move_page_tables() or arch_supports_page_table_move()
> would be more typical.
> 

I will switch to arch_supports_page_table_move()

-aneesh


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, back to index

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-14  8:59 [PATCH v4 0/9] Speedup mremap on ppc64 Aneesh Kumar K.V
2021-04-14  8:59 ` [PATCH v4 1/9] selftest/mremap_test: Update the test to handle pagesize other than 4K Aneesh Kumar K.V
2021-04-14  8:59 ` [PATCH v4 2/9] selftest/mremap_test: Avoid crash with static build Aneesh Kumar K.V
2021-04-14  8:59 ` [PATCH v4 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries Aneesh Kumar K.V
2021-04-14  8:59 ` [PATCH v4 4/9] powerpc/mm/book3s64: Fix possible build error Aneesh Kumar K.V
2021-04-20  3:43   ` Michael Ellerman
2021-04-14  8:59 ` [PATCH v4 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument Aneesh Kumar K.V
2021-04-14  8:59 ` [PATCH v4 6/9] mm/mremap: Use range flush that does TLB and page walk cache flush Aneesh Kumar K.V
2021-04-20  3:47   ` Michael Ellerman
2021-04-20  4:17     ` Aneesh Kumar K.V
2021-04-14  8:59 ` [PATCH v4 7/9] mm/mremap: Move TLB flush outside page table lock Aneesh Kumar K.V
2021-04-14  8:59 ` [PATCH v4 8/9] mm/mremap: Allow arch runtime override Aneesh Kumar K.V
2021-04-20  3:52   ` Michael Ellerman
2021-04-20  4:30     ` Aneesh Kumar K.V
2021-04-14  8:59 ` [PATCH v4 9/9] powerpc/mm: Enable move pmd/pud Aneesh Kumar K.V

LinuxPPC-Dev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linuxppc-dev/0 linuxppc-dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linuxppc-dev linuxppc-dev/ https://lore.kernel.org/linuxppc-dev \
		linuxppc-dev@lists.ozlabs.org linuxppc-dev@ozlabs.org
	public-inbox-index linuxppc-dev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.ozlabs.lists.linuxppc-dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git