* [PATCH v2 0/6] mrermap fixes
@ 2021-06-16  4:52 ` Aneesh Kumar K.V
  0 siblings, 0 replies; 20+ messages in thread
From: Aneesh Kumar K.V @ 2021-06-16  4:52 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: mpe, linuxppc-dev, kaleshsingh, npiggin, joel, Christophe Leroy,
	Linus Torvalds, Kirill A . Shutemov, Aneesh Kumar K.V

This patch series is split out from [PATCH v7 00/11] Speedup mremap on ppc64
(https://lore.kernel.org/linux-mm/20210607055131.156184-1-aneesh.kumar@linux.ibm.com),
dropping the ppc64-specific changes.

This patchset is dependent on
https://lore.kernel.org/linux-mm/20210615110859.320299-1-aneesh.kumar@linux.ibm.com

The ppc64-specific changes will be sent as a separate series that depends on this patchset.

Changes from v1:
* Cc stable@kernel.org
* Use the correct config for TRANSPARENT_HUGEPAGE_PUD
* Use pud_pgtable instead of pud_page_vaddr


Aneesh Kumar K.V (6):
  selftest/mremap_test: Update the test to handle pagesize other than 4K
  selftest/mremap_test: Avoid crash with static build
  mm/mremap: Convert huge PUD move to separate helper
  mm/mremap: Don't enable optimized PUD move if page table levels is 2
  mm/mremap: Use pmd/pud_populate to update page table entries
  mm/mremap: hold the rmap lock in write mode when moving page table
    entries.

 mm/mremap.c                              |  92 +++++++++++++++---
 tools/testing/selftests/vm/mremap_test.c | 118 ++++++++++++-----------
 2 files changed, 142 insertions(+), 68 deletions(-)

-- 
2.31.1



* [PATCH v2 1/6] selftest/mremap_test: Update the test to handle pagesize other than 4K
  2021-06-16  4:52 ` Aneesh Kumar K.V
@ 2021-06-16  4:52   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 20+ messages in thread
From: Aneesh Kumar K.V @ 2021-06-16  4:52 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: mpe, linuxppc-dev, kaleshsingh, npiggin, joel, Christophe Leroy,
	Linus Torvalds, Kirill A . Shutemov, Aneesh Kumar K.V

Instead of hardcoding the 4K page size, fetch it using sysconf(). The performance
measurement tests still assume 2M and 1G are hugepage sizes.
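
As a rough user-space illustration (not part of the patch; the program below is
only a sketch), this is the kind of sysconf() query the test now derives its
sizes from:

	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		long page_size = sysconf(_SC_PAGESIZE);	/* runtime page size: 4K, 16K, 64K, ... */

		if (page_size < 0) {
			perror("sysconf");
			return 1;
		}
		/* Derive test sizes from the runtime page size instead of a hardcoded 4K. */
		printf("PTE-aligned test size: %ld bytes (2 pages)\n", 2 * page_size);
		return 0;
	}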

Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 tools/testing/selftests/vm/mremap_test.c | 113 ++++++++++++-----------
 1 file changed, 61 insertions(+), 52 deletions(-)

diff --git a/tools/testing/selftests/vm/mremap_test.c b/tools/testing/selftests/vm/mremap_test.c
index 9c391d016922..c9a5461eb786 100644
--- a/tools/testing/selftests/vm/mremap_test.c
+++ b/tools/testing/selftests/vm/mremap_test.c
@@ -45,14 +45,15 @@ enum {
 	_4MB = 4ULL << 20,
 	_1GB = 1ULL << 30,
 	_2GB = 2ULL << 30,
-	PTE = _4KB,
 	PMD = _2MB,
 	PUD = _1GB,
 };
 
+#define PTE page_size
+
 #define MAKE_TEST(source_align, destination_align, size,	\
 		  overlaps, should_fail, test_name)		\
-{								\
+(struct test){							\
 	.name = test_name,					\
 	.config = {						\
 		.src_alignment = source_align,			\
@@ -252,12 +253,17 @@ static int parse_args(int argc, char **argv, unsigned int *threshold_mb,
 	return 0;
 }
 
+#define MAX_TEST 13
+#define MAX_PERF_TEST 3
 int main(int argc, char **argv)
 {
 	int failures = 0;
 	int i, run_perf_tests;
 	unsigned int threshold_mb = VALIDATION_DEFAULT_THRESHOLD;
 	unsigned int pattern_seed;
+	struct test test_cases[MAX_TEST];
+	struct test perf_test_cases[MAX_PERF_TEST];
+	int page_size;
 	time_t t;
 
 	pattern_seed = (unsigned int) time(&t);
@@ -268,56 +274,59 @@ int main(int argc, char **argv)
 	ksft_print_msg("Test configs:\n\tthreshold_mb=%u\n\tpattern_seed=%u\n\n",
 		       threshold_mb, pattern_seed);
 
-	struct test test_cases[] = {
-		/* Expected mremap failures */
-		MAKE_TEST(_4KB, _4KB, _4KB, OVERLAPPING, EXPECT_FAILURE,
-		  "mremap - Source and Destination Regions Overlapping"),
-		MAKE_TEST(_4KB, _1KB, _4KB, NON_OVERLAPPING, EXPECT_FAILURE,
-		  "mremap - Destination Address Misaligned (1KB-aligned)"),
-		MAKE_TEST(_1KB, _4KB, _4KB, NON_OVERLAPPING, EXPECT_FAILURE,
-		  "mremap - Source Address Misaligned (1KB-aligned)"),
-
-		/* Src addr PTE aligned */
-		MAKE_TEST(PTE, PTE, _8KB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "8KB mremap - Source PTE-aligned, Destination PTE-aligned"),
-
-		/* Src addr 1MB aligned */
-		MAKE_TEST(_1MB, PTE, _2MB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2MB mremap - Source 1MB-aligned, Destination PTE-aligned"),
-		MAKE_TEST(_1MB, _1MB, _2MB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2MB mremap - Source 1MB-aligned, Destination 1MB-aligned"),
-
-		/* Src addr PMD aligned */
-		MAKE_TEST(PMD, PTE, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "4MB mremap - Source PMD-aligned, Destination PTE-aligned"),
-		MAKE_TEST(PMD, _1MB, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "4MB mremap - Source PMD-aligned, Destination 1MB-aligned"),
-		MAKE_TEST(PMD, PMD, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "4MB mremap - Source PMD-aligned, Destination PMD-aligned"),
-
-		/* Src addr PUD aligned */
-		MAKE_TEST(PUD, PTE, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2GB mremap - Source PUD-aligned, Destination PTE-aligned"),
-		MAKE_TEST(PUD, _1MB, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2GB mremap - Source PUD-aligned, Destination 1MB-aligned"),
-		MAKE_TEST(PUD, PMD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2GB mremap - Source PUD-aligned, Destination PMD-aligned"),
-		MAKE_TEST(PUD, PUD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2GB mremap - Source PUD-aligned, Destination PUD-aligned"),
-	};
-
-	struct test perf_test_cases[] = {
-		/*
-		 * mremap 1GB region - Page table level aligned time
-		 * comparison.
-		 */
-		MAKE_TEST(PTE, PTE, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "1GB mremap - Source PTE-aligned, Destination PTE-aligned"),
-		MAKE_TEST(PMD, PMD, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "1GB mremap - Source PMD-aligned, Destination PMD-aligned"),
-		MAKE_TEST(PUD, PUD, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "1GB mremap - Source PUD-aligned, Destination PUD-aligned"),
-	};
+	page_size = sysconf(_SC_PAGESIZE);
+
+	/* Expected mremap failures */
+	test_cases[0] =	MAKE_TEST(page_size, page_size, page_size,
+				  OVERLAPPING, EXPECT_FAILURE,
+				  "mremap - Source and Destination Regions Overlapping");
+
+	test_cases[1] = MAKE_TEST(page_size, page_size/4, page_size,
+				  NON_OVERLAPPING, EXPECT_FAILURE,
+				  "mremap - Destination Address Misaligned (1KB-aligned)");
+	test_cases[2] = MAKE_TEST(page_size/4, page_size, page_size,
+				  NON_OVERLAPPING, EXPECT_FAILURE,
+				  "mremap - Source Address Misaligned (1KB-aligned)");
+
+	/* Src addr PTE aligned */
+	test_cases[3] = MAKE_TEST(PTE, PTE, PTE * 2,
+				  NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "8KB mremap - Source PTE-aligned, Destination PTE-aligned");
+
+	/* Src addr 1MB aligned */
+	test_cases[4] = MAKE_TEST(_1MB, PTE, _2MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "2MB mremap - Source 1MB-aligned, Destination PTE-aligned");
+	test_cases[5] = MAKE_TEST(_1MB, _1MB, _2MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "2MB mremap - Source 1MB-aligned, Destination 1MB-aligned");
+
+	/* Src addr PMD aligned */
+	test_cases[6] = MAKE_TEST(PMD, PTE, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "4MB mremap - Source PMD-aligned, Destination PTE-aligned");
+	test_cases[7] =	MAKE_TEST(PMD, _1MB, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "4MB mremap - Source PMD-aligned, Destination 1MB-aligned");
+	test_cases[8] = MAKE_TEST(PMD, PMD, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "4MB mremap - Source PMD-aligned, Destination PMD-aligned");
+
+	/* Src addr PUD aligned */
+	test_cases[9] = MAKE_TEST(PUD, PTE, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "2GB mremap - Source PUD-aligned, Destination PTE-aligned");
+	test_cases[10] = MAKE_TEST(PUD, _1MB, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				   "2GB mremap - Source PUD-aligned, Destination 1MB-aligned");
+	test_cases[11] = MAKE_TEST(PUD, PMD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				   "2GB mremap - Source PUD-aligned, Destination PMD-aligned");
+	test_cases[12] = MAKE_TEST(PUD, PUD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				   "2GB mremap - Source PUD-aligned, Destination PUD-aligned");
+
+	perf_test_cases[0] =  MAKE_TEST(page_size, page_size, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+					"1GB mremap - Source PTE-aligned, Destination PTE-aligned");
+	/*
+	 * mremap 1GB region - Page table level aligned time
+	 * comparison.
+	 */
+	perf_test_cases[1] = MAKE_TEST(PMD, PMD, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				       "1GB mremap - Source PMD-aligned, Destination PMD-aligned");
+	perf_test_cases[2] = MAKE_TEST(PUD, PUD, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				       "1GB mremap - Source PUD-aligned, Destination PUD-aligned");
 
 	run_perf_tests =  (threshold_mb == VALIDATION_NO_THRESHOLD) ||
 				(threshold_mb * _1MB >= _1GB);
-- 
2.31.1



* [PATCH v2 2/6] selftest/mremap_test: Avoid crash with static build
  2021-06-16  4:52 ` Aneesh Kumar K.V
@ 2021-06-16  4:52   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 20+ messages in thread
From: Aneesh Kumar K.V @ 2021-06-16  4:52 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: mpe, linuxppc-dev, kaleshsingh, npiggin, joel, Christophe Leroy,
	Linus Torvalds, Kirill A . Shutemov, Aneesh Kumar K.V

With a large mmap size, the source mapping can overlap the text area, and using
MAP_FIXED results in that area being unmapped. Switch to MAP_FIXED_NOREPLACE
and handle the resulting EEXIST error.
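
A minimal sketch of the retry pattern (illustrative only; the helper name and
step size are made up for this example, not taken from the selftest):

	#define _GNU_SOURCE
	#include <sys/mman.h>
	#include <errno.h>
	#include <stddef.h>

	/* Probe upward for a free slot instead of clobbering an existing mapping. */
	static void *map_without_replacing(unsigned long addr, size_t len, size_t step)
	{
		for (;;) {
			void *p = mmap((void *)addr, len, PROT_READ | PROT_WRITE,
				       MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_PRIVATE,
				       -1, 0);
			if (p != MAP_FAILED)
				return p;
			/* EEXIST: something (e.g. the program text) already lives here. */
			if (errno != EEXIST && errno != EPERM)
				return MAP_FAILED;
			addr += step;
		}
	}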

Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 tools/testing/selftests/vm/mremap_test.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/mremap_test.c b/tools/testing/selftests/vm/mremap_test.c
index c9a5461eb786..0624d1bd71b5 100644
--- a/tools/testing/selftests/vm/mremap_test.c
+++ b/tools/testing/selftests/vm/mremap_test.c
@@ -75,9 +75,10 @@ static void *get_source_mapping(struct config c)
 retry:
 	addr += c.src_alignment;
 	src_addr = mmap((void *) addr, c.region_size, PROT_READ | PROT_WRITE,
-			MAP_FIXED | MAP_ANONYMOUS | MAP_SHARED, -1, 0);
+			MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED,
+			-1, 0);
 	if (src_addr == MAP_FAILED) {
-		if (errno == EPERM)
+		if (errno == EPERM || errno == EEXIST)
 			goto retry;
 		goto error;
 	}
-- 
2.31.1



* [PATCH v2 3/6] mm/mremap: Convert huge PUD move to separate helper
  2021-06-16  4:52 ` Aneesh Kumar K.V
@ 2021-06-16  4:52   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 20+ messages in thread
From: Aneesh Kumar K.V @ 2021-06-16  4:52 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: mpe, linuxppc-dev, kaleshsingh, npiggin, joel, Christophe Leroy,
	Linus Torvalds, Kirill A . Shutemov, Aneesh Kumar K.V

With TRANSPARENT_HUGEPAGE_PUD enabled, the kernel can find huge PUD entries.
Add a helper to move huge PUD entries on mremap().

This will be used by a later patch to optimize mremap of PUD_SIZE-aligned,
level-4 PTE-mapped addresses.

This also makes sure we support mremap on huge PUD entries even with
CONFIG_HAVE_MOVE_PUD disabled.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/mremap.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 72 insertions(+), 7 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 47c255b60150..958ecdc6f29d 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -324,10 +324,61 @@ static inline bool move_normal_pud(struct vm_area_struct *vma,
 }
 #endif
 
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+static bool move_huge_pud(struct vm_area_struct *vma, unsigned long old_addr,
+			  unsigned long new_addr, pud_t *old_pud, pud_t *new_pud)
+{
+	spinlock_t *old_ptl, *new_ptl;
+	struct mm_struct *mm = vma->vm_mm;
+	pud_t pud;
+
+	/*
+	 * The destination pud shouldn't be established, free_pgtables()
+	 * should have released it.
+	 */
+	if (WARN_ON_ONCE(!pud_none(*new_pud)))
+		return false;
+
+	/*
+	 * We don't have to worry about the ordering of src and dst
+	 * ptlocks because exclusive mmap_lock prevents deadlock.
+	 */
+	old_ptl = pud_lock(vma->vm_mm, old_pud);
+	new_ptl = pud_lockptr(mm, new_pud);
+	if (new_ptl != old_ptl)
+		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
+
+	/* Clear the pud */
+	pud = *old_pud;
+	pud_clear(old_pud);
+
+	VM_BUG_ON(!pud_none(*new_pud));
+
+	/* Set the new pud */
+	/* mark soft_dirty when we add pud level soft dirty support */
+	set_pud_at(mm, new_addr, new_pud, pud);
+	flush_pud_tlb_range(vma, old_addr, old_addr + HPAGE_PUD_SIZE);
+	if (new_ptl != old_ptl)
+		spin_unlock(new_ptl);
+	spin_unlock(old_ptl);
+
+	return true;
+}
+#else
+static bool move_huge_pud(struct vm_area_struct *vma, unsigned long old_addr,
+			  unsigned long new_addr, pud_t *old_pud, pud_t *new_pud)
+{
+	WARN_ON_ONCE(1);
+	return false;
+
+}
+#endif
+
 enum pgt_entry {
 	NORMAL_PMD,
 	HPAGE_PMD,
 	NORMAL_PUD,
+	HPAGE_PUD,
 };
 
 /*
@@ -347,6 +398,7 @@ static __always_inline unsigned long get_extent(enum pgt_entry entry,
 		mask = PMD_MASK;
 		size = PMD_SIZE;
 		break;
+	case HPAGE_PUD:
 	case NORMAL_PUD:
 		mask = PUD_MASK;
 		size = PUD_SIZE;
@@ -395,6 +447,11 @@ static bool move_pgt_entry(enum pgt_entry entry, struct vm_area_struct *vma,
 			move_huge_pmd(vma, old_addr, new_addr, old_entry,
 				      new_entry);
 		break;
+	case HPAGE_PUD:
+		moved = move_huge_pud(vma, old_addr, new_addr, old_entry,
+				      new_entry);
+		break;
+
 	default:
 		WARN_ON_ONCE(1);
 		break;
@@ -414,6 +471,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 	unsigned long extent, old_end;
 	struct mmu_notifier_range range;
 	pmd_t *old_pmd, *new_pmd;
+	pud_t *old_pud, *new_pud;
 
 	old_end = old_addr + len;
 	flush_cache_range(vma, old_addr, old_end);
@@ -429,15 +487,22 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 		 * PUD level if possible.
 		 */
 		extent = get_extent(NORMAL_PUD, old_addr, old_end, new_addr);
-		if (IS_ENABLED(CONFIG_HAVE_MOVE_PUD) && extent == PUD_SIZE) {
-			pud_t *old_pud, *new_pud;
 
-			old_pud = get_old_pud(vma->vm_mm, old_addr);
-			if (!old_pud)
+		old_pud = get_old_pud(vma->vm_mm, old_addr);
+		if (!old_pud)
+			continue;
+		new_pud = alloc_new_pud(vma->vm_mm, vma, new_addr);
+		if (!new_pud)
+			break;
+		if (pud_trans_huge(*old_pud) || pud_devmap(*old_pud)) {
+			if (extent == HPAGE_PUD_SIZE) {
+				move_pgt_entry(HPAGE_PUD, vma, old_addr, new_addr,
+					       old_pud, new_pud, need_rmap_locks);
+				/* We ignore and continue on error? */
 				continue;
-			new_pud = alloc_new_pud(vma->vm_mm, vma, new_addr);
-			if (!new_pud)
-				break;
+			}
+		} else if (IS_ENABLED(CONFIG_HAVE_MOVE_PUD) && extent == PUD_SIZE) {
+
 			if (move_pgt_entry(NORMAL_PUD, vma, old_addr, new_addr,
 					   old_pud, new_pud, need_rmap_locks))
 				continue;
-- 
2.31.1



* [PATCH v2 4/6] mm/mremap: Don't enable optimized PUD move if page table levels is 2
  2021-06-16  4:52 ` Aneesh Kumar K.V
@ 2021-06-16  4:52   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 20+ messages in thread
From: Aneesh Kumar K.V @ 2021-06-16  4:52 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: mpe, linuxppc-dev, kaleshsingh, npiggin, joel, Christophe Leroy,
	Linus Torvalds, Kirill A . Shutemov, Aneesh Kumar K.V

With only two page table levels the PUD level is folded, so don't enable move_normal_pud.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/mremap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 958ecdc6f29d..97313e316a4d 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -276,7 +276,7 @@ static inline bool move_normal_pmd(struct vm_area_struct *vma,
 }
 #endif
 
-#ifdef CONFIG_HAVE_MOVE_PUD
+#if CONFIG_PGTABLE_LEVELS > 2 && defined(CONFIG_HAVE_MOVE_PUD)
 static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
 		  unsigned long new_addr, pud_t *old_pud, pud_t *new_pud)
 {
-- 
2.31.1



* [PATCH v2 5/6] mm/mremap: Use pmd/pud_populate to update page table entries
  2021-06-16  4:52 ` Aneesh Kumar K.V
@ 2021-06-16  4:52   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 20+ messages in thread
From: Aneesh Kumar K.V @ 2021-06-16  4:52 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: mpe, linuxppc-dev, kaleshsingh, npiggin, joel, Christophe Leroy,
	Linus Torvalds, Kirill A . Shutemov, Aneesh Kumar K.V

pmd/pud_populate is the right interface to use to set the respective page table
entries. Some architectures, like ppc64, assume that set_pmd_at()/set_pud_at()
can only be used to set a hugepage PTE. Since we are not setting up a hugepage
PTE here, use the pmd/pud_populate interface.
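
As a simplified illustration of the distinction (a conceptual comment only, not
the exact kernel definitions):

	/*
	 * set_pmd_at(mm, addr, pmdp, pmd)  - install a leaf entry (e.g. a hugepage
	 *                                    mapping) at this PMD slot; architectures
	 *                                    such as ppc64 treat this path as
	 *                                    hugepage-only.
	 * pmd_populate(mm, pmdp, pte_page) - make the PMD point at a lower-level PTE
	 *                                    page table; no hugepage assumptions.
	 */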

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/mremap.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 97313e316a4d..72fa0491681e 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -26,6 +26,7 @@
 
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
+#include <asm/pgalloc.h>
 
 #include "internal.h"
 
@@ -258,8 +259,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 
 	VM_BUG_ON(!pmd_none(*new_pmd));
 
-	/* Set the new pmd */
-	set_pmd_at(mm, new_addr, new_pmd, pmd);
+	pmd_populate(mm, new_pmd, pmd_pgtable(pmd));
 	flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
 	if (new_ptl != old_ptl)
 		spin_unlock(new_ptl);
@@ -306,8 +306,7 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
 
 	VM_BUG_ON(!pud_none(*new_pud));
 
-	/* Set the new pud */
-	set_pud_at(mm, new_addr, new_pud, pud);
+	pud_populate(mm, new_pud, pud_pgtable(pud));
 	flush_tlb_range(vma, old_addr, old_addr + PUD_SIZE);
 	if (new_ptl != old_ptl)
 		spin_unlock(new_ptl);
-- 
2.31.1



* [PATCH v2 6/6] mm/mremap: hold the rmap lock in write mode when moving page table entries.
  2021-06-16  4:52 ` Aneesh Kumar K.V
@ 2021-06-16  4:52   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 20+ messages in thread
From: Aneesh Kumar K.V @ 2021-06-16  4:52 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: mpe, linuxppc-dev, kaleshsingh, npiggin, joel, Christophe Leroy,
	Linus Torvalds, Kirill A . Shutemov, Aneesh Kumar K.V, stable,
	Hugh Dickins, Kirill A . Shutemov

To avoid a race between the rmap walk and mremap, mremap does take_rmap_locks().
The lock is taken to ensure that the rmap walk doesn't miss a page table entry due to
PTE moves via move_page_tables(). The kernel further optimizes this locking such
that if the newly added vma will be found after the old vma during the rmap walk,
the rmap lock is not taken. This is because the rmap walk finds the vmas in the
same order, and if we don't find the page table attached to the older vma we will
find it with the new vma, which is iterated later.

As explained in commit eb66ae030829 ("mremap: properly flush TLB before releasing the page"),
mremap is special in that it doesn't take ownership of the page. The optimized
version for PUD/PMD aligned mremap also doesn't hold the ptl lock. This can
result in stale TLB entries as shown below.

This patch updates the rmap locking requirement in mremap to handle the race condition
explained below with optimized mremap::

Optimized PMD move

    CPU 1                           CPU 2                                   CPU 3

    mremap(old_addr, new_addr)      page_shrinker/try_to_unmap_one

    mmap_write_lock_killable()

                                    addr = old_addr
                                    lock(pte_ptl)
    lock(pmd_ptl)
    pmd = *old_pmd
    pmd_clear(old_pmd)
    flush_tlb_range(old_addr)

    *new_pmd = pmd
                                                                            *new_addr = 10; and fills
                                                                            TLB with new addr
                                                                            and old pfn

    unlock(pmd_ptl)
                                    ptep_clear_flush()
                                    old pfn is free.
                                                                            Stale TLB entry

Optimized PUD move also suffers from a similar race.
Both of the above race conditions can be fixed if we force the mremap path to take the rmap lock.

Cc: stable@vger.kernel.org
Fixes: 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions")
Fixes: c49dd3401802 ("mm: speedup mremap on 1GB or larger regions")
Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/mremap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 72fa0491681e..c3cad539a7aa 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -503,7 +503,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 		} else if (IS_ENABLED(CONFIG_HAVE_MOVE_PUD) && extent == PUD_SIZE) {
 
 			if (move_pgt_entry(NORMAL_PUD, vma, old_addr, new_addr,
-					   old_pud, new_pud, need_rmap_locks))
+					   old_pud, new_pud, true))
 				continue;
 		}
 
@@ -530,7 +530,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 			 * moving at the PMD level if possible.
 			 */
 			if (move_pgt_entry(NORMAL_PMD, vma, old_addr, new_addr,
-					   old_pmd, new_pmd, need_rmap_locks))
+					   old_pmd, new_pmd, true))
 				continue;
 		}
 
-- 
2.31.1


* Re: [PATCH v2 0/6] mrermap fixes
  2021-06-16  4:52 ` Aneesh Kumar K.V
@ 2021-06-16 14:39   ` Linus Torvalds
  -1 siblings, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2021-06-16 14:39 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Linux-MM, Andrew Morton, Michael Ellerman, linuxppc-dev,
	Kalesh Singh, Nick Piggin, Joel Fernandes, Christophe Leroy,
	Kirill A . Shutemov

On Tue, Jun 15, 2021 at 9:53 PM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> This patch series is split out series from [PATCH v7 00/11] Speedup mremap on ppc64
> (https://lore.kernel.org/linux-mm/20210607055131.156184-1-aneesh.kumar@linux.ibm.com)
> dropping ppc64 specific changes.

Both this and the followup powerpc enablement look ok to me. Apart
from the obvious subject line bug ;)

Do we have robot confirmation that this version doesn't have any
performance regression?

              Linus


* Re: [PATCH v2 0/6] mrermap fixes
  2021-06-16  4:52 ` Aneesh Kumar K.V
@ 2021-06-17  1:00   ` Andrew Morton
  -1 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2021-06-17  1:00 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mpe, linuxppc-dev, kaleshsingh, npiggin, joel,
	Christophe Leroy, Linus Torvalds, Kirill A . Shutemov

On Wed, 16 Jun 2021 10:22:33 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> wrote:

> This patch series is split out series from [PATCH v7 00/11] Speedup mremap on ppc64
> (https://lore.kernel.org/linux-mm/20210607055131.156184-1-aneesh.kumar@linux.ibm.com)
> dropping ppc64 specific changes.
> 
> This patchset is dependent on
> https://lore.kernel.org/linux-mm/20210615110859.320299-1-aneesh.kumar@linux.ibm.com

Which I just dropped because of all the build breakages :(


* Re: [PATCH v2 6/6] mm/mremap: hold the rmap lock in write mode when moving page table entries.
  2021-06-16  4:52   ` Aneesh Kumar K.V
@ 2021-06-17  1:43     ` Andrew Morton
  -1 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2021-06-17  1:43 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, mpe, linuxppc-dev, kaleshsingh, npiggin, joel,
	Christophe Leroy, Linus Torvalds, Kirill A . Shutemov, stable,
	Hugh Dickins, Kirill A . Shutemov

On Wed, 16 Jun 2021 10:22:39 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> wrote:

> To avoid a race between rmap walk and mremap, mremap does take_rmap_locks().
> The lock was taken to ensure that rmap walk don't miss a page table entry due to
> PTE moves via move_pagetables(). The kernel does further optimization of
> this lock such that if we are going to find the newly added vma after the
> old vma, the rmap lock is not taken. This is because rmap walk would find the
> vmas in the same order and if we don't find the page table attached to
> older vma we would find it with the new vma which we would iterate later.
> 
> As explained in commit eb66ae030829 ("mremap: properly flush TLB before releasing the page")
> mremap is special in that it doesn't take ownership of the page. The
> optimized version for PUD/PMD aligned mremap also doesn't hold the ptl lock.
> This can result in stale TLB entries as show below.
> 
> ...
>
> Cc: stable@vger.kernel.org

Sneaking a -stable patch into the middle of all of this was ... sneaky :(

It doesn't actually apply to current mainline either.

I think I'll pretend I didn't notice.  Please sort this out with Greg
when he reports this back to you.
