* [PATCH v4 0/5] Speed up mremap on large regions
@ 2020-10-14  0:53 Kalesh Singh
  2020-10-14  0:53 ` [PATCH v4 1/5] kselftests: vm: Add mremap tests Kalesh Singh
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Kalesh Singh @ 2020-10-14  0:53 UTC (permalink / raw)
  Cc: surenb, minchan, joelaf, lokeshgidra, kaleshsingh, kernel-team,
	Catalin Marinas, Will Deacon, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Andrew Morton, Shuah Khan,
	Peter Zijlstra, Aneesh Kumar K.V, Kees Cook, Arnd Bergmann,
	Josh Poimboeuf, Sami Tolvanen, Frederic Weisbecker,
	Krzysztof Kozlowski, Hassan Naveed, Christian Brauner,
	Stephen Boyd, Anshuman Khandual, Mark Brown, Gavin Shan,
	Mike Rapoport, Steven Price, Jia He, John Hubbard,
	Masahiro Yamada, Ralph Campbell, Kirill A. Shutemov,
	Mina Almasry, Sandipan Das, Dave Hansen, Masami Hiramatsu,
	Brian Geffon, Jason Gunthorpe, SeongJae Park, linux-kernel,
	linux-arm-kernel, linux-mm, linux-kselftest

This is a repost of the mremap speed up patches, adding Kirill's
Acked-by's (from a separate discussion). The previous versions are
posted at:
v1 - https://lore.kernel.org/r/20200930222130.4175584-1-kaleshsingh@google.com
v2 - https://lore.kernel.org/r/20201002162101.665549-1-kaleshsingh@google.com
v3 - http://lore.kernel.org/r/20201005154017.474722-1-kaleshsingh@google.com

mremap time can be optimized by moving entries at the PMD/PUD level if
the source and destination addresses are PMD/PUD-aligned and the
region being remapped is PMD/PUD-sized. Enable moving at the PMD and
PUD levels on arm64 and x86. Other architectures where this type of
move is supported and known to be safe can also opt in to these
optimizations by enabling HAVE_MOVE_PMD and HAVE_MOVE_PUD.
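
The core of the optimization (patch 3/5 below) is that, when a whole
aligned PMD- or PUD-sized extent is being moved, the page table entry
itself is relocated instead of walking and copying every PTE underneath
it. The following is a minimal sketch of a PMD-level move for
illustration only: the ptlock handling, rmap locks and sanity checks of
the real move_normal_pmd() are omitted, and the function name is made
up for this sketch.

static bool move_pmd_entry_sketch(struct vm_area_struct *vma,
				  unsigned long old_addr,
				  unsigned long new_addr,
				  pmd_t *old_pmd, pmd_t *new_pmd)
{
	struct mm_struct *mm = vma->vm_mm;
	pmd_t pmd;

	/* The destination slot must not be populated. */
	if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
		return false;

	/* Detach the entry from the old slot... */
	pmd = *old_pmd;
	pmd_clear(old_pmd);

	/* ...and re-attach it at the new slot. */
	set_pmd_at(mm, new_addr, new_pmd, pmd);
	flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);

	return true;
}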

Observed Performance Improvements for remapping a PUD-aligned 1GB-sized
region on x86 and arm64:

    - HAVE_MOVE_PMD is already enabled on x86 : N/A
    - Enabling HAVE_MOVE_PUD on x86   : ~13x speed up

    - Enabling HAVE_MOVE_PMD on arm64 : ~ 8x speed up
    - Enabling HAVE_MOVE_PUD on arm64 : ~19x speed up

    Altogether, HAVE_MOVE_PMD and HAVE_MOVE_PUD give a total of ~150x
    speed up on arm64 (~9.1 ms for the baseline PTE-level move of a 1GB
    region vs ~59 microseconds with both optimizations enabled; see the
    data in patches 2/5 and 4/5).

Changes in v2:
  - Reduce mremap_test time by only validating a configurable
    threshold of the remapped region, as per John.
  - Use a random pattern for mremap validation. Provide pattern
    seed in test output, as per John.
  - Moved set_pud_at() to separate patch, per Kirill.
  - Use switch() instead of ifs in move_pgt_entry(), per Kirill.
  - Update commit message with description of Android
    garbage collector use case for HAVE_MOVE_PUD, as per Joel.
  - Fix build test error reported by kernel test robot in [1].

Changes in v3:
  - Make lines 80 cols or less where they don’t need to be longer,
    per John.
  - Removed unused PATTERN_SIZE in mremap_test.
  - Added Reviewed-by tag for patch 1/5 (mremap kselftest patch).
  - Use switch() instead of ifs in get_extent(), per Kirill.
  - Add BUILD_BUG() in get_extent()'s default case.
  - Move get_old_pud() and alloc_new_pud() out of
    #ifdef CONFIG_HAVE_MOVE_PUD, per Kirill.
  - Have get_old_pmd() and alloc_new_pmd() use get_old_pud() and
    alloc_new_pud(), per Kirill.
  - Replace #ifdef CONFIG_HAVE_MOVE_PMD / PUD in move_page_tables()
    with IS_ENABLED(CONFIG_HAVE_MOVE_PMD / PUD), per Kirill.
  - Fold Add set_pud_at() patch into patch 4/5, per Kirill.

[1] https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org/thread/CKPGL4FH4NG7TGH2CVYX2UX76L25BTA3/

Kalesh Singh (5):
  kselftests: vm: Add mremap tests
  arm64: mremap speedup - Enable HAVE_MOVE_PMD
  mm: Speedup mremap on 1GB or larger regions
  arm64: mremap speedup - Enable HAVE_MOVE_PUD
  x86: mremap speedup - Enable HAVE_MOVE_PUD

 arch/Kconfig                             |   7 +
 arch/arm64/Kconfig                       |   2 +
 arch/arm64/include/asm/pgtable.h         |   1 +
 arch/x86/Kconfig                         |   1 +
 mm/mremap.c                              | 230 ++++++++++++---
 tools/testing/selftests/vm/.gitignore    |   1 +
 tools/testing/selftests/vm/Makefile      |   1 +
 tools/testing/selftests/vm/mremap_test.c | 344 +++++++++++++++++++++++
 tools/testing/selftests/vm/run_vmtests   |  11 +
 9 files changed, 558 insertions(+), 40 deletions(-)
 create mode 100644 tools/testing/selftests/vm/mremap_test.c

-- 
2.28.0.1011.ga647a8990f-goog


* [PATCH v4 1/5] kselftests: vm: Add mremap tests
  2020-10-14  0:53 [PATCH v4 0/5] Speed up mremap on large regions Kalesh Singh
@ 2020-10-14  0:53 ` Kalesh Singh
  2020-10-14 19:02   ` Kalesh Singh
  2020-10-14  0:53 ` [PATCH v4 2/5] arm64: mremap speedup - Enable HAVE_MOVE_PMD Kalesh Singh
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Kalesh Singh @ 2020-10-14  0:53 UTC (permalink / raw)
  Cc: surenb, minchan, joelaf, lokeshgidra, kaleshsingh, kernel-team,
	John Hubbard, Shuah Khan, Andrew Morton, Kirill A . Shutemov,
	Catalin Marinas, Will Deacon, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Peter Zijlstra, Kees Cook,
	Aneesh Kumar K.V, Masahiro Yamada, Josh Poimboeuf, Sami Tolvanen,
	Krzysztof Kozlowski, Frederic Weisbecker, Hassan Naveed,
	Arnd Bergmann, Christian Brauner, Anshuman Khandual,
	Mike Rapoport, Gavin Shan, Steven Price, Jia He, Ralph Campbell,
	Zi Yan, Mina Almasry, Ram Pai, Sandipan Das, Dave Hansen,
	Masami Hiramatsu, Brian Geffon, SeongJae Park, linux-kernel,
	linux-arm-kernel, linux-mm, linux-kselftest

Test mremap on regions of various sizes and alignments, and validate
the data after remapping. Also report the total time taken by the
remap, which is useful for performance comparison of the mremap
optimizations that move pages at the PMD/PUD levels if HAVE_MOVE_PMD
and/or HAVE_MOVE_PUD are enabled.

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
Changes in v2:
  - Reduce test time by only validating a certain threshold of the
    remapped region (4MB by default). The -t flag can be used to
    set a custom threshold in MB, or no threshold by passing 0 (-t0).
    mremap time is not printed to stdout for partially validated
    regions, since the time is only meaningful for comparison if the
    entire mapped region was faulted in.
  - Use a random pattern for validating the remapped region. The -p
    flag can be used to run the tests with a specified seed for the
    random pattern.
  - Print test configs (threshold_mb and pattern_seed) to stdout.
  - Remove MAKE_SIMPLE_TEST macro.
  - Define named flags instead of 0 / 1.
  - Add comments for destination address' align_mask and offset.

Changes in v3:
  - Remove unused PATTERN_SIZE definition.
  - Make lines 80 cols or less where they don’t need to be longer.
  - Add John Hubbard’s Reviewed-by tag.

 tools/testing/selftests/vm/.gitignore    |   1 +
 tools/testing/selftests/vm/Makefile      |   1 +
 tools/testing/selftests/vm/mremap_test.c | 344 +++++++++++++++++++++++
 tools/testing/selftests/vm/run_vmtests   |  11 +
 4 files changed, 357 insertions(+)
 create mode 100644 tools/testing/selftests/vm/mremap_test.c

diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore
index 849e8226395a..b3a183c36cb5 100644
--- a/tools/testing/selftests/vm/.gitignore
+++ b/tools/testing/selftests/vm/.gitignore
@@ -8,6 +8,7 @@ thuge-gen
 compaction_test
 mlock2-tests
 mremap_dontunmap
+mremap_test
 on-fault-limit
 transhuge-stress
 protection_keys
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index a9026706d597..f044808b45fa 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -16,6 +16,7 @@ TEST_GEN_FILES += map_populate
 TEST_GEN_FILES += mlock-random-test
 TEST_GEN_FILES += mlock2-tests
 TEST_GEN_FILES += mremap_dontunmap
+TEST_GEN_FILES += mremap_test
 TEST_GEN_FILES += on-fault-limit
 TEST_GEN_FILES += thuge-gen
 TEST_GEN_FILES += transhuge-stress
diff --git a/tools/testing/selftests/vm/mremap_test.c b/tools/testing/selftests/vm/mremap_test.c
new file mode 100644
index 000000000000..9c391d016922
--- /dev/null
+++ b/tools/testing/selftests/vm/mremap_test.c
@@ -0,0 +1,344 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Google LLC
+ */
+#define _GNU_SOURCE
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <time.h>
+
+#include "../kselftest.h"
+
+#define EXPECT_SUCCESS 0
+#define EXPECT_FAILURE 1
+#define NON_OVERLAPPING 0
+#define OVERLAPPING 1
+#define NS_PER_SEC 1000000000ULL
+#define VALIDATION_DEFAULT_THRESHOLD 4	/* 4MB */
+#define VALIDATION_NO_THRESHOLD 0	/* Verify the entire region */
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
+
+struct config {
+	unsigned long long src_alignment;
+	unsigned long long dest_alignment;
+	unsigned long long region_size;
+	int overlapping;
+};
+
+struct test {
+	const char *name;
+	struct config config;
+	int expect_failure;
+};
+
+enum {
+	_1KB = 1ULL << 10,	/* 1KB -> not page aligned */
+	_4KB = 4ULL << 10,
+	_8KB = 8ULL << 10,
+	_1MB = 1ULL << 20,
+	_2MB = 2ULL << 20,
+	_4MB = 4ULL << 20,
+	_1GB = 1ULL << 30,
+	_2GB = 2ULL << 30,
+	PTE = _4KB,
+	PMD = _2MB,
+	PUD = _1GB,
+};
+
+#define MAKE_TEST(source_align, destination_align, size,	\
+		  overlaps, should_fail, test_name)		\
+{								\
+	.name = test_name,					\
+	.config = {						\
+		.src_alignment = source_align,			\
+		.dest_alignment = destination_align,		\
+		.region_size = size,				\
+		.overlapping = overlaps,			\
+	},							\
+	.expect_failure = should_fail				\
+}
+
+/*
+ * Returns the start address of the mapping on success, else returns
+ * NULL on failure.
+ */
+static void *get_source_mapping(struct config c)
+{
+	unsigned long long addr = 0ULL;
+	void *src_addr = NULL;
+retry:
+	addr += c.src_alignment;
+	src_addr = mmap((void *) addr, c.region_size, PROT_READ | PROT_WRITE,
+			MAP_FIXED | MAP_ANONYMOUS | MAP_SHARED, -1, 0);
+	if (src_addr == MAP_FAILED) {
+		if (errno == EPERM)
+			goto retry;
+		goto error;
+	}
+	/*
+	 * Check that the address is aligned to the specified alignment.
+	 * Addresses which have alignments that are multiples of that
+	 * specified are not considered valid. For instance, 1GB address is
+	 * 2MB-aligned, however it will not be considered valid for a
+	 * requested alignment of 2MB. This is done to reduce coincidental
+	 * alignment in the tests.
+	 */
+	if (((unsigned long long) src_addr & (c.src_alignment - 1)) ||
+			!((unsigned long long) src_addr & c.src_alignment))
+		goto retry;
+
+	if (!src_addr)
+		goto error;
+
+	return src_addr;
+error:
+	ksft_print_msg("Failed to map source region: %s\n",
+			strerror(errno));
+	return NULL;
+}
+
+/* Returns the time taken for the remap on success else returns -1. */
+static long long remap_region(struct config c, unsigned int threshold_mb,
+			      char pattern_seed)
+{
+	void *addr, *src_addr, *dest_addr;
+	unsigned long long i;
+	struct timespec t_start = {0, 0}, t_end = {0, 0};
+	long long  start_ns, end_ns, align_mask, ret, offset;
+	unsigned long long threshold;
+
+	if (threshold_mb == VALIDATION_NO_THRESHOLD)
+		threshold = c.region_size;
+	else
+		threshold = MIN(threshold_mb * _1MB, c.region_size);
+
+	src_addr = get_source_mapping(c);
+	if (!src_addr) {
+		ret = -1;
+		goto out;
+	}
+
+	/* Set byte pattern */
+	srand(pattern_seed);
+	for (i = 0; i < threshold; i++)
+		memset((char *) src_addr + i, (char) rand(), 1);
+
+	/* Mask to zero out lower bits of address for alignment */
+	align_mask = ~(c.dest_alignment - 1);
+	/* Offset of destination address from the end of the source region */
+	offset = (c.overlapping) ? -c.dest_alignment : c.dest_alignment;
+	addr = (void *) (((unsigned long long) src_addr + c.region_size
+			  + offset) & align_mask);
+
+	/* See comment in get_source_mapping() */
+	if (!((unsigned long long) addr & c.dest_alignment))
+		addr = (void *) ((unsigned long long) addr | c.dest_alignment);
+
+	clock_gettime(CLOCK_MONOTONIC, &t_start);
+	dest_addr = mremap(src_addr, c.region_size, c.region_size,
+			MREMAP_MAYMOVE|MREMAP_FIXED, (char *) addr);
+	clock_gettime(CLOCK_MONOTONIC, &t_end);
+
+	if (dest_addr == MAP_FAILED) {
+		ksft_print_msg("mremap failed: %s\n", strerror(errno));
+		ret = -1;
+		goto clean_up_src;
+	}
+
+	/* Verify byte pattern after remapping */
+	srand(pattern_seed);
+	for (i = 0; i < threshold; i++) {
+		char c = (char) rand();
+
+		if (((char *) dest_addr)[i] != c) {
+			ksft_print_msg("Data after remap doesn't match at offset %d\n",
+				       i);
+			ksft_print_msg("Expected: %#x\t Got: %#x\n", c & 0xff,
+					((char *) dest_addr)[i] & 0xff);
+			ret = -1;
+			goto clean_up_dest;
+		}
+	}
+
+	start_ns = t_start.tv_sec * NS_PER_SEC + t_start.tv_nsec;
+	end_ns = t_end.tv_sec * NS_PER_SEC + t_end.tv_nsec;
+	ret = end_ns - start_ns;
+
+/*
+ * Since the destination address is specified using MREMAP_FIXED, subsequent
+ * mremap will unmap any previous mapping at the address range specified by
+ * dest_addr and region_size. This significantly affects the remap time of
+ * subsequent tests. So we clean up mappings after each test.
+ */
+clean_up_dest:
+	munmap(dest_addr, c.region_size);
+clean_up_src:
+	munmap(src_addr, c.region_size);
+out:
+	return ret;
+}
+
+static void run_mremap_test_case(struct test test_case, int *failures,
+				 unsigned int threshold_mb,
+				 unsigned int pattern_seed)
+{
+	long long remap_time = remap_region(test_case.config, threshold_mb,
+					    pattern_seed);
+
+	if (remap_time < 0) {
+		if (test_case.expect_failure)
+			ksft_test_result_pass("%s\n\tExpected mremap failure\n",
+					      test_case.name);
+		else {
+			ksft_test_result_fail("%s\n", test_case.name);
+			*failures += 1;
+		}
+	} else {
+		/*
+		 * Comparing mremap time is only applicable if entire region
+		 * was faulted in.
+		 */
+		if (threshold_mb == VALIDATION_NO_THRESHOLD ||
+		    test_case.config.region_size <= threshold_mb * _1MB)
+			ksft_test_result_pass("%s\n\tmremap time: %12lldns\n",
+					      test_case.name, remap_time);
+		else
+			ksft_test_result_pass("%s\n", test_case.name);
+	}
+}
+
+static void usage(const char *cmd)
+{
+	fprintf(stderr,
+		"Usage: %s [[-t <threshold_mb>] [-p <pattern_seed>]]\n"
+		"-t\t only validate threshold_mb of the remapped region\n"
+		"  \t if 0 is supplied no threshold is used; all tests\n"
+		"  \t are run and remapped regions validated fully.\n"
+		"  \t The default threshold used is 4MB.\n"
+		"-p\t provide a seed to generate the random pattern for\n"
+		"  \t validating the remapped region.\n", cmd);
+}
+
+static int parse_args(int argc, char **argv, unsigned int *threshold_mb,
+		      unsigned int *pattern_seed)
+{
+	const char *optstr = "t:p:";
+	int opt;
+
+	while ((opt = getopt(argc, argv, optstr)) != -1) {
+		switch (opt) {
+		case 't':
+			*threshold_mb = atoi(optarg);
+			break;
+		case 'p':
+			*pattern_seed = atoi(optarg);
+			break;
+		default:
+			usage(argv[0]);
+			return -1;
+		}
+	}
+
+	if (optind < argc) {
+		usage(argv[0]);
+		return -1;
+	}
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	int failures = 0;
+	int i, run_perf_tests;
+	unsigned int threshold_mb = VALIDATION_DEFAULT_THRESHOLD;
+	unsigned int pattern_seed;
+	time_t t;
+
+	pattern_seed = (unsigned int) time(&t);
+
+	if (parse_args(argc, argv, &threshold_mb, &pattern_seed) < 0)
+		exit(EXIT_FAILURE);
+
+	ksft_print_msg("Test configs:\n\tthreshold_mb=%u\n\tpattern_seed=%u\n\n",
+		       threshold_mb, pattern_seed);
+
+	struct test test_cases[] = {
+		/* Expected mremap failures */
+		MAKE_TEST(_4KB, _4KB, _4KB, OVERLAPPING, EXPECT_FAILURE,
+		  "mremap - Source and Destination Regions Overlapping"),
+		MAKE_TEST(_4KB, _1KB, _4KB, NON_OVERLAPPING, EXPECT_FAILURE,
+		  "mremap - Destination Address Misaligned (1KB-aligned)"),
+		MAKE_TEST(_1KB, _4KB, _4KB, NON_OVERLAPPING, EXPECT_FAILURE,
+		  "mremap - Source Address Misaligned (1KB-aligned)"),
+
+		/* Src addr PTE aligned */
+		MAKE_TEST(PTE, PTE, _8KB, NON_OVERLAPPING, EXPECT_SUCCESS,
+		  "8KB mremap - Source PTE-aligned, Destination PTE-aligned"),
+
+		/* Src addr 1MB aligned */
+		MAKE_TEST(_1MB, PTE, _2MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+		  "2MB mremap - Source 1MB-aligned, Destination PTE-aligned"),
+		MAKE_TEST(_1MB, _1MB, _2MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+		  "2MB mremap - Source 1MB-aligned, Destination 1MB-aligned"),
+
+		/* Src addr PMD aligned */
+		MAKE_TEST(PMD, PTE, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+		  "4MB mremap - Source PMD-aligned, Destination PTE-aligned"),
+		MAKE_TEST(PMD, _1MB, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+		  "4MB mremap - Source PMD-aligned, Destination 1MB-aligned"),
+		MAKE_TEST(PMD, PMD, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+		  "4MB mremap - Source PMD-aligned, Destination PMD-aligned"),
+
+		/* Src addr PUD aligned */
+		MAKE_TEST(PUD, PTE, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+		  "2GB mremap - Source PUD-aligned, Destination PTE-aligned"),
+		MAKE_TEST(PUD, _1MB, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+		  "2GB mremap - Source PUD-aligned, Destination 1MB-aligned"),
+		MAKE_TEST(PUD, PMD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+		  "2GB mremap - Source PUD-aligned, Destination PMD-aligned"),
+		MAKE_TEST(PUD, PUD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+		  "2GB mremap - Source PUD-aligned, Destination PUD-aligned"),
+	};
+
+	struct test perf_test_cases[] = {
+		/*
+		 * mremap 1GB region - Page table level aligned time
+		 * comparison.
+		 */
+		MAKE_TEST(PTE, PTE, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+		  "1GB mremap - Source PTE-aligned, Destination PTE-aligned"),
+		MAKE_TEST(PMD, PMD, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+		  "1GB mremap - Source PMD-aligned, Destination PMD-aligned"),
+		MAKE_TEST(PUD, PUD, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+		  "1GB mremap - Source PUD-aligned, Destination PUD-aligned"),
+	};
+
+	run_perf_tests =  (threshold_mb == VALIDATION_NO_THRESHOLD) ||
+				(threshold_mb * _1MB >= _1GB);
+
+	ksft_set_plan(ARRAY_SIZE(test_cases) + (run_perf_tests ?
+		      ARRAY_SIZE(perf_test_cases) : 0));
+
+	for (i = 0; i < ARRAY_SIZE(test_cases); i++)
+		run_mremap_test_case(test_cases[i], &failures, threshold_mb,
+				     pattern_seed);
+
+	if (run_perf_tests) {
+		ksft_print_msg("\n%s\n",
+		 "mremap HAVE_MOVE_PMD/PUD optimization time comparison for 1GB region:");
+		for (i = 0; i < ARRAY_SIZE(perf_test_cases); i++)
+			run_mremap_test_case(perf_test_cases[i], &failures,
+					     threshold_mb, pattern_seed);
+	}
+
+	if (failures > 0)
+		ksft_exit_fail();
+	else
+		ksft_exit_pass();
+}
diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests
index a3f4f30f0a2e..d578ad831813 100755
--- a/tools/testing/selftests/vm/run_vmtests
+++ b/tools/testing/selftests/vm/run_vmtests
@@ -241,6 +241,17 @@ else
 	echo "[PASS]"
 fi
 
+echo "-------------------"
+echo "running mremap_test"
+echo "-------------------"
+./mremap_test
+if [ $? -ne 0 ]; then
+	echo "[FAIL]"
+	exitcode=1
+else
+	echo "[PASS]"
+fi
+
 echo "-----------------"
 echo "running thuge-gen"
 echo "-----------------"
-- 
2.28.0.1011.ga647a8990f-goog


* [PATCH v4 2/5] arm64: mremap speedup - Enable HAVE_MOVE_PMD
  2020-10-14  0:53 [PATCH v4 0/5] Speed up mremap on large regions Kalesh Singh
  2020-10-14  0:53 ` [PATCH v4 1/5] kselftests: vm: Add mremap tests Kalesh Singh
@ 2020-10-14  0:53 ` Kalesh Singh
  2020-10-15 10:55   ` Will Deacon
  2020-10-14  0:53 ` [PATCH v4 3/5] mm: Speedup mremap on 1GB or larger regions Kalesh Singh
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Kalesh Singh @ 2020-10-14  0:53 UTC (permalink / raw)
  Cc: surenb, minchan, joelaf, lokeshgidra, kaleshsingh, kernel-team,
	Kirill A . Shutemov, Catalin Marinas, Will Deacon, Andrew Morton,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Shuah Khan, Peter Zijlstra, Kees Cook,
	Aneesh Kumar K.V, Sami Tolvanen, Masahiro Yamada, Josh Poimboeuf,
	Frederic Weisbecker, Krzysztof Kozlowski, Hassan Naveed,
	Arnd Bergmann, Christian Brauner, Anshuman Khandual, Mark Brown,
	Gavin Shan, Mike Rapoport, Steven Price, Jia He, John Hubbard,
	Mike Kravetz, Greg Kroah-Hartman, Ram Pai, Mina Almasry,
	Ralph Campbell, Sandipan Das, Dave Hansen, Masami Hiramatsu,
	Jason Gunthorpe, Brian Geffon, SeongJae Park, linux-kernel,
	linux-arm-kernel, linux-mm, linux-kselftest

HAVE_MOVE_PMD enables remapping pages at the PMD level if both the
source and destination addresses are PMD-aligned.

HAVE_MOVE_PMD is already enabled on x86. The original patch [1] that
introduced this config did not enable it on arm64 at the time because
of performance issues with flushing the TLB on every PMD move. These
issues have since been addressed in more recent releases with
improvements to the arm64 TLB invalidation and core mmu_gather code as
Will Deacon mentioned in [2].

From the data below, it can be inferred that there is approximately an
8x improvement in performance when HAVE_MOVE_PMD is enabled on arm64.

--------- Test Results ----------

The following results were obtained on an arm64 device running a 5.4
kernel, by remapping a PMD-aligned, 1GB sized region to a PMD-aligned
destination. The results from 10 iterations of the test are given below.
All times are in nanoseconds.

Control    HAVE_MOVE_PMD

9220833    1247761
9002552    1219896
9254115    1094792
8725885    1227760
9308646    1043698
9001667    1101771
8793385    1159896
8774636    1143594
9553125    1025833
9374010    1078125

9100885.4  1134312.6    <-- Mean Time in nanoseconds

Total mremap time for a 1GB sized PMD-aligned region drops from
~9.1 milliseconds to ~1.1 milliseconds. (~8x speedup).

[1] https://lore.kernel.org/r/20181108181201.88826-3-joelaf@google.com
[2] https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg140837.html

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
Changes in v4:
  - Add Kirill's Acked-by.

 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 4b136e923ccb..434d6791e869 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -123,6 +123,7 @@ config ARM64
 	select GENERIC_VDSO_TIME_NS
 	select HANDLE_DOMAIN_IRQ
 	select HARDIRQS_SW_RESEND
+	select HAVE_MOVE_PMD
 	select HAVE_PCI
 	select HAVE_ACPI_APEI if (ACPI && EFI)
 	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
-- 
2.28.0.1011.ga647a8990f-goog


* [PATCH v4 3/5] mm: Speedup mremap on 1GB or larger regions
  2020-10-14  0:53 [PATCH v4 0/5] Speed up mremap on large regions Kalesh Singh
  2020-10-14  0:53 ` [PATCH v4 1/5] kselftests: vm: Add mremap tests Kalesh Singh
  2020-10-14  0:53 ` [PATCH v4 2/5] arm64: mremap speedup - Enable HAVE_MOVE_PMD Kalesh Singh
@ 2020-10-14  0:53 ` Kalesh Singh
  2020-12-17 17:28   ` Guenter Roeck
  2020-10-14  0:53 ` [PATCH v4 4/5] arm64: mremap speedup - Enable HAVE_MOVE_PUD Kalesh Singh
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Kalesh Singh @ 2020-10-14  0:53 UTC (permalink / raw)
  Cc: surenb, minchan, joelaf, lokeshgidra, kaleshsingh, kernel-team,
	kernel test robot, Kirill A . Shutemov, Andrew Morton,
	Catalin Marinas, Will Deacon, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Shuah Khan, Peter Zijlstra,
	Aneesh Kumar K.V, Kees Cook, Masahiro Yamada, Arnd Bergmann,
	Sami Tolvanen, Josh Poimboeuf, Frederic Weisbecker,
	Krzysztof Kozlowski, Hassan Naveed, Christian Brauner,
	Stephen Boyd, Anshuman Khandual, Gavin Shan, Mike Rapoport,
	Steven Price, Jia He, John Hubbard, Jason Gunthorpe, Yang Shi,
	Mina Almasry, Ralph Campbell, Ram Pai, Sandipan Das, Dave Hansen,
	Brian Geffon, Masami Hiramatsu, Ira Weiny, SeongJae Park,
	linux-kernel, linux-arm-kernel, linux-mm, linux-kselftest

Android needs to move large memory regions for garbage collection.
The GC requires moving physical pages of multi-gigabyte heap
using mremap. During this move, the application threads have to
be paused for correctness. It is critical to keep this pause as
short as possible to avoid jitters during user interaction.

Optimize mremap for >= 1GB-sized regions by moving at the PUD/PGD
level if the source and destination addresses are PUD-aligned.
For CONFIG_PGTABLE_LEVELS == 3, moving at the PUD level in effect moves
PGD entries, since the PUD entry is “folded back” onto the PGD entry.
Add HAVE_MOVE_PUD so that architectures where moving at the PUD level
isn't supported/tested can turn this off by not selecting the config.
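
With this patch, each iteration of the move_page_tables() loop first
tries a PUD-level move, then a PMD-level move, and only then falls back
to copying individual PTEs. A condensed sketch of that flow is shown
below for orientation only; the PUD/PMD lookup and allocation, the THP
path, mmu notifier setup and error handling are elided, and the
authoritative version is in the diff that follows.

	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
		/* Whole aligned PUD left to move? Move the PUD entry itself. */
		extent = get_extent(NORMAL_PUD, old_addr, old_end, new_addr);
		if (IS_ENABLED(CONFIG_HAVE_MOVE_PUD) && extent == PUD_SIZE &&
		    move_pgt_entry(NORMAL_PUD, vma, old_addr, new_addr,
				   old_pud, new_pud, need_rmap_locks))
			continue;

		/* Otherwise try the same one level down, at the PMD level. */
		extent = get_extent(NORMAL_PMD, old_addr, old_end, new_addr);
		if (IS_ENABLED(CONFIG_HAVE_MOVE_PMD) && extent == PMD_SIZE &&
		    move_pgt_entry(NORMAL_PMD, vma, old_addr, new_addr,
				   old_pmd, new_pmd, need_rmap_locks))
			continue;

		/* Otherwise fall back to copying PTEs via move_ptes(). */
	}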

Fix build test error from v1 of this series reported by
kernel test robot in [1].

[1] https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org/thread/CKPGL4FH4NG7TGH2CVYX2UX76L25BTA3/

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
Changes in v2:
  - Update commit message with description of Android GC's use case.
  - Move set_pud_at() to a separate patch.
  - Use switch() instead of ifs in move_pgt_entry().
  - Fix build test error reported by kernel test robot on x86_64 in [1].
    Guard move_huge_pmd() with IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE),
    since this section doesn't get optimized out in the kernel test
    robot's build test when HAVE_MOVE_PUD is enabled.
  - Keep WARN_ON_ONCE(1) instead of BUILD_BUG() for the aforementioned
    reason.

Changes in v3:
  - Move get_old_pud() and alloc_new_pud() out of
    #ifdef CONFIG_HAVE_MOVE_PUD.
  - Have get_old_pmd() and alloc_new_pmd() use get_old_pud() and
    alloc_new_pud().
  - Use switch() in get_extent() instead of ifs.
  - Add BUILD_BUG() to default case of get_extent().
  - Replace #ifdef CONFIG_HAVE_MOVE_PMD/PUD in move_page_tables() with
    IS_ENABLED(CONFIG_HAVE_MOVE_PMD/PUD).
  - Make lines 80 cols or less, where they don’t need to be longer.
  - s/=  /= /g (Fixed double spaces after '=').

Changes in v4:
  - Add Kirill's Acked-by.

 arch/Kconfig |   7 ++
 mm/mremap.c  | 230 ++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 197 insertions(+), 40 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 76ec3395b843..79da6d714264 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -608,6 +608,13 @@ config HAVE_IRQ_TIME_ACCOUNTING
 	  Archs need to ensure they use a high enough resolution clock to
 	  support irq time accounting and then call enable_sched_clock_irqtime().
 
+config HAVE_MOVE_PUD
+	bool
+	help
+	  Architectures that select this are able to move page tables at the
+	  PUD level. If there are only 3 page table levels, the move effectively
+	  happens at the PGD level.
+
 config HAVE_MOVE_PMD
 	bool
 	help
diff --git a/mm/mremap.c b/mm/mremap.c
index 138abbae4f75..078f731277b6 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -30,12 +30,11 @@
 
 #include "internal.h"
 
-static pmd_t *get_old_pmd(struct mm_struct *mm, unsigned long addr)
+static pud_t *get_old_pud(struct mm_struct *mm, unsigned long addr)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
 	pud_t *pud;
-	pmd_t *pmd;
 
 	pgd = pgd_offset(mm, addr);
 	if (pgd_none_or_clear_bad(pgd))
@@ -49,6 +48,18 @@ static pmd_t *get_old_pmd(struct mm_struct *mm, unsigned long addr)
 	if (pud_none_or_clear_bad(pud))
 		return NULL;
 
+	return pud;
+}
+
+static pmd_t *get_old_pmd(struct mm_struct *mm, unsigned long addr)
+{
+	pud_t *pud;
+	pmd_t *pmd;
+
+	pud = get_old_pud(mm, addr);
+	if (!pud)
+		return NULL;
+
 	pmd = pmd_offset(pud, addr);
 	if (pmd_none(*pmd))
 		return NULL;
@@ -56,19 +67,27 @@ static pmd_t *get_old_pmd(struct mm_struct *mm, unsigned long addr)
 	return pmd;
 }
 
-static pmd_t *alloc_new_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
+static pud_t *alloc_new_pud(struct mm_struct *mm, struct vm_area_struct *vma,
 			    unsigned long addr)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
-	pud_t *pud;
-	pmd_t *pmd;
 
 	pgd = pgd_offset(mm, addr);
 	p4d = p4d_alloc(mm, pgd, addr);
 	if (!p4d)
 		return NULL;
-	pud = pud_alloc(mm, p4d, addr);
+
+	return pud_alloc(mm, p4d, addr);
+}
+
+static pmd_t *alloc_new_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
+			    unsigned long addr)
+{
+	pud_t *pud;
+	pmd_t *pmd;
+
+	pud = alloc_new_pud(mm, vma, addr);
 	if (!pud)
 		return NULL;
 
@@ -249,14 +268,148 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 
 	return true;
 }
+#else
+static inline bool move_normal_pmd(struct vm_area_struct *vma,
+		unsigned long old_addr, unsigned long new_addr, pmd_t *old_pmd,
+		pmd_t *new_pmd)
+{
+	return false;
+}
 #endif
 
+#ifdef CONFIG_HAVE_MOVE_PUD
+static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
+		  unsigned long new_addr, pud_t *old_pud, pud_t *new_pud)
+{
+	spinlock_t *old_ptl, *new_ptl;
+	struct mm_struct *mm = vma->vm_mm;
+	pud_t pud;
+
+	/*
+	 * The destination pud shouldn't be established, free_pgtables()
+	 * should have released it.
+	 */
+	if (WARN_ON_ONCE(!pud_none(*new_pud)))
+		return false;
+
+	/*
+	 * We don't have to worry about the ordering of src and dst
+	 * ptlocks because exclusive mmap_lock prevents deadlock.
+	 */
+	old_ptl = pud_lock(vma->vm_mm, old_pud);
+	new_ptl = pud_lockptr(mm, new_pud);
+	if (new_ptl != old_ptl)
+		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
+
+	/* Clear the pud */
+	pud = *old_pud;
+	pud_clear(old_pud);
+
+	VM_BUG_ON(!pud_none(*new_pud));
+
+	/* Set the new pud */
+	set_pud_at(mm, new_addr, new_pud, pud);
+	flush_tlb_range(vma, old_addr, old_addr + PUD_SIZE);
+	if (new_ptl != old_ptl)
+		spin_unlock(new_ptl);
+	spin_unlock(old_ptl);
+
+	return true;
+}
+#else
+static inline bool move_normal_pud(struct vm_area_struct *vma,
+		unsigned long old_addr, unsigned long new_addr, pud_t *old_pud,
+		pud_t *new_pud)
+{
+	return false;
+}
+#endif
+
+enum pgt_entry {
+	NORMAL_PMD,
+	HPAGE_PMD,
+	NORMAL_PUD,
+};
+
+/*
+ * Returns an extent of the corresponding size for the pgt_entry specified if
+ * valid. Else returns a smaller extent bounded by the end of the source and
+ * destination pgt_entry.
+ */
+static unsigned long get_extent(enum pgt_entry entry, unsigned long old_addr,
+			unsigned long old_end, unsigned long new_addr)
+{
+	unsigned long next, extent, mask, size;
+
+	switch (entry) {
+	case HPAGE_PMD:
+	case NORMAL_PMD:
+		mask = PMD_MASK;
+		size = PMD_SIZE;
+		break;
+	case NORMAL_PUD:
+		mask = PUD_MASK;
+		size = PUD_SIZE;
+		break;
+	default:
+		BUILD_BUG();
+		break;
+	}
+
+	next = (old_addr + size) & mask;
+	/* even if next overflowed, extent below will be ok */
+	extent = (next > old_end) ? old_end - old_addr : next - old_addr;
+	next = (new_addr + size) & mask;
+	if (extent > next - new_addr)
+		extent = next - new_addr;
+	return extent;
+}
+
+/*
+ * Attempts to speedup the move by moving entry at the level corresponding to
+ * pgt_entry. Returns true if the move was successful, else false.
+ */
+static bool move_pgt_entry(enum pgt_entry entry, struct vm_area_struct *vma,
+			unsigned long old_addr, unsigned long new_addr,
+			void *old_entry, void *new_entry, bool need_rmap_locks)
+{
+	bool moved = false;
+
+	/* See comment in move_ptes() */
+	if (need_rmap_locks)
+		take_rmap_locks(vma);
+
+	switch (entry) {
+	case NORMAL_PMD:
+		moved = move_normal_pmd(vma, old_addr, new_addr, old_entry,
+					new_entry);
+		break;
+	case NORMAL_PUD:
+		moved = move_normal_pud(vma, old_addr, new_addr, old_entry,
+					new_entry);
+		break;
+	case HPAGE_PMD:
+		moved = IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+			move_huge_pmd(vma, old_addr, new_addr, old_entry,
+				      new_entry);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+
+	if (need_rmap_locks)
+		drop_rmap_locks(vma);
+
+	return moved;
+}
+
 unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long old_addr, struct vm_area_struct *new_vma,
 		unsigned long new_addr, unsigned long len,
 		bool need_rmap_locks)
 {
-	unsigned long extent, next, old_end;
+	unsigned long extent, old_end;
 	struct mmu_notifier_range range;
 	pmd_t *old_pmd, *new_pmd;
 
@@ -269,53 +422,50 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 
 	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
 		cond_resched();
-		next = (old_addr + PMD_SIZE) & PMD_MASK;
-		/* even if next overflowed, extent below will be ok */
-		extent = next - old_addr;
-		if (extent > old_end - old_addr)
-			extent = old_end - old_addr;
-		next = (new_addr + PMD_SIZE) & PMD_MASK;
-		if (extent > next - new_addr)
-			extent = next - new_addr;
+		/*
+		 * If extent is PUD-sized try to speed up the move by moving at the
+		 * PUD level if possible.
+		 */
+		extent = get_extent(NORMAL_PUD, old_addr, old_end, new_addr);
+		if (IS_ENABLED(CONFIG_HAVE_MOVE_PUD) && extent == PUD_SIZE) {
+			pud_t *old_pud, *new_pud;
+
+			old_pud = get_old_pud(vma->vm_mm, old_addr);
+			if (!old_pud)
+				continue;
+			new_pud = alloc_new_pud(vma->vm_mm, vma, new_addr);
+			if (!new_pud)
+				break;
+			if (move_pgt_entry(NORMAL_PUD, vma, old_addr, new_addr,
+					   old_pud, new_pud, need_rmap_locks))
+				continue;
+		}
+
+		extent = get_extent(NORMAL_PMD, old_addr, old_end, new_addr);
 		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
 		if (!old_pmd)
 			continue;
 		new_pmd = alloc_new_pmd(vma->vm_mm, vma, new_addr);
 		if (!new_pmd)
 			break;
-		if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd) || pmd_devmap(*old_pmd)) {
-			if (extent == HPAGE_PMD_SIZE) {
-				bool moved;
-				/* See comment in move_ptes() */
-				if (need_rmap_locks)
-					take_rmap_locks(vma);
-				moved = move_huge_pmd(vma, old_addr, new_addr,
-						      old_pmd, new_pmd);
-				if (need_rmap_locks)
-					drop_rmap_locks(vma);
-				if (moved)
-					continue;
-			}
+		if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd) ||
+		    pmd_devmap(*old_pmd)) {
+			if (extent == HPAGE_PMD_SIZE &&
+			    move_pgt_entry(HPAGE_PMD, vma, old_addr, new_addr,
+					   old_pmd, new_pmd, need_rmap_locks))
+				continue;
 			split_huge_pmd(vma, old_pmd, old_addr);
 			if (pmd_trans_unstable(old_pmd))
 				continue;
-		} else if (extent == PMD_SIZE) {
-#ifdef CONFIG_HAVE_MOVE_PMD
+		} else if (IS_ENABLED(CONFIG_HAVE_MOVE_PMD) &&
+			   extent == PMD_SIZE) {
 			/*
 			 * If the extent is PMD-sized, try to speed the move by
 			 * moving at the PMD level if possible.
 			 */
-			bool moved;
-
-			if (need_rmap_locks)
-				take_rmap_locks(vma);
-			moved = move_normal_pmd(vma, old_addr, new_addr,
-						old_pmd, new_pmd);
-			if (need_rmap_locks)
-				drop_rmap_locks(vma);
-			if (moved)
+			if (move_pgt_entry(NORMAL_PMD, vma, old_addr, new_addr,
+					   old_pmd, new_pmd, need_rmap_locks))
 				continue;
-#endif
 		}
 
 		if (pte_alloc(new_vma->vm_mm, new_pmd))
-- 
2.28.0.1011.ga647a8990f-goog


* [PATCH v4 4/5] arm64: mremap speedup - Enable HAVE_MOVE_PUD
  2020-10-14  0:53 [PATCH v4 0/5] Speed up mremap on large regions Kalesh Singh
                   ` (2 preceding siblings ...)
  2020-10-14  0:53 ` [PATCH v4 3/5] mm: Speedup mremap on 1GB or larger regions Kalesh Singh
@ 2020-10-14  0:53 ` Kalesh Singh
  2020-10-14  0:53 ` [PATCH v4 5/5] x86: " Kalesh Singh
  2020-10-15 20:40 ` [PATCH v4 0/5] Speed up mremap on large regions Will Deacon
  5 siblings, 0 replies; 12+ messages in thread
From: Kalesh Singh @ 2020-10-14  0:53 UTC (permalink / raw)
  Cc: surenb, minchan, joelaf, lokeshgidra, kaleshsingh, kernel-team,
	Kirill A . Shutemov, Catalin Marinas, Will Deacon, Andrew Morton,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Shuah Khan, Peter Zijlstra, Aneesh Kumar K.V,
	Kees Cook, Josh Poimboeuf, Sami Tolvanen, Masahiro Yamada,
	Arnd Bergmann, Frederic Weisbecker, Krzysztof Kozlowski,
	Hassan Naveed, Christian Brauner, Stephen Boyd,
	Anshuman Khandual, Gavin Shan, Mike Rapoport, Steven Price,
	Jia He, John Hubbard, Ram Pai, Ralph Campbell, Mina Almasry,
	Sandipan Das, Dave Hansen, Brian Geffon, Masami Hiramatsu,
	Kamalesh Babulal, SeongJae Park, linux-kernel, linux-arm-kernel,
	linux-mm, linux-kselftest

HAVE_MOVE_PUD enables remapping pages at the PUD level if both the
source and destination addresses are PUD-aligned.

With HAVE_MOVE_PUD enabled it can be inferred that there is approximately
a 19x improvement in performance on arm64. (See data below).

------- Test Results ---------

The following results were obtained using a 5.4 kernel, by remapping
a PUD-aligned, 1GB sized region to a PUD-aligned destination.
The results from 10 iterations of the test are given below:

Total mremap times for 1GB data on arm64. All times are in nanoseconds.

Control          HAVE_MOVE_PUD

1247761          74271
1219896          46771
1094792          59687
1227760          48385
1043698          76666
1101771          50365
1159896          52500
1143594          75261
1025833          61354
1078125          48697

1134312.6        59395.7    <-- Mean time in nanoseconds

A 1GB mremap completion time drops from ~1.1 milliseconds
to ~59 microseconds on arm64. (~19x speed up).

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
Changes in v3:
  - Add set_pud_at() macro - Used by move_normal_pud().

Changes in v4:
  - Add Kirill's Acked-by.

 arch/arm64/Kconfig               | 1 +
 arch/arm64/include/asm/pgtable.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 434d6791e869..7191a79fb44d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -124,6 +124,7 @@ config ARM64
 	select HANDLE_DOMAIN_IRQ
 	select HARDIRQS_SW_RESEND
 	select HAVE_MOVE_PMD
+	select HAVE_MOVE_PUD
 	select HAVE_PCI
 	select HAVE_ACPI_APEI if (ACPI && EFI)
 	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index a11bf52e0c38..0b0b36974757 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -454,6 +454,7 @@ static inline pmd_t pmd_mkdevmap(pmd_t pmd)
 #define pfn_pud(pfn,prot)	__pud(__phys_to_pud_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
 
 #define set_pmd_at(mm, addr, pmdp, pmd)	set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd))
+#define set_pud_at(mm, addr, pudp, pud)	set_pte_at(mm, addr, (pte_t *)pudp, pud_pte(pud))
 
 #define __p4d_to_phys(p4d)	__pte_to_phys(p4d_pte(p4d))
 #define __phys_to_p4d_val(phys)	__phys_to_pte_val(phys)
-- 
2.28.0.1011.ga647a8990f-goog


* [PATCH v4 5/5] x86: mremap speedup - Enable HAVE_MOVE_PUD
  2020-10-14  0:53 [PATCH v4 0/5] Speed up mremap on large regions Kalesh Singh
                   ` (3 preceding siblings ...)
  2020-10-14  0:53 ` [PATCH v4 4/5] arm64: mremap speedup - Enable HAVE_MOVE_PUD Kalesh Singh
@ 2020-10-14  0:53 ` Kalesh Singh
  2020-10-14 15:53   ` Ingo Molnar
  2020-10-15 20:40 ` [PATCH v4 0/5] Speed up mremap on large regions Will Deacon
  5 siblings, 1 reply; 12+ messages in thread
From: Kalesh Singh @ 2020-10-14  0:53 UTC (permalink / raw)
  Cc: surenb, minchan, joelaf, lokeshgidra, kaleshsingh, kernel-team,
	Kirill A . Shutemov, Andrew Morton, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Catalin Marinas, Will Deacon,
	x86, Shuah Khan, Peter Zijlstra, Kees Cook, Aneesh Kumar K.V,
	Arnd Bergmann, Sami Tolvanen, Masahiro Yamada,
	Krzysztof Kozlowski, Frederic Weisbecker, Hassan Naveed,
	Christian Brauner, Anshuman Khandual, Mark Rutland, Gavin Shan,
	Mike Rapoport, Steven Price, Jia He, John Hubbard, Ram Pai,
	Sandipan Das, Zi Yan, Mina Almasry, Ralph Campbell, Dave Hansen,
	Brian Geffon, Masami Hiramatsu, SeongJae Park, linux-kernel,
	linux-arm-kernel, linux-mm, linux-kselftest

HAVE_MOVE_PUD enables remapping pages at the PUD level if both the
source and destination addresses are PUD-aligned.

With HAVE_MOVE_PUD enabled it can be inferred that there is approximately
a 13x improvement in performance on x86. (See data below).

------- Test Results ---------

The following results were obtained using a 5.4 kernel, by remapping
a PUD-aligned, 1GB sized region to a PUD-aligned destination.
The results from 10 iterations of the test are given below:

Total mremap times for 1GB data on x86. All times are in nanoseconds.

Control        HAVE_MOVE_PUD

180394         15089
235728         14056
238931         25741
187330         13838
241742         14187
177925         14778
182758         14728
160872         14418
205813         15107
245722         13998

205721.5       15594    <-- Mean time in nanoseconds

A 1GB mremap completion time drops from ~205 microseconds
to ~15 microseconds on x86. (~13x speed up).

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
---
Changes in v4:
  - Add Kirill's Acked-by.

 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 835d93006bd6..e199760d54fc 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -198,6 +198,7 @@ config X86
 	select HAVE_MIXED_BREAKPOINTS_REGS
 	select HAVE_MOD_ARCH_SPECIFIC
 	select HAVE_MOVE_PMD
+	select HAVE_MOVE_PUD
 	select HAVE_NMI
 	select HAVE_OPROFILE
 	select HAVE_OPTPROBES
-- 
2.28.0.1011.ga647a8990f-goog


* Re: [PATCH v4 5/5] x86: mremap speedup - Enable HAVE_MOVE_PUD
  2020-10-14  0:53 ` [PATCH v4 5/5] x86: " Kalesh Singh
@ 2020-10-14 15:53   ` Ingo Molnar
  0 siblings, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2020-10-14 15:53 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: surenb, minchan, joelaf, lokeshgidra, kernel-team,
	Kirill A . Shutemov, Andrew Morton, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Catalin Marinas, Will Deacon,
	x86, Shuah Khan, Peter Zijlstra, Kees Cook, Aneesh Kumar K.V,
	Arnd Bergmann, Sami Tolvanen, Masahiro Yamada,
	Krzysztof Kozlowski, Frederic Weisbecker, Hassan Naveed,
	Christian Brauner, Anshuman Khandual, Mark Rutland, Gavin Shan,
	Mike Rapoport, Steven Price, Jia He, John Hubbard, Ram Pai,
	Sandipan Das, Zi Yan, Mina Almasry, Ralph Campbell, Dave Hansen,
	Brian Geffon, Masami Hiramatsu, SeongJae Park, linux-kernel,
	linux-arm-kernel, linux-mm, linux-kselftest


* Kalesh Singh <kaleshsingh@google.com> wrote:

> HAVE_MOVE_PUD enables remapping pages at the PUD level if both the
> source and destination addresses are PUD-aligned.
> 
> With HAVE_MOVE_PUD enabled it can be inferred that there is approximately
> a 13x improvement in performance on x86. (See data below).
> 
> ------- Test Results ---------
> 
> The following results were obtained using a 5.4 kernel, by remapping
> a PUD-aligned, 1GB sized region to a PUD-aligned destination.
> The results from 10 iterations of the test are given below:
> 
> Total mremap times for 1GB data on x86. All times are in nanoseconds.
> 
> Control        HAVE_MOVE_PUD
> 
> 180394         15089
> 235728         14056
> 238931         25741
> 187330         13838
> 241742         14187
> 177925         14778
> 182758         14728
> 160872         14418
> 205813         15107
> 245722         13998
> 
> 205721.5       15594    <-- Mean time in nanoseconds
> 
> A 1GB mremap completion time drops from ~205 microseconds
> to ~15 microseconds on x86. (~13x speed up).
> 
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: H. Peter Anvin <hpa@zytor.com>

Nice!

Assuming it's all correct code:

Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo

* Re: [PATCH v4 1/5] kselftests: vm: Add mremap tests
  2020-10-14  0:53 ` [PATCH v4 1/5] kselftests: vm: Add mremap tests Kalesh Singh
@ 2020-10-14 19:02   ` Kalesh Singh
  0 siblings, 0 replies; 12+ messages in thread
From: Kalesh Singh @ 2020-10-14 19:02 UTC (permalink / raw)
  Cc: Suren Baghdasaryan, Minchan Kim, Joel Fernandes, Lokesh Gidra,
	Cc: Android Kernel, John Hubbard, Shuah Khan, Andrew Morton,
	Kirill A . Shutemov, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, H. Peter Anvin, Peter Zijlstra,
	Kees Cook, Aneesh Kumar K.V, Masahiro Yamada, Josh Poimboeuf,
	Sami Tolvanen, Krzysztof Kozlowski, Frederic Weisbecker,
	Arnd Bergmann, Christian Brauner, Anshuman Khandual,
	Mike Rapoport, Gavin Shan, Steven Price, Jia He, Ralph Campbell,
	Zi Yan, Mina Almasry, Ram Pai, Sandipan Das, Dave Hansen,
	Masami Hiramatsu, Brian Geffon, SeongJae Park, LKML,
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
	open list:MEMORY MANAGEMENT, open list:KERNEL SELFTEST FRAMEWORK

On Tue, Oct 13, 2020 at 8:54 PM Kalesh Singh <kaleshsingh@google.com> wrote:
>

Hi kselftest maintainers,

Could someone ACK this mremap test if there isn't any other concern?

Thanks,
Kalesh

> +               MAKE_TEST(PMD, _1MB, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
> +                 "4MB mremap - Source PMD-aligned, Destination 1MB-aligned"),
> +               MAKE_TEST(PMD, PMD, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
> +                 "4MB mremap - Source PMD-aligned, Destination PMD-aligned"),
> +
> +               /* Src addr PUD aligned */
> +               MAKE_TEST(PUD, PTE, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
> +                 "2GB mremap - Source PUD-aligned, Destination PTE-aligned"),
> +               MAKE_TEST(PUD, _1MB, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
> +                 "2GB mremap - Source PUD-aligned, Destination 1MB-aligned"),
> +               MAKE_TEST(PUD, PMD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
> +                 "2GB mremap - Source PUD-aligned, Destination PMD-aligned"),
> +               MAKE_TEST(PUD, PUD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
> +                 "2GB mremap - Source PUD-aligned, Destination PUD-aligned"),
> +       };
> +
> +       struct test perf_test_cases[] = {
> +               /*
> +                * mremap 1GB region - Page table level aligned time
> +                * comparison.
> +                */
> +               MAKE_TEST(PTE, PTE, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
> +                 "1GB mremap - Source PTE-aligned, Destination PTE-aligned"),
> +               MAKE_TEST(PMD, PMD, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
> +                 "1GB mremap - Source PMD-aligned, Destination PMD-aligned"),
> +               MAKE_TEST(PUD, PUD, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
> +                 "1GB mremap - Source PUD-aligned, Destination PUD-aligned"),
> +       };
> +
> +       run_perf_tests =  (threshold_mb == VALIDATION_NO_THRESHOLD) ||
> +                               (threshold_mb * _1MB >= _1GB);
> +
> +       ksft_set_plan(ARRAY_SIZE(test_cases) + (run_perf_tests ?
> +                     ARRAY_SIZE(perf_test_cases) : 0));
> +
> +       for (i = 0; i < ARRAY_SIZE(test_cases); i++)
> +               run_mremap_test_case(test_cases[i], &failures, threshold_mb,
> +                                    pattern_seed);
> +
> +       if (run_perf_tests) {
> +               ksft_print_msg("\n%s\n",
> +                "mremap HAVE_MOVE_PMD/PUD optimization time comparison for 1GB region:");
> +               for (i = 0; i < ARRAY_SIZE(perf_test_cases); i++)
> +                       run_mremap_test_case(perf_test_cases[i], &failures,
> +                                            threshold_mb, pattern_seed);
> +       }
> +
> +       if (failures > 0)
> +               ksft_exit_fail();
> +       else
> +               ksft_exit_pass();
> +}
> diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests
> index a3f4f30f0a2e..d578ad831813 100755
> --- a/tools/testing/selftests/vm/run_vmtests
> +++ b/tools/testing/selftests/vm/run_vmtests
> @@ -241,6 +241,17 @@ else
>         echo "[PASS]"
>  fi
>
> +echo "-------------------"
> +echo "running mremap_test"
> +echo "-------------------"
> +./mremap_test
> +if [ $? -ne 0 ]; then
> +       echo "[FAIL]"
> +       exitcode=1
> +else
> +       echo "[PASS]"
> +fi
> +
>  echo "-----------------"
>  echo "running thuge-gen"
>  echo "-----------------"
> --
> 2.28.0.1011.ga647a8990f-goog
>
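
For anyone trying the new selftest directly, a typical invocation looks like
the following (the -t/-p values are arbitrary examples of the options
documented in usage() above):

    # Validate only the first 16MB of each remapped region, with a fixed
    # seed so that any reported mismatch can be reproduced.
    ./mremap_test -t 16 -p 42

    # A threshold of 0 disables the limit and validates every remapped
    # region in full (slower, but exhaustive).
    ./mremap_test -t 0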

* Re: [PATCH v4 2/5] arm64: mremap speedup - Enable HAVE_MOVE_PMD
  2020-10-14  0:53 ` [PATCH v4 2/5] arm64: mremap speedup - Enable HAVE_MOVE_PMD Kalesh Singh
@ 2020-10-15 10:55   ` Will Deacon
  0 siblings, 0 replies; 12+ messages in thread
From: Will Deacon @ 2020-10-15 10:55 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: surenb, minchan, joelaf, lokeshgidra, kernel-team,
	Kirill A . Shutemov, Catalin Marinas, Andrew Morton,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Shuah Khan, Peter Zijlstra, Kees Cook,
	Aneesh Kumar K.V, Sami Tolvanen, Masahiro Yamada, Josh Poimboeuf,
	Frederic Weisbecker, Krzysztof Kozlowski, Hassan Naveed,
	Arnd Bergmann, Christian Brauner, Anshuman Khandual, Mark Brown,
	Gavin Shan, Mike Rapoport, Steven Price, Jia He, John Hubbard,
	Mike Kravetz, Greg Kroah-Hartman, Ram Pai, Mina Almasry,
	Ralph Campbell, Sandipan Das, Dave Hansen, Masami Hiramatsu,
	Jason Gunthorpe, Brian Geffon, SeongJae Park, linux-kernel,
	linux-arm-kernel, linux-mm, linux-kselftest

On Wed, Oct 14, 2020 at 12:53:07AM +0000, Kalesh Singh wrote:
> HAVE_MOVE_PMD enables remapping pages at the PMD level if both the
> source and destination addresses are PMD-aligned.
> 
> HAVE_MOVE_PMD is already enabled on x86. The original patch [1] that
> introduced this config did not enable it on arm64 at the time because
> of performance issues with flushing the TLB on every PMD move. These
> issues have since been addressed in more recent releases with
> improvements to the arm64 TLB invalidation and core mmu_gather code as
> Will Deacon mentioned in [2].
> 
> From the data below, it can be inferred that there is approximately
> 8x improvement in performance when HAVE_MOVE_PMD is enabled on arm64.
> 
> --------- Test Results ----------
> 
> The following results were obtained on an arm64 device running a 5.4
> kernel, by remapping a PMD-aligned, 1GB sized region to a PMD-aligned
> destination. The results from 10 iterations of the test are given below.
> All times are in nanoseconds.
> 
> Control    HAVE_MOVE_PMD
> 
> 9220833    1247761
> 9002552    1219896
> 9254115    1094792
> 8725885    1227760
> 9308646    1043698
> 9001667    1101771
> 8793385    1159896
> 8774636    1143594
> 9553125    1025833
> 9374010    1078125
> 
> 9100885.4  1134312.6    <-- Mean Time in nanoseconds
> 
> Total mremap time for a 1GB sized PMD-aligned region drops from
> ~9.1 milliseconds to ~1.1 milliseconds. (~8x speedup).
> 
> [1] https://lore.kernel.org/r/20181108181201.88826-3-joelaf@google.com
> [2] https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg140837.html
> 
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
> Changes in v4:
>   - Add Kirill's Acked-by.
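
For context, opting an architecture in to this optimization is a one-line
Kconfig change; roughly the following in arch/arm64/Kconfig (a sketch, not
the verbatim hunk):

    config ARM64
    	...
    	select HAVE_MOVE_PMD

The page-table moving itself is done by the generic mremap code, so an
architecture only needs to advertise that moving entries at this level is
safe for it.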

Argh, I thought we already enabled this for PMDs back in 2018! Looks like
we forgot to actually do that after I improved the performance of the
TLB invalidation.

I'll pick this one patch up for 5.10.

Will

* Re: [PATCH v4 0/5] Speed up mremap on large regions
  2020-10-14  0:53 [PATCH v4 0/5] Speed up mremap on large regions Kalesh Singh
                   ` (4 preceding siblings ...)
  2020-10-14  0:53 ` [PATCH v4 5/5] x86: " Kalesh Singh
@ 2020-10-15 20:40 ` Will Deacon
  5 siblings, 0 replies; 12+ messages in thread
From: Will Deacon @ 2020-10-15 20:40 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: catalin.marinas, kernel-team, Will Deacon, Mina Almasry,
	lokeshgidra, Frederic Weisbecker, linux-kernel, linux-arm-kernel,
	H. Peter Anvin, surenb, Dave Hansen, Brian Geffon,
	Borislav Petkov, Hassan Naveed, Stephen Boyd, John Hubbard,
	Kirill A. Shutemov, joelaf, Masahiro Yamada, Christian Brauner,
	Jason Gunthorpe, linux-mm, Ingo Molnar, Masami Hiramatsu,
	Steven Price, x86, Sandipan Das, Peter Zijlstra, Thomas Gleixner,
	Sami Tolvanen, Andrew Morton, minchan, Mark Brown, Arnd Bergmann,
	Gavin Shan, Josh Poimboeuf, Aneesh Kumar K.V, Jia He,
	Mike Rapoport, Shuah Khan, Anshuman Khandual, Ralph Campbell,
	SeongJae Park, Krzysztof Kozlowski, linux-kselftest, Kees Cook

On Wed, 14 Oct 2020 00:53:05 +0000, Kalesh Singh wrote:
> This is a repost of the mremap speed up patches, adding Kirill's
> Acked-by's (from a separate discussion). The previous versions are
> posted at:
> v1 - https://lore.kernel.org/r/20200930222130.4175584-1-kaleshsingh@google.com
> v2 - https://lore.kernel.org/r/20201002162101.665549-1-kaleshsingh@google.com
> v3 - http://lore.kernel.org/r/20201005154017.474722-1-kaleshsingh@google.com
> 
> [...]

Applied just the arm64 PMD patch to arm64 (for-next/core), thanks!

[1/1] arm64: mremap speedup - Enable HAVE_MOVE_PMD
      https://git.kernel.org/arm64/c/45544eee9606

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

* Re: [PATCH v4 3/5] mm: Speedup mremap on 1GB or larger regions
  2020-10-14  0:53 ` [PATCH v4 3/5] mm: Speedup mremap on 1GB or larger regions Kalesh Singh
@ 2020-12-17 17:28   ` Guenter Roeck
  2020-12-17 18:15     ` Kalesh Singh
  0 siblings, 1 reply; 12+ messages in thread
From: Guenter Roeck @ 2020-12-17 17:28 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: surenb, minchan, joelaf, lokeshgidra, kernel-team,
	kernel test robot, Kirill A . Shutemov, Andrew Morton,
	Catalin Marinas, Will Deacon, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Shuah Khan, Peter Zijlstra,
	Aneesh Kumar K.V, Kees Cook, Masahiro Yamada, Arnd Bergmann,
	Sami Tolvanen, Josh Poimboeuf, Frederic Weisbecker,
	Krzysztof Kozlowski, Hassan Naveed, Christian Brauner,
	Stephen Boyd, Anshuman Khandual, Gavin Shan, Mike Rapoport,
	Steven Price, Jia He, John Hubbard, Jason Gunthorpe, Yang Shi,
	Mina Almasry, Ralph Campbell, Ram Pai, Sandipan Das, Dave Hansen,
	Brian Geffon, Masami Hiramatsu, Ira Weiny, SeongJae Park,
	linux-kernel, linux-arm-kernel, linux-mm, linux-kselftest

On Wed, Oct 14, 2020 at 12:53:08AM +0000, Kalesh Singh wrote:
> Android needs to move large memory regions for garbage collection.
> The GC requires moving physical pages of multi-gigabyte heap
> using mremap. During this move, the application threads have to
> be paused for correctness. It is critical to keep this pause as
> short as possible to avoid jitters during user interaction.
> 
> Optimize mremap for >= 1GB-sized regions by moving at the PUD/PGD
> level if the source and destination addresses are PUD-aligned.
> For CONFIG_PGTABLE_LEVELS == 3, moving at the PUD level in effect moves
> PGD entries, since the PUD entry is “folded back” onto the PGD entry.
> Add HAVE_MOVE_PUD so that architectures where moving at the PUD level
> isn't supported/tested can turn this off by not selecting the config.
> 
> Fix build test error from v1 of this series reported by
> kernel test robot in [1].
> 
> [1] https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org/thread/CKPGL4FH4NG7TGH2CVYX2UX76L25BTA3/
> 
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> Reported-by: kernel test robot <lkp@intel.com>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
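
To make the mechanism concrete: moving a mapping at the PUD level amounts to
taking the page-table locks, clearing the source PUD entry, installing it at
the destination, and flushing the TLB for the old range. Below is a
simplified sketch in the spirit of the helper this patch adds to
mm/mremap.c (locking and error handling are reduced; treat the names and
details as illustrative rather than the verbatim patch):

static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
			    unsigned long new_addr, pud_t *old_pud, pud_t *new_pud)
{
	struct mm_struct *mm = vma->vm_mm;
	spinlock_t *old_ptl, *new_ptl;
	pud_t pud;

	/* The destination slot must be empty before the move. */
	if (WARN_ON_ONCE(!pud_none(*new_pud)))
		return false;

	old_ptl = pud_lock(mm, old_pud);
	new_ptl = pud_lockptr(mm, new_pud);
	if (new_ptl != old_ptl)
		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);

	/* Detach the entry from the old location... */
	pud = *old_pud;
	pud_clear(old_pud);

	/* ...reinstall it at the new address and flush the old range. */
	set_pud_at(mm, new_addr, new_pud, pud);
	flush_tlb_range(vma, old_addr, old_addr + PUD_SIZE);

	if (new_ptl != old_ptl)
		spin_unlock(new_ptl);
	spin_unlock(old_ptl);

	return true;
}

With 4KB pages, a 1GB region is 262,144 PTEs; moving a single PUD entry
instead of copying each of those is where the large speedups quoted in this
thread come from.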

I thought I reported it, but maybe I didn't. This patch causes all
'parisc' qemu emulations to fail. Typical log:

Freeing unused kernel memory: 604K
Write protected read-only-after-init data: 2k
rodata_test: end of .rodata is not page size aligned
Run /sbin/init as init process
process '/bin/busybox' started with executable stack
Failed to execute /sbin/init (error -12)

Reverting this patch fixes the problem.

Bisect log from linux-next below. The patch (and thus the problem)
are now in mainline Linux.

Guenter

---
# bad: [7bba37a1591369e2e506d599b8f5d7d0516b2dbc] Add linux-next specific files for 20201214
# good: [0477e92881850d44910a7e94fc2c46f96faa131f] Linux 5.10-rc7
git bisect start 'HEAD' 'v5.10-rc7'
# good: [fe5c40ab90a1f82ba97294637eaf875cfdd7a05f] Merge remote-tracking branch 'nand/nand/next'
git bisect good fe5c40ab90a1f82ba97294637eaf875cfdd7a05f
# good: [674a0d6de8bd290671f7dff405205871a70300b3] Merge remote-tracking branch 'spi/for-next'
git bisect good 674a0d6de8bd290671f7dff405205871a70300b3
# good: [8623dae312f73a2ea3230b1c648d3004cfc224ce] Merge remote-tracking branch 'vfio/next'
git bisect good 8623dae312f73a2ea3230b1c648d3004cfc224ce
# good: [dd26635f54bcd8e5d4e875a209f82a0423ba9c08] Merge remote-tracking branch 'gpio/for-next'
git bisect good dd26635f54bcd8e5d4e875a209f82a0423ba9c08
# good: [86e9c9a734889fe437442e0a35eb4c61d319cb47] Merge remote-tracking branch 'memblock/for-next'
git bisect good 86e9c9a734889fe437442e0a35eb4c61d319cb47
# bad: [3452331fda80b1cb5e121e6718ca6c07264382b2] userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob
git bisect bad 3452331fda80b1cb5e121e6718ca6c07264382b2
# bad: [19f468d54fcffc3f98b71e3e12ff23726767d953] mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio
git bisect bad 19f468d54fcffc3f98b71e3e12ff23726767d953
# good: [d89f3ababcac54493a4cb0582c61eb5f426b44e3] mm: remove pagevec_lookup_range_nr_tag()
git bisect good d89f3ababcac54493a4cb0582c61eb5f426b44e3
# good: [eba8373dcb40d30952f31d5fc0cff56b78f46273] mm/mlock: remove __munlock_isolate_lru_page()
git bisect good eba8373dcb40d30952f31d5fc0cff56b78f46273
# good: [8831d3f3564beba0f3f1b5291c88b35725bc45c9] xen/unpopulated-alloc: consolidate pgmap manipulation
git bisect good 8831d3f3564beba0f3f1b5291c88b35725bc45c9
# bad: [b8d53d70851821d8a2040ddca3aa6ee88fc8aaec] mm/page_vma_mapped.c: add colon to fix kernel-doc markups error for check_pte
git bisect bad b8d53d70851821d8a2040ddca3aa6ee88fc8aaec
# bad: [e77846c3da1862faa25c08e186a62b03e98c862f] x86: mremap speedup - Enable HAVE_MOVE_PUD
git bisect bad e77846c3da1862faa25c08e186a62b03e98c862f
# bad: [72ad8951bac1c559ea1b691a0b035fb339e4d71d] mm: speedup mremap on 1GB or larger regions
git bisect bad 72ad8951bac1c559ea1b691a0b035fb339e4d71d
# good: [fa94bfe31609787501a1ff8d7659ade5734ec4e5] kselftests: vm: add mremap tests
git bisect good fa94bfe31609787501a1ff8d7659ade5734ec4e5
# first bad commit: [72ad8951bac1c559ea1b691a0b035fb339e4d71d] mm: speedup mremap on 1GB or larger regions

* Re: [PATCH v4 3/5] mm: Speedup mremap on 1GB or larger regions
  2020-12-17 17:28   ` Guenter Roeck
@ 2020-12-17 18:15     ` Kalesh Singh
  0 siblings, 0 replies; 12+ messages in thread
From: Kalesh Singh @ 2020-12-17 18:15 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Suren Baghdasaryan, Minchan Kim, Joel Fernandes, Lokesh Gidra,
	Cc: Android Kernel, kernel test robot, Kirill A . Shutemov,
	Andrew Morton, Catalin Marinas, Will Deacon, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, the arch/x86 maintainers,
	H. Peter Anvin, Shuah Khan, Peter Zijlstra, Aneesh Kumar K.V,
	Kees Cook, Masahiro Yamada, Arnd Bergmann, Sami Tolvanen,
	Josh Poimboeuf, Frederic Weisbecker, Krzysztof Kozlowski,
	Hassan Naveed, Christian Brauner, Stephen Boyd,
	Anshuman Khandual, Gavin Shan, Mike Rapoport, Steven Price,
	Jia He, John Hubbard, Jason Gunthorpe, Yang Shi, Mina Almasry,
	Ralph Campbell, Ram Pai, Sandipan Das, Dave Hansen, Brian Geffon,
	Masami Hiramatsu, Ira Weiny, SeongJae Park, LKML,
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
	open list:MEMORY MANAGEMENT, open list:KERNEL SELFTEST FRAMEWORK

On Thu, Dec 17, 2020 at 12:28 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On Wed, Oct 14, 2020 at 12:53:08AM +0000, Kalesh Singh wrote:
> > [...]
>
> I thought I reported it, but maybe I didn't. This patch causes all
> 'parisc' qemu emulations to fail. Typical log:
>
> Freeing unused kernel memory: 604K
> Write protected read-only-after-init data: 2k
> rodata_test: end of .rodata is not page size aligned
> Run /sbin/init as init process
> process '/bin/busybox' started with executable stack
> Failed to execute /sbin/init (error -12)
>
> Reverting this patch fixes the problem.
>
> Bisect log from linux-next below. The patch (and thus the problem)
> are now in mainline Linux.

Hi Guenter,

Thanks for reporting. We enabled this only on x86 and arm64; we're
investigating the root cause now.

Kalesh
>
> Guenter
>
> [...]

Thread overview: 12+ messages
2020-10-14  0:53 [PATCH v4 0/5] Speed up mremap on large regions Kalesh Singh
2020-10-14  0:53 ` [PATCH v4 1/5] kselftests: vm: Add mremap tests Kalesh Singh
2020-10-14 19:02   ` Kalesh Singh
2020-10-14  0:53 ` [PATCH v4 2/5] arm64: mremap speedup - Enable HAVE_MOVE_PMD Kalesh Singh
2020-10-15 10:55   ` Will Deacon
2020-10-14  0:53 ` [PATCH v4 3/5] mm: Speedup mremap on 1GB or larger regions Kalesh Singh
2020-12-17 17:28   ` Guenter Roeck
2020-12-17 18:15     ` Kalesh Singh
2020-10-14  0:53 ` [PATCH v4 4/5] arm64: mremap speedup - Enable HAVE_MOVE_PUD Kalesh Singh
2020-10-14  0:53 ` [PATCH v4 5/5] x86: " Kalesh Singh
2020-10-14 15:53   ` Ingo Molnar
2020-10-15 20:40 ` [PATCH v4 0/5] Speed up mremap on large regions Will Deacon
