linux-mm.kvack.org archive mirror
* [PATCH v3 0/6] Optimize mremap during mutual alignment within PMD
@ 2023-05-24 15:32 Joel Fernandes (Google)
  2023-05-24 15:32 ` [PATCH v3 1/6] mm/mremap: Optimize the start addresses in move_page_tables() Joel Fernandes (Google)
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Joel Fernandes (Google) @ 2023-05-24 15:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	linux-kselftest, linux-mm, Shuah Khan, Vlastimil Babka,
	Michal Hocko, Linus Torvalds, Lorenzo Stoakes, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

Hello!

Here is v3 of the mremap start address optimization / fix for exec warning.

The main changes are:
1. Care must be taken with moves that happen purely within a VMA; in other
   words, this check in can_align_down():
    if (vma->vm_start <= addr_masked)
            return false;

    As an example of why this is needed:
    Consider the following range which is 2MB aligned and is
    a part of a larger 10MB range which is not shown. Each
    character below represents 256KB, making the source and destination
    2MB each. The lower case letters are moved (s to d) and the
    upper case letters are not moved.

    |DDDDddddSSSSssss|

    If we align down 'ssss' to start from the 'SSSS', we will end up destroying
    SSSS. The above if statement prevents that and I verified it.

    I also added a test for this in the last patch.

2. Handle the stack case separately. We do not care about #1 for stack movement
   because the 'SSSS' does not matter during this move. Further we need to do this
   to prevent the stack move warning.

    if (!for_stack && vma->vm_start <= addr_masked)
            return false;
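
The two forms of the check can be modeled in user space. This is only a
sketch, not the kernel code: struct toy_vma, TOY_PMD_SIZE and
within_vma_blocks_align() are hypothetical names introduced here, and a
2MB PMD is assumed.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PMD_SIZE (2UL << 20)		/* assumption: 2MB PMD */
#define TOY_PMD_MASK (~(TOY_PMD_SIZE - 1))

/* Hypothetical stand-in for the only VMA field this check reads. */
struct toy_vma {
	unsigned long vm_start;
};

/*
 * Returns true when the check quoted above would reject the align-down:
 * the masked address still lies at or above the start of the VMA being
 * moved, and the move is not the exec stack move.
 */
static bool within_vma_blocks_align(const struct toy_vma *vma,
				    unsigned long addr, bool for_stack)
{
	unsigned long addr_masked = addr & TOY_PMD_MASK;

	/* The address being moved is expected to belong to the VMA. */
	assert(addr >= vma->vm_start);

	return !for_stack && vma->vm_start <= addr_masked;
}
```

With a VMA starting at 0x10000000 and 'ssss' at 0x10700000, the masked
address is 0x10600000 (the start of 'SSSS'), so the non-stack call
rejects the align-down while the for_stack call does not. A VMA that
starts exactly at 0x10700000 passes this particular check and goes on to
the previous-mapping lookup.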

History of patches
==================
v2->v3:
1. Masked address was stored in int, fixed it to unsigned long to avoid truncation.
2. We now handle moves happening purely within a VMA; a new test is added to cover this.
3. More code comments.

v1->v2:
1. Trigger the optimization for mremaps smaller than a PMD. I tested by tracing
that it works correctly.

2. Fix issue with bogus return value found by Linus if we broke out of the
above loop for the first PMD itself.

v1: Initial RFC.
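
The bogus return value from v1->v2 item 2 comes from unsigned
arithmetic. The sketch below is a toy model under assumptions:
bytes_moved() is a hypothetical helper, cur_addr stands for the source
address at which the copy loop stopped, and old_end is computed from the
caller's original (unaligned) start address.

```c
#include <assert.h>

/*
 * Toy model of the "how much done" computation. After realignment,
 * cur_addr may sit below the caller's original start address, in which
 * case len + cur_addr - old_end would wrap around to a huge value.
 */
static unsigned long bytes_moved(unsigned long cur_addr, unsigned long len,
				 unsigned long old_end)
{
	/* The copy loop never advances past the end of the source range. */
	assert(cur_addr <= old_end);

	/* Stopped before reaching the original start: nothing was moved. */
	if (len + cur_addr < old_end)
		return 0;

	return len + cur_addr - old_end;
}
```

For a 5MB move whose source starts at 0x2900000 (so old_end is
0x2e00000), stopping at the realigned address 0x2800000 reports 0,
stopping at 0x2a00000 reports 1MB, and finishing at 0x2e00000 reports
the full 5MB.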

Description of patches
======================
These patches optimize the start addresses in move_page_tables() and test the
changes. They address a warning [1] that occurs due to a downward, overlapping
move on a mutually-aligned offset within a PMD during exec. By initiating the
copy process at the PMD level when such alignment is present, we can prevent
this warning and speed up the copying process at the same time. Linus Torvalds
suggested this idea.

Please check the individual patches for more details.

thanks,

 - Joel

[1] https://lore.kernel.org/all/ZB2GTBD%2FLWTrkOiO@dhcp22.suse.cz/

Joel Fernandes (Google) (6):
mm/mremap: Optimize the start addresses in move_page_tables()
mm/mremap: Allow moves within the same VMA
selftests: mm: Fix failure case when new remap region was not found
selftests: mm: Add a test for mutually aligned moves > PMD size
selftests: mm: Add a test for remapping to area immediately after
existing mapping
selftests: mm: Add a test for remapping within a range

fs/exec.c                                |   2 +-
include/linux/mm.h                       |   2 +-
mm/mremap.c                              |  69 ++++++++++-
tools/testing/selftests/mm/mremap_test.c | 148 +++++++++++++++++++++--
4 files changed, 209 insertions(+), 12 deletions(-)

--
2.40.1.698.g37aff9b760-goog




* [PATCH v3 1/6] mm/mremap: Optimize the start addresses in move_page_tables()
  2023-05-24 15:32 [PATCH v3 0/6] Optimize mremap during mutual alignment within PMD Joel Fernandes (Google)
@ 2023-05-24 15:32 ` Joel Fernandes (Google)
  2023-05-24 23:23   ` Linus Torvalds
  2023-05-24 15:32 ` [PATCH v3 2/6] mm/mremap: Allow moves within the same VMA Joel Fernandes (Google)
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 9+ messages in thread
From: Joel Fernandes (Google) @ 2023-05-24 15:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Linus Torvalds, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Lorenzo Stoakes,
	Kirill A Shutemov, Liam R. Howlett, Paul E. McKenney,
	Suren Baghdasaryan, Kalesh Singh, Lokesh Gidra

Recently, we have seen reports [1] of a warning that triggers due to
move_page_tables() doing a downward and overlapping move on a
mutually-aligned offset within a PMD. By mutual alignment, I
mean the source and destination addresses of the mremap are at
the same offset within a PMD.

This mutual alignment along with the fact that the move is downward is
sufficient to cause a warning related to having an allocated PMD that
does not have PTEs in it.

This warning will only trigger when there is mutual alignment in the
move operation. A solution, as suggested by Linus Torvalds [2], is to
initiate the copy process at the PMD level whenever such alignment is
present. Implementing this approach will not only prevent the warning
from being triggered, but it will also optimize the operation as this
method should enhance the speed of the copy process whenever there's a
possibility to start copying at the PMD level.

Some more points:
a. The optimization can be done only when neither the source nor the
destination of the mremap has anything mapped below it up to a PMD
boundary. I add support to detect that.

b. Point (a) is not a problem for the call to move_page_tables() from
exec.c as nothing is expected to be mapped below the source. However, for
non-overlapping mutually aligned moves as triggered by mremap(2), I
added support for checking such cases.

c. I currently only optimize for PMD moves; in the future I/we can build
on this work and do PUD moves as well if there is a need for this. But I
want to take it one step at a time.

d. We need to be careful about mremap of ranges within the VMA itself.
For this purpose, I added checks to determine if the address after
alignment falls within its VMA itself.
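
The address-only part of the decision described above can be sketched
in user space. This is an illustration under assumptions, not the kernel
code: the TOY_* macros, worth_realigning() and
worth_realigning_examples() are hypothetical names, a 2MB PMD is
assumed, and the VMA checks from points (a) and (d) are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PMD_SIZE (2UL << 20)		/* assumption: 2MB PMD */
#define TOY_PMD_MASK (~(TOY_PMD_SIZE - 1))

/* Source and destination sit at the same offset within a PMD. */
static bool mutually_aligned(unsigned long old_addr, unsigned long new_addr)
{
	return (old_addr & ~TOY_PMD_MASK) == (new_addr & ~TOY_PMD_MASK);
}

/*
 * Realignment is only interesting when the source is not already PMD
 * aligned, the move reaches at least the next PMD boundary, and both
 * addresses share the same offset within their PMD.
 */
static bool worth_realigning(unsigned long old_addr, unsigned long new_addr,
			     unsigned long len)
{
	unsigned long offset = old_addr & ~TOY_PMD_MASK;

	return offset != 0 &&
	       len >= TOY_PMD_SIZE - offset &&
	       mutually_aligned(old_addr, new_addr);
}

/* A few sample inputs; all addresses here are made up for illustration. */
void worth_realigning_examples(void)
{
	/* Both 1MB into their PMD, 5MB long: candidate for realignment. */
	assert(worth_realigning(0x2900000UL, 0x2f00000UL, 0x500000UL));
	/* Already PMD aligned: nothing to gain. */
	assert(!worth_realigning(0x2800000UL, 0x2e00000UL, 0x500000UL));
	/* Different offsets within the PMD: not mutually aligned. */
	assert(!worth_realigning(0x2900000UL, 0x2f80000UL, 0x500000UL));
	/* Too short to reach the next PMD boundary. */
	assert(!worth_realigning(0x2900000UL, 0x2f00000UL, 0x80000UL));
}
```

Only when this test passes does it make sense to go on and check that
nothing is mapped between each masked address and its VMA.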

[1] https://lore.kernel.org/all/ZB2GTBD%2FLWTrkOiO@dhcp22.suse.cz/
[2] https://lore.kernel.org/all/CAHk-=whd7msp8reJPfeGNyt0LiySMT0egExx3TVZSX3Ok6X=9g@mail.gmail.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 mm/mremap.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/mm/mremap.c b/mm/mremap.c
index 411a85682b58..184d52f83b19 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -478,6 +478,53 @@ static bool move_pgt_entry(enum pgt_entry entry, struct vm_area_struct *vma,
 	return moved;
 }
 
+/*
+ * A helper to check if a previous mapping exists. Required for
+ * move_page_tables() and realign_addr() to determine if a previous mapping
+ * exists before we can do realignment optimizations.
+ */
+static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_align,
+			       unsigned long mask)
+{
+	unsigned long addr_masked = addr_to_align & mask;
+	struct vm_area_struct *prev = NULL, *cur = NULL;
+
+	/* If the masked address is within vma, we cannot align the address down. */
+	if (vma->vm_start <= addr_masked)
+		return false;
+
+	/*
+	 * Attempt to find VMA before prev that contains the address.
+	 * On any issue finding prev, assume there is a mapping and return false
+	 * which will turn off any optimizations. Yes, we're conservative!
+	 * The mmap write lock is held here, so the lookup is safe.
+	 */
+	cur = find_vma_prev(vma->vm_mm, vma->vm_start, &prev);
+	if (!cur || cur != vma || !prev)
+		return false;
+
+	/* The masked address fell within some previous mapping. */
+	if (prev->vm_end > addr_masked)
+		return false;
+
+	return true;
+}
+
+/* Opportunistically realign to specified boundary for faster copy. */
+static void realign_addr(unsigned long *old_addr, struct vm_area_struct *old_vma,
+			 unsigned long *new_addr, struct vm_area_struct *new_vma,
+			 unsigned long mask)
+{
+	bool mutually_aligned = (*old_addr & ~mask) == (*new_addr & ~mask);
+
+	if ((*old_addr & ~mask) && mutually_aligned
+	    && can_align_down(old_vma, *old_addr, mask)
+	    && can_align_down(new_vma, *new_addr, mask)) {
+		*old_addr = *old_addr & mask;
+		*new_addr = *new_addr & mask;
+	}
+}
+
 unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long old_addr, struct vm_area_struct *new_vma,
 		unsigned long new_addr, unsigned long len,
@@ -493,6 +540,15 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 
 	old_end = old_addr + len;
 
+	/*
+	 * If possible, realign addresses to PMD boundary for faster copy.
+	 * Don't align for intra-VMA moves as we may destroy existing mappings.
+	 */
+	if ((vma != new_vma)
+		&& (len >= PMD_SIZE - (old_addr & ~PMD_MASK))) {
+		realign_addr(&old_addr, vma, &new_addr, new_vma, PMD_MASK);
+	}
+
 	if (is_vm_hugetlb_page(vma))
 		return move_hugetlb_page_tables(vma, new_vma, old_addr,
 						new_addr, len);
@@ -565,6 +621,13 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 
 	mmu_notifier_invalidate_range_end(&range);
 
+	/*
+	 * Prevent negative return values when {old,new}_addr was realigned
+	 * but we broke out of the above loop for the first PMD itself.
+	 */
+	if (len + old_addr < old_end)
+		return 0;
+
 	return len + old_addr - old_end;	/* how much done */
 }
 
-- 
2.40.1.698.g37aff9b760-goog




* [PATCH v3 2/6] mm/mremap: Allow moves within the same VMA
  2023-05-24 15:32 [PATCH v3 0/6] Optimize mremap during mutual alignment within PMD Joel Fernandes (Google)
  2023-05-24 15:32 ` [PATCH v3 1/6] mm/mremap: Optimize the start addresses in move_page_tables() Joel Fernandes (Google)
@ 2023-05-24 15:32 ` Joel Fernandes (Google)
  2023-05-24 15:32 ` [PATCH v3 3/6] selftests: mm: Fix failure case when new remap region was not found Joel Fernandes (Google)
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Joel Fernandes (Google) @ 2023-05-24 15:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	linux-kselftest, linux-mm, Shuah Khan, Vlastimil Babka,
	Michal Hocko, Linus Torvalds, Lorenzo Stoakes, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

For the stack move happening in shift_arg_pages(), the move is happening
within the same VMA which spans the old and new ranges.

In case the aligned address happens to fall within that VMA, allow such
moves and don't abort the optimization.

In the mremap case, we cannot allow any such moves as they will end up
destroying some part of the mapping (either the source of the move, or
part of the existing mapping). So just avoid it for mremap.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 fs/exec.c          |  2 +-
 include/linux/mm.h |  2 +-
 mm/mremap.c        | 40 ++++++++++++++++++++--------------------
 3 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 7c44d0c65b1b..7a7217353115 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -707,7 +707,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 	 * process cleanup to remove whatever mess we made.
 	 */
 	if (length != move_page_tables(vma, old_start,
-				       vma, new_start, length, false))
+				       vma, new_start, length, false, true))
 		return -ENOMEM;
 
 	lru_add_drain();
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1f79667824eb..dd415cd2493d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2265,7 +2265,7 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen);
 extern unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long old_addr, struct vm_area_struct *new_vma,
 		unsigned long new_addr, unsigned long len,
-		bool need_rmap_locks);
+		bool need_rmap_locks, bool for_stack);
 
 /*
  * Flags used by change_protection().  For now we make it a bitmap so
diff --git a/mm/mremap.c b/mm/mremap.c
index 184d52f83b19..323c3b94216f 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -479,18 +479,23 @@ static bool move_pgt_entry(enum pgt_entry entry, struct vm_area_struct *vma,
 }
 
 /*
- * A helper to check if a previous mapping exists. Required for
- * move_page_tables() and realign_addr() to determine if a previous mapping
- * exists before we can do realignment optimizations.
+ * A helper to check if aligning down is OK. The aligned address should fall
+ * on *no mapping*. For the stack moving down, that's a special move within
+ * the VMA that is created to span the source and destination of the move,
+ * so we make an exception for it.
  */
 static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_align,
-			       unsigned long mask)
+			    unsigned long mask, bool for_stack)
 {
 	unsigned long addr_masked = addr_to_align & mask;
 	struct vm_area_struct *prev = NULL, *cur = NULL;
 
-	/* If the masked address is within vma, we cannot align the address down. */
-	if (vma->vm_start <= addr_masked)
+	/*
+	 * Other than for stack moves, if the alignment causes the address to be within
+	 * its own @vma, we can't align down or we will destroy the current mapping.
+	 * In other words for non-stack moves, the masked addr has to fall on no mapping.
+	 */
+	if (!for_stack && vma->vm_start <= addr_masked)
 		return false;
 
 	/*
@@ -513,13 +518,13 @@ static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_ali
 /* Opportunistically realign to specified boundary for faster copy. */
 static void realign_addr(unsigned long *old_addr, struct vm_area_struct *old_vma,
 			 unsigned long *new_addr, struct vm_area_struct *new_vma,
-			 unsigned long mask)
+			 unsigned long mask, bool for_stack)
 {
 	bool mutually_aligned = (*old_addr & ~mask) == (*new_addr & ~mask);
 
 	if ((*old_addr & ~mask) && mutually_aligned
-	    && can_align_down(old_vma, *old_addr, mask)
-	    && can_align_down(new_vma, *new_addr, mask)) {
+	    && can_align_down(old_vma, *old_addr, mask, for_stack)
+	    && can_align_down(new_vma, *new_addr, mask, for_stack)) {
 		*old_addr = *old_addr & mask;
 		*new_addr = *new_addr & mask;
 	}
@@ -528,7 +533,7 @@ static void realign_addr(unsigned long *old_addr, struct vm_area_struct *old_vma
 unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long old_addr, struct vm_area_struct *new_vma,
 		unsigned long new_addr, unsigned long len,
-		bool need_rmap_locks)
+		bool need_rmap_locks, bool for_stack)
 {
 	unsigned long extent, old_end;
 	struct mmu_notifier_range range;
@@ -540,14 +545,9 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 
 	old_end = old_addr + len;
 
-	/*
-	 * If possible, realign addresses to PMD boundary for faster copy.
-	 * Don't align for intra-VMA moves as we may destroy existing mappings.
-	 */
-	if ((vma != new_vma)
-		&& (len >= PMD_SIZE - (old_addr & ~PMD_MASK))) {
-		realign_addr(&old_addr, vma, &new_addr, new_vma, PMD_MASK);
-	}
+	/* If possible, realign addresses to PMD boundary for faster copy. */
+	if (len >= PMD_SIZE - (old_addr & ~PMD_MASK))
+		realign_addr(&old_addr, vma, &new_addr, new_vma, PMD_MASK, for_stack);
 
 	if (is_vm_hugetlb_page(vma))
 		return move_hugetlb_page_tables(vma, new_vma, old_addr,
@@ -696,7 +696,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 	}
 
 	moved_len = move_page_tables(vma, old_addr, new_vma, new_addr, old_len,
-				     need_rmap_locks);
+				     need_rmap_locks, false);
 	if (moved_len < old_len) {
 		err = -ENOMEM;
 	} else if (vma->vm_ops && vma->vm_ops->mremap) {
@@ -710,7 +710,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 		 * and then proceed to unmap new area instead of old.
 		 */
 		move_page_tables(new_vma, new_addr, vma, old_addr, moved_len,
-				 true);
+				 true, false);
 		vma = new_vma;
 		old_len = new_len;
 		old_addr = new_addr;
-- 
2.40.1.698.g37aff9b760-goog




* [PATCH v3 3/6] selftests: mm: Fix failure case when new remap region was not found
  2023-05-24 15:32 [PATCH v3 0/6] Optimize mremap during mutual alignment within PMD Joel Fernandes (Google)
  2023-05-24 15:32 ` [PATCH v3 1/6] mm/mremap: Optimize the start addresses in move_page_tables() Joel Fernandes (Google)
  2023-05-24 15:32 ` [PATCH v3 2/6] mm/mremap: Allow moves within the same VMA Joel Fernandes (Google)
@ 2023-05-24 15:32 ` Joel Fernandes (Google)
  2023-05-24 15:32 ` [PATCH v3 4/6] selftests: mm: Add a test for mutually aligned moves > PMD size Joel Fernandes (Google)
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Joel Fernandes (Google) @ 2023-05-24 15:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	linux-kselftest, linux-mm, Shuah Khan, Vlastimil Babka,
	Michal Hocko, Linus Torvalds, Lorenzo Stoakes, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

When a valid remap region could not be found, the source mapping is not
cleaned up. Fix the goto statement so that the cleanup happens.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 tools/testing/selftests/mm/mremap_test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
index 5c3773de9f0f..6822d657f589 100644
--- a/tools/testing/selftests/mm/mremap_test.c
+++ b/tools/testing/selftests/mm/mremap_test.c
@@ -316,7 +316,7 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 		if (addr + c.dest_alignment < addr) {
 			ksft_print_msg("Couldn't find a valid region to remap to\n");
 			ret = -1;
-			goto out;
+			goto clean_up_src;
 		}
 		addr += c.dest_alignment;
 	}
-- 
2.40.1.698.g37aff9b760-goog




* [PATCH v3 4/6] selftests: mm: Add a test for mutually aligned moves > PMD size
  2023-05-24 15:32 [PATCH v3 0/6] Optimize mremap during mutual alignment within PMD Joel Fernandes (Google)
                   ` (2 preceding siblings ...)
  2023-05-24 15:32 ` [PATCH v3 3/6] selftests: mm: Fix failure case when new remap region was not found Joel Fernandes (Google)
@ 2023-05-24 15:32 ` Joel Fernandes (Google)
  2023-05-24 15:32 ` [PATCH v3 5/6] selftests: mm: Add a test for remapping to area immediately after existing mapping Joel Fernandes (Google)
  2023-05-24 15:32 ` [PATCH v3 6/6] selftests: mm: Add a test for remapping within a range Joel Fernandes (Google)
  5 siblings, 0 replies; 9+ messages in thread
From: Joel Fernandes (Google) @ 2023-05-24 15:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	linux-kselftest, linux-mm, Shuah Khan, Vlastimil Babka,
	Michal Hocko, Linus Torvalds, Lorenzo Stoakes, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

This patch adds a test case to check if a PMD-alignment optimization
successfully happens.

I add support to make sure there is some room before the source mapping.
Otherwise, the PMD-aligned move optimization will be disabled, because
the kernel will detect that a mapping exists before the source and the
optimization becomes impossible.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 tools/testing/selftests/mm/mremap_test.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
index 6822d657f589..6304eb0947a3 100644
--- a/tools/testing/selftests/mm/mremap_test.c
+++ b/tools/testing/selftests/mm/mremap_test.c
@@ -44,6 +44,7 @@ enum {
 	_1MB = 1ULL << 20,
 	_2MB = 2ULL << 20,
 	_4MB = 4ULL << 20,
+	_5MB = 5ULL << 20,
 	_1GB = 1ULL << 30,
 	_2GB = 2ULL << 30,
 	PMD = _2MB,
@@ -235,6 +236,11 @@ static void *get_source_mapping(struct config c)
 	unsigned long long mmap_min_addr;
 
 	mmap_min_addr = get_mmap_min_addr();
+	/*
+	 * For some tests, we need to not have any mappings below the
+	 * source mapping. Add some headroom to mmap_min_addr for this.
+	 */
+	mmap_min_addr += 10 * _4MB;
 
 retry:
 	addr += c.src_alignment;
@@ -434,7 +440,7 @@ static int parse_args(int argc, char **argv, unsigned int *threshold_mb,
 	return 0;
 }
 
-#define MAX_TEST 13
+#define MAX_TEST 14
 #define MAX_PERF_TEST 3
 int main(int argc, char **argv)
 {
@@ -500,6 +506,10 @@ int main(int argc, char **argv)
 	test_cases[12] = MAKE_TEST(PUD, PUD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
 				   "2GB mremap - Source PUD-aligned, Destination PUD-aligned");
 
+	/* Src and Dest addr 1MB aligned. 5MB mremap. */
+	test_cases[13] = MAKE_TEST(_1MB, _1MB, _5MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "5MB mremap - Source 1MB-aligned, Destination 1MB-aligned");
+
 	perf_test_cases[0] =  MAKE_TEST(page_size, page_size, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
 					"1GB mremap - Source PTE-aligned, Destination PTE-aligned");
 	/*
-- 
2.40.1.698.g37aff9b760-goog




* [PATCH v3 5/6] selftests: mm: Add a test for remapping to area immediately after existing mapping
  2023-05-24 15:32 [PATCH v3 0/6] Optimize mremap during mutual alignment within PMD Joel Fernandes (Google)
                   ` (3 preceding siblings ...)
  2023-05-24 15:32 ` [PATCH v3 4/6] selftests: mm: Add a test for mutually aligned moves > PMD size Joel Fernandes (Google)
@ 2023-05-24 15:32 ` Joel Fernandes (Google)
  2023-05-24 15:32 ` [PATCH v3 6/6] selftests: mm: Add a test for remapping within a range Joel Fernandes (Google)
  5 siblings, 0 replies; 9+ messages in thread
From: Joel Fernandes (Google) @ 2023-05-24 15:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	linux-kselftest, linux-mm, Shuah Khan, Vlastimil Babka,
	Michal Hocko, Linus Torvalds, Lorenzo Stoakes, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

This patch adds support for verifying that we correctly handle the
situation where something is already mapped before the destination of the remap.

Any realignment of destination address and PMD-copy will destroy that
existing mapping. In such cases, we need to avoid doing the optimization.

To test this, we map an area called the preamble before the remap
region. Then we verify after the mremap operation that this region did not get
corrupted.

Putting some prints in the kernel, I verified that we optimize
correctly in different situations:

Optimize when there is alignment and no previous mapping (this is tested
by the previous patch).
<prints>
can_align_down(old_vma->vm_start=2900000, old_addr=2900000, mask=-2097152): 0
can_align_down(new_vma->vm_start=2f00000, new_addr=2f00000, mask=-2097152): 0
=== Starting move_page_tables ===
Doing PUD move for 2800000 -> 2e00000 of extent=200000 <-- Optimization
Doing PUD move for 2a00000 -> 3000000 of extent=200000
Doing PUD move for 2c00000 -> 3200000 of extent=200000
</prints>

Don't optimize when there is alignment but there is a previous mapping
(this is tested by this patch).
Notice that can_align_down() returns 1 for the destination mapping
as we detected there is something there.
<prints>
can_align_down(old_vma->vm_start=2900000, old_addr=2900000, mask=-2097152): 0
can_align_down(new_vma->vm_start=5700000, new_addr=5700000, mask=-2097152): 1
=== Starting move_page_tables ===
Doing move_ptes for 2900000 -> 5700000 of extent=100000 <-- Unoptimized
Doing PUD move for 2a00000 -> 5800000 of extent=200000
Doing PUD move for 2c00000 -> 5a00000 of extent=200000
</prints>
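
The extent values in the two traces follow from walking the source
range one PMD at a time. The sketch below is a user-space illustration
only: next_extent() and the TOY_* macros are hypothetical names, a 2MB
PMD is assumed, and addresses near the top of the address space (where
the addition could wrap) are not handled.

```c
#include <assert.h>

#define TOY_PMD_SIZE (2UL << 20)		/* assumption: 2MB PMD */
#define TOY_PMD_MASK (~(TOY_PMD_SIZE - 1))

/*
 * Size of the next chunk when walking [addr, end): up to the next PMD
 * boundary, or up to the end of the range if that comes first.
 */
static unsigned long next_extent(unsigned long addr, unsigned long end)
{
	unsigned long next = (addr + TOY_PMD_SIZE) & TOY_PMD_MASK;
	unsigned long extent;

	assert(addr < end);

	extent = next - addr;
	if (extent > end - addr)
		extent = end - addr;

	return extent;
}
```

Starting from the unaligned source 2900000 with the range ending at
2e00000, the first chunk is 100000 (1MB), which is too small for a
PMD-level move. Starting from the realigned 2800000, the first chunk is
a full 200000 (2MB).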

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 tools/testing/selftests/mm/mremap_test.c | 57 +++++++++++++++++++++---
 1 file changed, 52 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
index 6304eb0947a3..d7366074e2a8 100644
--- a/tools/testing/selftests/mm/mremap_test.c
+++ b/tools/testing/selftests/mm/mremap_test.c
@@ -29,6 +29,7 @@ struct config {
 	unsigned long long dest_alignment;
 	unsigned long long region_size;
 	int overlapping;
+	int dest_preamble_size;
 };
 
 struct test {
@@ -283,7 +284,7 @@ static void *get_source_mapping(struct config c)
 static long long remap_region(struct config c, unsigned int threshold_mb,
 			      char pattern_seed)
 {
-	void *addr, *src_addr, *dest_addr;
+	void *addr, *src_addr, *dest_addr, *dest_preamble_addr;
 	unsigned long long i;
 	struct timespec t_start = {0, 0}, t_end = {0, 0};
 	long long  start_ns, end_ns, align_mask, ret, offset;
@@ -300,7 +301,7 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 		goto out;
 	}
 
-	/* Set byte pattern */
+	/* Set byte pattern for source block. */
 	srand(pattern_seed);
 	for (i = 0; i < threshold; i++)
 		memset((char *) src_addr + i, (char) rand(), 1);
@@ -312,6 +313,9 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 	addr = (void *) (((unsigned long long) src_addr + c.region_size
 			  + offset) & align_mask);
 
+	/* Remap after the destination block preamble. */
+	addr += c.dest_preamble_size;
+
 	/* See comment in get_source_mapping() */
 	if (!((unsigned long long) addr & c.dest_alignment))
 		addr = (void *) ((unsigned long long) addr | c.dest_alignment);
@@ -327,6 +331,24 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 		addr += c.dest_alignment;
 	}
 
+	if (c.dest_preamble_size) {
+		dest_preamble_addr = mmap((void *) addr - c.dest_preamble_size, c.dest_preamble_size,
+					  PROT_READ | PROT_WRITE,
+					  MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED,
+							-1, 0);
+		if (dest_preamble_addr == MAP_FAILED) {
+			ksft_print_msg("Failed to map dest preamble region: %s\n",
+					strerror(errno));
+			ret = -1;
+			goto clean_up_src;
+		}
+
+		/* Set byte pattern for the dest preamble block. */
+		srand(pattern_seed);
+		for (i = 0; i < c.dest_preamble_size; i++)
+			memset((char *) dest_preamble_addr + i, (char) rand(), 1);
+	}
+
 	clock_gettime(CLOCK_MONOTONIC, &t_start);
 	dest_addr = mremap(src_addr, c.region_size, c.region_size,
 					  MREMAP_MAYMOVE|MREMAP_FIXED, (char *) addr);
@@ -335,7 +357,7 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 	if (dest_addr == MAP_FAILED) {
 		ksft_print_msg("mremap failed: %s\n", strerror(errno));
 		ret = -1;
-		goto clean_up_src;
+		goto clean_up_dest_preamble;
 	}
 
 	/* Verify byte pattern after remapping */
@@ -353,6 +375,23 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 		}
 	}
 
+	/* Verify the dest preamble byte pattern after remapping */
+	if (c.dest_preamble_size) {
+		srand(pattern_seed);
+		for (i = 0; i < c.dest_preamble_size; i++) {
+			char c = (char) rand();
+
+			if (((char *) dest_preamble_addr)[i] != c) {
+				ksft_print_msg("Preamble data after remap doesn't match at offset %d\n",
+					       i);
+				ksft_print_msg("Expected: %#x\t Got: %#x\n", c & 0xff,
+					       ((char *) dest_preamble_addr)[i] & 0xff);
+				ret = -1;
+				goto clean_up_dest;
+			}
+		}
+	}
+
 	start_ns = t_start.tv_sec * NS_PER_SEC + t_start.tv_nsec;
 	end_ns = t_end.tv_sec * NS_PER_SEC + t_end.tv_nsec;
 	ret = end_ns - start_ns;
@@ -365,6 +404,9 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
  */
 clean_up_dest:
 	munmap(dest_addr, c.region_size);
+clean_up_dest_preamble:
+	if (c.dest_preamble_size && dest_preamble_addr)
+		munmap(dest_preamble_addr, c.dest_preamble_size);
 clean_up_src:
 	munmap(src_addr, c.region_size);
 out:
@@ -440,7 +482,7 @@ static int parse_args(int argc, char **argv, unsigned int *threshold_mb,
 	return 0;
 }
 
-#define MAX_TEST 14
+#define MAX_TEST 15
 #define MAX_PERF_TEST 3
 int main(int argc, char **argv)
 {
@@ -449,7 +491,7 @@ int main(int argc, char **argv)
 	unsigned int threshold_mb = VALIDATION_DEFAULT_THRESHOLD;
 	unsigned int pattern_seed;
 	int num_expand_tests = 2;
-	struct test test_cases[MAX_TEST];
+	struct test test_cases[MAX_TEST] = {};
 	struct test perf_test_cases[MAX_PERF_TEST];
 	int page_size;
 	time_t t;
@@ -510,6 +552,11 @@ int main(int argc, char **argv)
 	test_cases[13] = MAKE_TEST(_1MB, _1MB, _5MB, NON_OVERLAPPING, EXPECT_SUCCESS,
 				  "5MB mremap - Source 1MB-aligned, Destination 1MB-aligned");
 
+	/* Src and Dest addr 1MB aligned. 5MB mremap. */
+	test_cases[14] = MAKE_TEST(_1MB, _1MB, _5MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "5MB mremap - Source 1MB-aligned, Dest 1MB-aligned with 40MB Preamble");
+	test_cases[14].config.dest_preamble_size = 10 * _4MB;
+
 	perf_test_cases[0] =  MAKE_TEST(page_size, page_size, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
 					"1GB mremap - Source PTE-aligned, Destination PTE-aligned");
 	/*
-- 
2.40.1.698.g37aff9b760-goog




* [PATCH v3 6/6] selftests: mm: Add a test for remapping within a range
  2023-05-24 15:32 [PATCH v3 0/6] Optimize mremap during mutual alignment within PMD Joel Fernandes (Google)
                   ` (4 preceding siblings ...)
  2023-05-24 15:32 ` [PATCH v3 5/6] selftests: mm: Add a test for remapping to area immediately after existing mapping Joel Fernandes (Google)
@ 2023-05-24 15:32 ` Joel Fernandes (Google)
  5 siblings, 0 replies; 9+ messages in thread
From: Joel Fernandes (Google) @ 2023-05-24 15:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	linux-kselftest, linux-mm, Shuah Khan, Vlastimil Babka,
	Michal Hocko, Linus Torvalds, Lorenzo Stoakes, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

Move a block of memory within a memory range. Any alignment optimization
on the source address may cause corruption. Verify using kselftest that
it works. I have also verified with tracing that such optimization does
not happen due to this check in can_align_down():

if (!for_stack && vma->vm_start <= addr_masked)
	return false;

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 tools/testing/selftests/mm/mremap_test.c | 79 +++++++++++++++++++++++-
 1 file changed, 78 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
index d7366074e2a8..f45d1abedc9c 100644
--- a/tools/testing/selftests/mm/mremap_test.c
+++ b/tools/testing/selftests/mm/mremap_test.c
@@ -23,6 +23,7 @@
 #define VALIDATION_NO_THRESHOLD 0	/* Verify the entire region */
 
 #define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
+#define SIZE_MB(m) ((size_t)m * (1024 * 1024))
 
 struct config {
 	unsigned long long src_alignment;
@@ -226,6 +227,79 @@ static void mremap_expand_merge_offset(FILE *maps_fp, unsigned long page_size)
 		ksft_test_result_fail("%s\n", test_name);
 }
 
+/*
+ * Verify that an mremap within a range does not cause corruption
+ * of unrelated part of range.
+ *
+ * Consider the following range which is 2MB aligned and is
+ * a part of a larger 10MB range which is not shown. Each
+ * character is 256KB below making the source and destination
+ * 2MB each. The lower case letters are moved (s to d) and the
+ * upper case letters are not moved. The below test verifies
+ * that the upper case S letters are not corrupted by the
+ * adjacent mremap.
+ *
+ * |DDDDddddSSSSssss|
+ */
+static void mremap_move_within_range(char pattern_seed)
+{
+	char *test_name = "mremap mremap move within range";
+	void *src, *dest;
+	int i, success = 1;
+
+	size_t size = SIZE_MB(20);
+	void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
+			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (ptr == MAP_FAILED) {
+		perror("mmap");
+		success = 0;
+		goto out;
+	}
+	memset(ptr, 0, size);
+
+	src = ptr + SIZE_MB(6);
+	src = (void *)((unsigned long)src & ~(SIZE_MB(2) - 1));
+
+	/* Set byte pattern for source block. */
+	srand(pattern_seed);
+	for (i = 0; i < SIZE_MB(2); i++) {
+		((char *)src)[i] = (char) rand();
+	}
+
+	dest = src - SIZE_MB(2);
+
+	void *new_ptr = mremap(src + SIZE_MB(1), SIZE_MB(1), SIZE_MB(1),
+						   MREMAP_MAYMOVE | MREMAP_FIXED, dest + SIZE_MB(1));
+	if (new_ptr == MAP_FAILED) {
+		perror("mremap");
+		success = 0;
+		goto out;
+	}
+
+	/* Verify byte pattern after remapping */
+	srand(pattern_seed);
+	for (i = 0; i < SIZE_MB(1); i++) {
+		char c = (char) rand();
+
+		if (((char *)src)[i] != c) {
+			ksft_print_msg("Data at src at %d got corrupted due to unrelated mremap\n",
+				       i);
+			ksft_print_msg("Expected: %#x\t Got: %#x\n", c & 0xff,
+					((char *) src)[i] & 0xff);
+			success = 0;
+		}
+	}
+
+out:
+	if (munmap(ptr, size) == -1)
+		perror("munmap");
+
+	if (success)
+		ksft_test_result_pass("%s\n", test_name);
+	else
+		ksft_test_result_fail("%s\n", test_name);
+}
+
 /*
  * Returns the start address of the mapping on success, else returns
  * NULL on failure.
@@ -491,6 +565,7 @@ int main(int argc, char **argv)
 	unsigned int threshold_mb = VALIDATION_DEFAULT_THRESHOLD;
 	unsigned int pattern_seed;
 	int num_expand_tests = 2;
+	int num_misc_tests = 1;
 	struct test test_cases[MAX_TEST] = {};
 	struct test perf_test_cases[MAX_PERF_TEST];
 	int page_size;
@@ -572,7 +647,7 @@ int main(int argc, char **argv)
 				(threshold_mb * _1MB >= _1GB);
 
 	ksft_set_plan(ARRAY_SIZE(test_cases) + (run_perf_tests ?
-		      ARRAY_SIZE(perf_test_cases) : 0) + num_expand_tests);
+		      ARRAY_SIZE(perf_test_cases) : 0) + num_expand_tests + num_misc_tests);
 
 	for (i = 0; i < ARRAY_SIZE(test_cases); i++)
 		run_mremap_test_case(test_cases[i], &failures, threshold_mb,
@@ -590,6 +665,8 @@ int main(int argc, char **argv)
 
 	fclose(maps_fp);
 
+	mremap_move_within_range(pattern_seed);
+
 	if (run_perf_tests) {
 		ksft_print_msg("\n%s\n",
 		 "mremap HAVE_MOVE_PMD/PUD optimization time comparison for 1GB region:");
-- 
2.40.1.698.g37aff9b760-goog




* Re: [PATCH v3 1/6] mm/mremap: Optimize the start addresses in move_page_tables()
  2023-05-24 15:32 ` [PATCH v3 1/6] mm/mremap: Optimize the start addresses in move_page_tables() Joel Fernandes (Google)
@ 2023-05-24 23:23   ` Linus Torvalds
  2023-05-25 19:51     ` Joel Fernandes
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2023-05-24 23:23 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Lorenzo Stoakes,
	Kirill A Shutemov, Liam R. Howlett, Paul E. McKenney,
	Suren Baghdasaryan, Kalesh Singh, Lokesh Gidra

Hmm. I'm still quite unhappy about your can_align_down().

On Wed, May 24, 2023 at 8:32 AM Joel Fernandes (Google)
<joel@joelfernandes.org> wrote:
>
> +       /* If the masked address is within vma, we cannot align the address down. */
> +       if (vma->vm_start <= addr_masked)
> +               return false;

I don't think this test is right.

The test should not be "is the mapping still there at the point we
aligned down to".

No, the test should be whether there is any part of the mapping below
the point we're starting with:

        if (vma->vm_start < addr_to_align)
                return false;

because we can do the "expand the move down" *only* if it's the
beginning of the vma (because otherwise we'd be moving part of the vma
that precedes the address!)

(Alternatively, just make that "<" be "!=" - we're basically saying
that we can expand moving ptes to a pmd boundary *only* if this vma
starts at that point. No?).

> +       cur = find_vma_prev(vma->vm_mm, vma->vm_start, &prev);
> +       if (!cur || cur != vma || !prev)
> +               return false;

I've mentioned this test before, and I still find it actively misleading.

First off, the "!cur || cur != vma" test is clearly redundant. We know
'vma' isn't NULL (we just dereferenced it!). So "cur != vma" already
includes the "!cur" test.

So that "!cur" part of the test simply *cannot* be sensible.

And the "!prev" test still makes no sense to me. You tried to explain
it to me earlier, and I clearly didn't get it. It seems actively
wrong. I still think "!prev" should return true.

You seemed to think that "!prev" couldn't actually happen and would
be a sign of some VM problem, but that doesn't make any sense to me.
Of course !prev can happen - if "vma" is the first vma in the VM and
there is no previous.

It may be *rare*, but I still don't understand why you'd make that
"there is no vma below us" mean "we cannot expand the move below us
because there's something there".

So I continue to think that this test should just be

        if (WARN_ON_ONCE(cur != vma))
                return false;

because if it ever returns something that *isn't* the same as vma,
then we do indeed have serious problems. But that WARN_ON_ONCE() shows
that that's a "cannot happen" thing, not some kind of "if this happens
then don't do it" test.

and then the *real* test for "can we align down" should just be

        return !prev || prev->vm_end <= addr_masked;

Because while I think your code _works_, it really doesn't seem to
make much sense as it stands in your patch. The tests are actively
misleading. No?

                 Linus
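
The checks proposed above can be modelled in userspace. The following is
only a sketch under stated assumptions: struct model_vma,
model_can_align_down() and MODEL_PMD_SIZE are hypothetical names invented
for illustration, the previous VMA is passed in directly instead of being
looked up with find_vma_prev(), and the for_stack case from the cover
letter is left out. It is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed PMD size of 2MB, matching the example in the cover letter. */
#define MODEL_PMD_SIZE (2UL * 1024 * 1024)

/* Hypothetical stand-in for a VMA covering [vm_start, vm_end). */
struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * Model of the suggested logic: the move may be expanded down to the
 * masked address only if it begins exactly at the start of the vma and
 * nothing is mapped between the masked address and the vma.
 * prev is the mapping immediately below vma, or NULL if there is none.
 */
bool model_can_align_down(const struct model_vma *vma,
			  const struct model_vma *prev,
			  unsigned long addr_to_align,
			  unsigned long mask)
{
	unsigned long addr_masked = addr_to_align & mask;

	assert(vma != NULL);

	/* Part of the vma precedes the address: expanding would move it too. */
	if (vma->vm_start != addr_to_align)
		return false;

	/* No mapping below, or it ends at or before the aligned-down address. */
	return !prev || prev->vm_end <= addr_masked;
}
```

With mask set to ~(MODEL_PMD_SIZE - 1), a move that starts 1MB into a larger
vma is rejected by the first check, which is the |DDDDddddSSSSssss| case.
A move that starts at vm_start is allowed when prev is NULL or ends at or
below the 2MB boundary, and rejected when prev extends past that boundary.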



* Re: [PATCH v3 1/6] mm/mremap: Optimize the start addresses in move_page_tables()
  2023-05-24 23:23   ` Linus Torvalds
@ 2023-05-25 19:51     ` Joel Fernandes
  0 siblings, 0 replies; 9+ messages in thread
From: Joel Fernandes @ 2023-05-25 19:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Lorenzo Stoakes,
	Kirill A Shutemov, Liam R. Howlett, Paul E. McKenney,
	Suren Baghdasaryan, Kalesh Singh, Lokesh Gidra

Hi Linus,

On Wed, May 24, 2023 at 7:23 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Hmm. I'm still quite unhappy about your can_align_down().
>
> On Wed, May 24, 2023 at 8:32 AM Joel Fernandes (Google)
> <joel@joelfernandes.org> wrote:
> >
> > +       /* If the masked address is within vma, we cannot align the address down. */
> > +       if (vma->vm_start <= addr_masked)
> > +               return false;
>
> I don't think this test is right.
>
> The test should not be "is the mapping still there at the point we
> aligned down to".
>
> No, the test should be whether there is any part of the mapping below
> the point we're starting with:
>
>         if (vma->vm_start < addr_to_align)
>                 return false;
>
> because we can do the "expand the move down" *only* if it's the
> beginning of the vma (because otherwise we'd be moving part of the vma
> that precedes the address!)

You are right, I missed that. Funny I did think about this case you
mentioned. I will fix it in the next revision, thanks.

> (Alternatively, just make that "<" be "!=" - we're basically saying
> that we can expand moving ptes to a pmd boundary *only* if this vma
> starts at that point. No?).

Yes, I prefer the "!=" check. I will use that.

>
> > +       cur = find_vma_prev(vma->vm_mm, vma->vm_start, &prev);
> > +       if (!cur || cur != vma || !prev)
> > +               return false;
>
> I've mentioned this test before, and I still find it actively misleading.
>
> First off, the "!cur || cur != vma" test is clearly redundant. We know
> 'vma' isn't NULL (we just dereferenced it!). So "cur != vma" already
> includes the "!cur" test.
>
> So that "!cur" part of the test simply *cannot* be sensible.

Ok, I agree with you now.

> And the "!prev" test still makes no sense to me. You tried to explain
> it to me earlier, and I clearly didn't get it. It seems actively
> wrong. I still think "!prev" should return true.

Yes, ok. Sounds good.

> You seemed to think that "!prev" couldn't actually happen and would
> be a sign of some VM problem, but that doesn't make any sense to me.
> Of course !prev can happen - if "vma" is the first vma in the VM and
> there is no previous.
>
> It may be *rare*, but I still don't understand why you'd make that
> "there is no vma below us" mean "we cannot expand the move below us
> because there's something there".
>
> So I continue to think that this test should just be
>
>         if (WARN_ON_ONCE(cur != vma))
>                 return false;

I agree with this now.

>
> because if it ever returns something that *isn't* the same as vma,
> then we do indeed have serious problems. But that WARN_ON_ONCE() shows
> that that's a "cannot happen" thing, not some kind of "if this happens
> then don't do it" test.
>
> and then the *real* test for "can we align down" should just be
>
>         return !prev || prev->vm_end <= addr_masked;

Agreed, that's cleaner.

> Because while I think your code _works_, it really doesn't seem to
> make much sense as it stands in your patch. The tests are actively
> misleading. No?

True, your approach makes me want to write cleaner code rather than
being excessively paranoid. So thank you for that.

These patches have been tricky to get right so thank you for your
continued input and quick feedback.

I will add a test for the case you mentioned above where the address
to realign is not at the beginning of the VMA.

thanks,

- Joel



Thread overview: 9+ messages
2023-05-24 15:32 [PATCH v3 0/6] Optimize mremap during mutual alignment within PMD Joel Fernandes (Google)
2023-05-24 15:32 ` [PATCH v3 1/6] mm/mremap: Optimize the start addresses in move_page_tables() Joel Fernandes (Google)
2023-05-24 23:23   ` Linus Torvalds
2023-05-25 19:51     ` Joel Fernandes
2023-05-24 15:32 ` [PATCH v3 2/6] mm/mremap: Allow moves within the same VMA Joel Fernandes (Google)
2023-05-24 15:32 ` [PATCH v3 3/6] selftests: mm: Fix failure case when new remap region was not found Joel Fernandes (Google)
2023-05-24 15:32 ` [PATCH v3 4/6] selftests: mm: Add a test for mutually aligned moves > PMD size Joel Fernandes (Google)
2023-05-24 15:32 ` [PATCH v3 5/6] selftests: mm: Add a test for remapping to area immediately after existing mapping Joel Fernandes (Google)
2023-05-24 15:32 ` [PATCH v3 6/6] selftests: mm: Add a test for remapping within a range Joel Fernandes (Google)
