* [PATCH v5 0/7] Optimize mremap during mutual alignment within PMD
@ 2023-08-22  1:54 Joel Fernandes (Google)
  2023-08-22  1:54 ` [PATCH v5 1/7] mm/mremap: Optimize the start addresses in move_page_tables() Joel Fernandes (Google)
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Joel Fernandes (Google) @ 2023-08-22  1:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	linux-kselftest, linux-mm, Shuah Khan, Vlastimil Babka,
	Michal Hocko, Linus Torvalds, Lorenzo Stoakes, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

Hello!

Here is v5 of the mremap start address optimization / fix for exec warning.

Description of patches
======================
These patches optimize the start addresses in move_page_tables() and test
the changes. They address a warning [1] that occurs due to a downward,
overlapping move on a mutually-aligned offset within a PMD during exec.
By initiating the copy process at the PMD level when such alignment is
present, we can prevent this warning and speed up the copying process at
the same time. Linus Torvalds suggested this idea.
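
As a concrete illustration of "mutual alignment" (with 2MB PMDs; the
addresses are taken from the trace output quoted in patch 5):

    old_addr = 0x2900000;   /* offset 0x100000 within its PMD */
    new_addr = 0x2f00000;   /* same offset 0x100000 */

Since (old_addr & ~PMD_MASK) == (new_addr & ~PMD_MASK), both addresses
can be realigned down to 0x2800000 and 0x2e00000, and the copy can then
proceed one whole PMD at a time.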

Please check the individual patches for more details.

[1] https://lore.kernel.org/all/ZB2GTBD%2FLWTrkOiO@dhcp22.suse.cz/

Link to v4:
https://lore.kernel.org/all/20230531220807.2048037-1-joel@joelfernandes.org/

History of patches:
v4->v5:
1. Rebased on mainline.
2. Several improvement suggestions from Lorenzo.

v3->v4:
1. Care is taken to move purely within a VMA; in other words, this check
   in can_align_down():
    if (vma->vm_start != addr_to_align)
            return false;

    As an example of why this is needed:
    Consider the following range, which is 2MB-aligned and is
    part of a larger 10MB range which is not shown. Each
    character below represents 256KB, making the source and
    destination 2MB each. The lower-case letters are moved (s to d)
    and the upper-case letters are not moved.

    |DDDDddddSSSSssss|

    If we align down 'ssss' to start from the 'SSSS', we will end up destroying
    SSSS. The above if statement prevents that and I verified it.

    I also added a test for this in the last patch.

2. Handle the stack case separately. We do not care about #1 for the stack
   move because the 'SSSS' does not matter during that move. Further, we
   need this exception to prevent the stack move warning:

    if (!for_stack && vma->vm_start != addr_to_align)
            return false;

   (A combined sketch of both checks follows this list.)
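
   Putting both checks together, can_align_down() in this series ends up
   looking roughly like this (a condensed sketch of what patches 1 and 2
   implement; see the diffs for the real code and comments):

    static bool can_align_down(struct vm_area_struct *vma,
                               unsigned long addr_to_align,
                               unsigned long mask, bool for_stack)
    {
            unsigned long addr_masked = addr_to_align & mask;

            /* Realigning mid-VMA would destroy the mapping's head
             * (the 'SSSS' above); stack moves are exempt. */
            if (!for_stack && vma->vm_start != addr_to_align)
                    return false;

            /* The realigned address must not land on any existing mapping. */
            return find_vma_intersection(vma->vm_mm, addr_masked,
                                         addr_to_align) == NULL;
    }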

v2->v3:
1. The masked address was stored in an int; fixed it to unsigned long to avoid truncation.
2. We now handle moves happening purely within a VMA; a new test is added to cover this.
3. More code comments.

v1->v2:
1. Trigger the optimization for mremaps smaller than a PMD. I verified by
tracing that it works correctly.

2. Fix an issue, found by Linus, where a bogus return value could be seen if
we broke out of the copy loop in move_page_tables() for the first PMD itself.

v1: Initial RFC.

Joel Fernandes (1):
selftests: mm: Add a test for moving from an offset from start of
mapping

Joel Fernandes (Google) (6):
mm/mremap: Optimize the start addresses in move_page_tables()
mm/mremap: Allow moves within the same VMA
selftests: mm: Fix failure case when new remap region was not found
selftests: mm: Add a test for mutually aligned moves > PMD size
selftests: mm: Add a test for remapping to area immediately after
existing mapping
selftests: mm: Add a test for remapping within a range

fs/exec.c                                |   2 +-
include/linux/mm.h                       |   2 +-
mm/mremap.c                              |  69 +++++-
tools/testing/selftests/mm/mremap_test.c | 301 +++++++++++++++++++----
4 files changed, 325 insertions(+), 49 deletions(-)

--
2.42.0.rc1.204.g551eb34607-goog



* [PATCH v5 1/7] mm/mremap: Optimize the start addresses in move_page_tables()
  2023-08-22  1:54 [PATCH v5 0/7] Optimize mremap during mutual alignment within PMD Joel Fernandes (Google)
@ 2023-08-22  1:54 ` Joel Fernandes (Google)
  2023-08-27  9:09   ` Lorenzo Stoakes
  2023-08-22  1:54 ` [PATCH v5 2/7] mm/mremap: Allow moves within the same VMA Joel Fernandes (Google)
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Joel Fernandes (Google) @ 2023-08-22  1:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Linus Torvalds, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Lorenzo Stoakes,
	Kirill A Shutemov, Liam R. Howlett, Paul E. McKenney,
	Suren Baghdasaryan, Kalesh Singh, Lokesh Gidra

Recently, we have seen reports [1] of a warning that triggers due to
move_page_tables() doing a downward and overlapping move on a
mutually-aligned offset within a PMD. By mutual alignment, I
mean the source and destination addresses of the mremap are at
the same offset within a PMD.

This mutual alignment along with the fact that the move is downward is
sufficient to cause a warning related to having an allocated PMD that
does not have PTEs in it.

This warning will only trigger when there is mutual alignment in the
move operation. A solution, as suggested by Linus Torvalds [2], is to
initiate the copy process at the PMD level whenever such alignment is
present. Implementing this approach will not only prevent the warning
from being triggered, but it will also optimize the operation as this
method should enhance the speed of the copy process whenever there's a
possibility to start copying at the PMD level.
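
As an illustration (2MB PMDs; addresses taken from the trace output in a
later selftest patch), consider a 5MB move from 0x2900000 to 0x2f00000:

	old_addr = 0x2900000  ->  realigned down to 0x2800000
	new_addr = 0x2f00000  ->  realigned down to 0x2e00000

Both addresses sit at offset 0x100000 within their PMDs, so after
realignment the copy loop can move whole PMDs (0x2800000 -> 0x2e00000,
0x2a00000 -> 0x3000000, ...) instead of starting with a PTE-level copy
of the first partial PMD.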

Some more points:
a. The optimization can be done only when both the source and
destination of the mremap do not have anything mapped below them up to a
PMD boundary. I add support to detect that.

b. Point (a) is not a problem for the call to move_page_tables() from
exec.c as nothing is expected to be mapped below the source. However, for
non-overlapping mutually aligned moves as triggered by mremap(2), I
added support for checking such cases.

c. I currently only optimize for PMD moves; in the future, we can build
on this work and do PUD moves as well if there is a need for this. But I
want to take it one step at a time.

d. We need to be careful about mremap of ranges within the VMA itself.
For this purpose, I added checks to determine if the address after
alignment falls within the same VMA.

[1] https://lore.kernel.org/all/ZB2GTBD%2FLWTrkOiO@dhcp22.suse.cz/
[2] https://lore.kernel.org/all/CAHk-=whd7msp8reJPfeGNyt0LiySMT0egExx3TVZSX3Ok6X=9g@mail.gmail.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 mm/mremap.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/mm/mremap.c b/mm/mremap.c
index 11e06e4ab33b..035fbf542a8f 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -489,6 +489,53 @@ static bool move_pgt_entry(enum pgt_entry entry, struct vm_area_struct *vma,
 	return moved;
 }
 
+/*
+ * A helper to check if a previous mapping exists. Required for
+ * move_page_tables() and realign_addr() to determine if a previous mapping
+ * exists before we can do realignment optimizations.
+ */
+static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_align,
+			       unsigned long mask)
+{
+	unsigned long addr_masked = addr_to_align & mask;
+
+	/*
+	 * If @addr_to_align of either source or destination is not the beginning
+	 * of the corresponding VMA, we can't align down or we will destroy part
+	 * of the current mapping.
+	 */
+	if (vma->vm_start != addr_to_align)
+		return false;
+
+	/*
+	 * Make sure the realignment doesn't cause the address to fall on an
+	 * existing mapping.
+	 */
+	return find_vma_intersection(vma->vm_mm, addr_masked, addr_to_align) == NULL;
+}
+
+/* Opportunistically realign to specified boundary for faster copy. */
+static void try_realign_addr(unsigned long *old_addr, struct vm_area_struct *old_vma,
+			     unsigned long *new_addr, struct vm_area_struct *new_vma,
+			     unsigned long mask)
+{
+	/* Skip if the addresses are already aligned. */
+	if ((*old_addr & ~mask) == 0)
+		return;
+
+	/* Only realign if the new and old addresses are mutually aligned. */
+	if ((*old_addr & ~mask) != (*new_addr & ~mask))
+		return;
+
+	/* Ensure realignment doesn't cause overlap with existing mappings. */
+	if (!can_align_down(old_vma, *old_addr, mask) ||
+	    !can_align_down(new_vma, *new_addr, mask))
+		return;
+
+	*old_addr = *old_addr & mask;
+	*new_addr = *new_addr & mask;
+}
+
 unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long old_addr, struct vm_area_struct *new_vma,
 		unsigned long new_addr, unsigned long len,
@@ -508,6 +555,14 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 		return move_hugetlb_page_tables(vma, new_vma, old_addr,
 						new_addr, len);
 
+	/*
+	 * If possible, realign addresses to PMD boundary for faster copy.
+	 * Only realign if the mremap copying hits a PMD boundary.
+	 */
+	if ((vma != new_vma)
+		&& (len >= PMD_SIZE - (old_addr & ~PMD_MASK)))
+		try_realign_addr(&old_addr, vma, &new_addr, new_vma, PMD_MASK);
+
 	flush_cache_range(vma, old_addr, old_end);
 	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma->vm_mm,
 				old_addr, old_end);
@@ -577,6 +632,13 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 
 	mmu_notifier_invalidate_range_end(&range);
 
+	/*
+	 * Prevent negative return values when {old,new}_addr was realigned
+	 * but we broke out of the above loop for the first PMD itself.
+	 */
+	if (len + old_addr < old_end)
+		return 0;
+
 	return len + old_addr - old_end;	/* how much done */
 }
 
-- 
2.42.0.rc1.204.g551eb34607-goog



* [PATCH v5 2/7] mm/mremap: Allow moves within the same VMA
  2023-08-22  1:54 [PATCH v5 0/7] Optimize mremap during mutual alignment within PMD Joel Fernandes (Google)
  2023-08-22  1:54 ` [PATCH v5 1/7] mm/mremap: Optimize the start addresses in move_page_tables() Joel Fernandes (Google)
@ 2023-08-22  1:54 ` Joel Fernandes (Google)
  2023-08-27  9:21   ` Lorenzo Stoakes
  2023-08-22  1:54 ` [PATCH v5 3/7] selftests: mm: Fix failure case when new remap region was not found Joel Fernandes (Google)
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Joel Fernandes (Google) @ 2023-08-22  1:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	linux-kselftest, linux-mm, Shuah Khan, Vlastimil Babka,
	Michal Hocko, Linus Torvalds, Lorenzo Stoakes, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

For the stack move happening in shift_arg_pages(), the move takes place
within the same VMA, which spans both the old and new ranges.

In case the aligned address happens to fall within that VMA, allow such
moves and don't abort the optimization.

In the mremap case, we cannot allow any such moves, as they will end up
destroying some part of the mapping (either the source of the move, or
part of the existing mapping). So just avoid it for mremap.
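
For illustration, the stack case looks roughly like this (a sketch, not
the actual exec.c code):

	/* shift_arg_pages(): one VMA spans both ranges. */
	vma:    [new_start ............................. old_end)
	move:   [old_start, old_end)  down to  [new_start, ...)

	move_page_tables(vma, old_start, vma, new_start, length,
			 false /* need_rmap_locks */, true /* for_stack */);

A realigned old_start may fall inside [new_start, old_start), but that
part of the VMA was only just created by expanding the stack VMA
downward and has no page tables populated yet, so nothing can be
destroyed there.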

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 fs/exec.c          |  2 +-
 include/linux/mm.h |  2 +-
 mm/mremap.c        | 29 +++++++++++++++--------------
 3 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 1a827d55ba94..244925307958 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -712,7 +712,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 	 * process cleanup to remove whatever mess we made.
 	 */
 	if (length != move_page_tables(vma, old_start,
-				       vma, new_start, length, false))
+				       vma, new_start, length, false, true))
 		return -ENOMEM;
 
 	lru_add_drain();
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 406ab9ea818f..e635d1fc73b6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2458,7 +2458,7 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen);
 extern unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long old_addr, struct vm_area_struct *new_vma,
 		unsigned long new_addr, unsigned long len,
-		bool need_rmap_locks);
+		bool need_rmap_locks, bool for_stack);
 
 /*
  * Flags used by change_protection().  For now we make it a bitmap so
diff --git a/mm/mremap.c b/mm/mremap.c
index 035fbf542a8f..06baa13bd2c8 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -490,12 +490,13 @@ static bool move_pgt_entry(enum pgt_entry entry, struct vm_area_struct *vma,
 }
 
 /*
- * A helper to check if a previous mapping exists. Required for
- * move_page_tables() and realign_addr() to determine if a previous mapping
- * exists before we can do realignment optimizations.
+ * A helper to check if aligning down is OK. The aligned address should fall
+ * on *no mapping*. For the stack moving down, that's a special move within
+ * the VMA that is created to span the source and destination of the move,
+ * so we make an exception for it.
  */
 static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_align,
-			       unsigned long mask)
+			    unsigned long mask, bool for_stack)
 {
 	unsigned long addr_masked = addr_to_align & mask;
 
@@ -504,7 +505,7 @@ static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_ali
 	 * of the corresponding VMA, we can't align down or we will destroy part
 	 * of the current mapping.
 	 */
-	if (vma->vm_start != addr_to_align)
+	if (!for_stack && vma->vm_start != addr_to_align)
 		return false;
 
 	/*
@@ -517,7 +518,7 @@ static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_ali
 /* Opportunistically realign to specified boundary for faster copy. */
 static void try_realign_addr(unsigned long *old_addr, struct vm_area_struct *old_vma,
 			     unsigned long *new_addr, struct vm_area_struct *new_vma,
-			     unsigned long mask)
+			     unsigned long mask, bool for_stack)
 {
 	/* Skip if the addresses are already aligned. */
 	if ((*old_addr & ~mask) == 0)
@@ -528,8 +529,8 @@ static void try_realign_addr(unsigned long *old_addr, struct vm_area_struct *old
 		return;
 
 	/* Ensure realignment doesn't cause overlap with existing mappings. */
-	if (!can_align_down(old_vma, *old_addr, mask) ||
-	    !can_align_down(new_vma, *new_addr, mask))
+	if (!can_align_down(old_vma, *old_addr, mask, for_stack) ||
+	    !can_align_down(new_vma, *new_addr, mask, for_stack))
 		return;
 
 	*old_addr = *old_addr & mask;
@@ -539,7 +540,7 @@ static void try_realign_addr(unsigned long *old_addr, struct vm_area_struct *old
 unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long old_addr, struct vm_area_struct *new_vma,
 		unsigned long new_addr, unsigned long len,
-		bool need_rmap_locks)
+		bool need_rmap_locks, bool for_stack)
 {
 	unsigned long extent, old_end;
 	struct mmu_notifier_range range;
@@ -559,9 +560,9 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 	 * If possible, realign addresses to PMD boundary for faster copy.
 	 * Only realign if the mremap copying hits a PMD boundary.
 	 */
-	if ((vma != new_vma)
-		&& (len >= PMD_SIZE - (old_addr & ~PMD_MASK)))
-		try_realign_addr(&old_addr, vma, &new_addr, new_vma, PMD_MASK);
+	if (len >= PMD_SIZE - (old_addr & ~PMD_MASK))
+		try_realign_addr(&old_addr, vma, &new_addr, new_vma, PMD_MASK,
+				 for_stack);
 
 	flush_cache_range(vma, old_addr, old_end);
 	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma->vm_mm,
@@ -708,7 +709,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 	}
 
 	moved_len = move_page_tables(vma, old_addr, new_vma, new_addr, old_len,
-				     need_rmap_locks);
+				     need_rmap_locks, false);
 	if (moved_len < old_len) {
 		err = -ENOMEM;
 	} else if (vma->vm_ops && vma->vm_ops->mremap) {
@@ -722,7 +723,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 		 * and then proceed to unmap new area instead of old.
 		 */
 		move_page_tables(new_vma, new_addr, vma, old_addr, moved_len,
-				 true);
+				 true, false);
 		vma = new_vma;
 		old_len = new_len;
 		old_addr = new_addr;
-- 
2.42.0.rc1.204.g551eb34607-goog



* [PATCH v5 3/7] selftests: mm: Fix failure case when new remap region was not found
  2023-08-22  1:54 [PATCH v5 0/7] Optimize mremap during mutual alignment within PMD Joel Fernandes (Google)
  2023-08-22  1:54 ` [PATCH v5 1/7] mm/mremap: Optimize the start addresses in move_page_tables() Joel Fernandes (Google)
  2023-08-22  1:54 ` [PATCH v5 2/7] mm/mremap: Allow moves within the same VMA Joel Fernandes (Google)
@ 2023-08-22  1:54 ` Joel Fernandes (Google)
  2023-08-27  9:22   ` Lorenzo Stoakes
  2023-08-22  1:54 ` [PATCH v5 4/7] selftests: mm: Add a test for mutually aligned moves > PMD size Joel Fernandes (Google)
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Joel Fernandes (Google) @ 2023-08-22  1:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	linux-kselftest, linux-mm, Shuah Khan, Vlastimil Babka,
	Michal Hocko, Linus Torvalds, Lorenzo Stoakes, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

When a valid remap region could not be found, the source mapping is not
cleaned up. Fix the goto statement such that the cleanup happens.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 tools/testing/selftests/mm/mremap_test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
index 5c3773de9f0f..6822d657f589 100644
--- a/tools/testing/selftests/mm/mremap_test.c
+++ b/tools/testing/selftests/mm/mremap_test.c
@@ -316,7 +316,7 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 		if (addr + c.dest_alignment < addr) {
 			ksft_print_msg("Couldn't find a valid region to remap to\n");
 			ret = -1;
-			goto out;
+			goto clean_up_src;
 		}
 		addr += c.dest_alignment;
 	}
-- 
2.42.0.rc1.204.g551eb34607-goog



* [PATCH v5 4/7] selftests: mm: Add a test for mutually aligned moves > PMD size
  2023-08-22  1:54 [PATCH v5 0/7] Optimize mremap during mutual alignment within PMD Joel Fernandes (Google)
                   ` (2 preceding siblings ...)
  2023-08-22  1:54 ` [PATCH v5 3/7] selftests: mm: Fix failure case when new remap region was not found Joel Fernandes (Google)
@ 2023-08-22  1:54 ` Joel Fernandes (Google)
  2023-08-27  9:36   ` Lorenzo Stoakes
  2023-08-22  1:54 ` [PATCH v5 5/7] selftests: mm: Add a test for remapping to area immediately after existing mapping Joel Fernandes (Google)
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Joel Fernandes (Google) @ 2023-08-22  1:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	linux-kselftest, linux-mm, Shuah Khan, Vlastimil Babka,
	Michal Hocko, Linus Torvalds, Lorenzo Stoakes, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

This patch adds a test case to check that the PMD-alignment optimization
successfully happens.

I add support to make sure there is some room before the source mapping;
otherwise the optimization to trigger a PMD-aligned move will be
disabled, as the kernel will detect that a mapping exists before the
source, making such an optimization impossible.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 tools/testing/selftests/mm/mremap_test.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
index 6822d657f589..6304eb0947a3 100644
--- a/tools/testing/selftests/mm/mremap_test.c
+++ b/tools/testing/selftests/mm/mremap_test.c
@@ -44,6 +44,7 @@ enum {
 	_1MB = 1ULL << 20,
 	_2MB = 2ULL << 20,
 	_4MB = 4ULL << 20,
+	_5MB = 5ULL << 20,
 	_1GB = 1ULL << 30,
 	_2GB = 2ULL << 30,
 	PMD = _2MB,
@@ -235,6 +236,11 @@ static void *get_source_mapping(struct config c)
 	unsigned long long mmap_min_addr;
 
 	mmap_min_addr = get_mmap_min_addr();
+	/*
+	 * For some tests, we need to not have any mappings below the
+	 * source mapping. Add some headroom to mmap_min_addr for this.
+	 */
+	mmap_min_addr += 10 * _4MB;
 
 retry:
 	addr += c.src_alignment;
@@ -434,7 +440,7 @@ static int parse_args(int argc, char **argv, unsigned int *threshold_mb,
 	return 0;
 }
 
-#define MAX_TEST 13
+#define MAX_TEST 14
 #define MAX_PERF_TEST 3
 int main(int argc, char **argv)
 {
@@ -500,6 +506,10 @@ int main(int argc, char **argv)
 	test_cases[12] = MAKE_TEST(PUD, PUD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
 				   "2GB mremap - Source PUD-aligned, Destination PUD-aligned");
 
+	/* Src and Dest addr 1MB aligned. 5MB mremap. */
+	test_cases[13] = MAKE_TEST(_1MB, _1MB, _5MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "5MB mremap - Source 1MB-aligned, Destination 1MB-aligned");
+
 	perf_test_cases[0] =  MAKE_TEST(page_size, page_size, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
 					"1GB mremap - Source PTE-aligned, Destination PTE-aligned");
 	/*
-- 
2.42.0.rc1.204.g551eb34607-goog



* [PATCH v5 5/7] selftests: mm: Add a test for remapping to area immediately after existing mapping
  2023-08-22  1:54 [PATCH v5 0/7] Optimize mremap during mutual alignment within PMD Joel Fernandes (Google)
                   ` (3 preceding siblings ...)
  2023-08-22  1:54 ` [PATCH v5 4/7] selftests: mm: Add a test for mutually aligned moves > PMD size Joel Fernandes (Google)
@ 2023-08-22  1:54 ` Joel Fernandes (Google)
  2023-08-27  9:42   ` Lorenzo Stoakes
  2023-08-22  1:54 ` [PATCH v5 6/7] selftests: mm: Add a test for remapping within a range Joel Fernandes (Google)
  2023-08-22  1:55 ` [PATCH v5 7/7] selftests: mm: Add a test for moving from an offset from start of mapping Joel Fernandes (Google)
  6 siblings, 1 reply; 25+ messages in thread
From: Joel Fernandes (Google) @ 2023-08-22  1:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	linux-kselftest, linux-mm, Shuah Khan, Vlastimil Babka,
	Michal Hocko, Linus Torvalds, Lorenzo Stoakes, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

This patch adds support for verifying that we correctly handle the
situation where something is already mapped before the destination of the remap.

Any realignment of the destination address followed by a PMD-copy would
destroy that existing mapping. In such cases, we need to avoid doing the
optimization.

To test this, we map an area called the preamble before the remap
region. Then we verify after the mremap operation that this region did not get
corrupted.

Putting some prints in the kernel, I verified that we optimize
correctly in different situations:

Optimize when there is alignment and no previous mapping (this is tested
by the previous patch).
<prints>
can_align_down(old_vma->vm_start=2900000, old_addr=2900000, mask=-2097152): 0
can_align_down(new_vma->vm_start=2f00000, new_addr=2f00000, mask=-2097152): 0
=== Starting move_page_tables ===
Doing PUD move for 2800000 -> 2e00000 of extent=200000 <-- Optimization
Doing PUD move for 2a00000 -> 3000000 of extent=200000
Doing PUD move for 2c00000 -> 3200000 of extent=200000
</prints>

Don't optimize when there is alignment but there is a previous mapping
(this is tested by this patch).
Notice that can_align_down() returns 1 for the destination mapping,
as we detected there is something there.
<prints>
can_align_down(old_vma->vm_start=2900000, old_addr=2900000, mask=-2097152): 0
can_align_down(new_vma->vm_start=5700000, new_addr=5700000, mask=-2097152): 1
=== Starting move_page_tables ===
Doing move_ptes for 2900000 -> 5700000 of extent=100000 <-- Unoptimized
Doing PUD move for 2a00000 -> 5800000 of extent=200000
Doing PUD move for 2c00000 -> 5a00000 of extent=200000
</prints>

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 tools/testing/selftests/mm/mremap_test.c | 57 +++++++++++++++++++++---
 1 file changed, 52 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
index 6304eb0947a3..d7366074e2a8 100644
--- a/tools/testing/selftests/mm/mremap_test.c
+++ b/tools/testing/selftests/mm/mremap_test.c
@@ -29,6 +29,7 @@ struct config {
 	unsigned long long dest_alignment;
 	unsigned long long region_size;
 	int overlapping;
+	int dest_preamble_size;
 };
 
 struct test {
@@ -283,7 +284,7 @@ static void *get_source_mapping(struct config c)
 static long long remap_region(struct config c, unsigned int threshold_mb,
 			      char pattern_seed)
 {
-	void *addr, *src_addr, *dest_addr;
+	void *addr, *src_addr, *dest_addr, *dest_preamble_addr;
 	unsigned long long i;
 	struct timespec t_start = {0, 0}, t_end = {0, 0};
 	long long  start_ns, end_ns, align_mask, ret, offset;
@@ -300,7 +301,7 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 		goto out;
 	}
 
-	/* Set byte pattern */
+	/* Set byte pattern for source block. */
 	srand(pattern_seed);
 	for (i = 0; i < threshold; i++)
 		memset((char *) src_addr + i, (char) rand(), 1);
@@ -312,6 +313,9 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 	addr = (void *) (((unsigned long long) src_addr + c.region_size
 			  + offset) & align_mask);
 
+	/* Remap after the destination block preamble. */
+	addr += c.dest_preamble_size;
+
 	/* See comment in get_source_mapping() */
 	if (!((unsigned long long) addr & c.dest_alignment))
 		addr = (void *) ((unsigned long long) addr | c.dest_alignment);
@@ -327,6 +331,24 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 		addr += c.dest_alignment;
 	}
 
+	if (c.dest_preamble_size) {
+		dest_preamble_addr = mmap((void *) addr - c.dest_preamble_size, c.dest_preamble_size,
+					  PROT_READ | PROT_WRITE,
+					  MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED,
+							-1, 0);
+		if (dest_preamble_addr == MAP_FAILED) {
+			ksft_print_msg("Failed to map dest preamble region: %s\n",
+					strerror(errno));
+			ret = -1;
+			goto clean_up_src;
+		}
+
+		/* Set byte pattern for the dest preamble block. */
+		srand(pattern_seed);
+		for (i = 0; i < c.dest_preamble_size; i++)
+			memset((char *) dest_preamble_addr + i, (char) rand(), 1);
+	}
+
 	clock_gettime(CLOCK_MONOTONIC, &t_start);
 	dest_addr = mremap(src_addr, c.region_size, c.region_size,
 					  MREMAP_MAYMOVE|MREMAP_FIXED, (char *) addr);
@@ -335,7 +357,7 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 	if (dest_addr == MAP_FAILED) {
 		ksft_print_msg("mremap failed: %s\n", strerror(errno));
 		ret = -1;
-		goto clean_up_src;
+		goto clean_up_dest_preamble;
 	}
 
 	/* Verify byte pattern after remapping */
@@ -353,6 +375,23 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 		}
 	}
 
+	/* Verify the dest preamble byte pattern after remapping */
+	if (c.dest_preamble_size) {
+		srand(pattern_seed);
+		for (i = 0; i < c.dest_preamble_size; i++) {
+			char c = (char) rand();
+
+			if (((char *) dest_preamble_addr)[i] != c) {
+				ksft_print_msg("Preamble data after remap doesn't match at offset %d\n",
+					       i);
+				ksft_print_msg("Expected: %#x\t Got: %#x\n", c & 0xff,
+					       ((char *) dest_preamble_addr)[i] & 0xff);
+				ret = -1;
+				goto clean_up_dest;
+			}
+		}
+	}
+
 	start_ns = t_start.tv_sec * NS_PER_SEC + t_start.tv_nsec;
 	end_ns = t_end.tv_sec * NS_PER_SEC + t_end.tv_nsec;
 	ret = end_ns - start_ns;
@@ -365,6 +404,9 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
  */
 clean_up_dest:
 	munmap(dest_addr, c.region_size);
+clean_up_dest_preamble:
+	if (c.dest_preamble_size && dest_preamble_addr)
+		munmap(dest_preamble_addr, c.dest_preamble_size);
 clean_up_src:
 	munmap(src_addr, c.region_size);
 out:
@@ -440,7 +482,7 @@ static int parse_args(int argc, char **argv, unsigned int *threshold_mb,
 	return 0;
 }
 
-#define MAX_TEST 14
+#define MAX_TEST 15
 #define MAX_PERF_TEST 3
 int main(int argc, char **argv)
 {
@@ -449,7 +491,7 @@ int main(int argc, char **argv)
 	unsigned int threshold_mb = VALIDATION_DEFAULT_THRESHOLD;
 	unsigned int pattern_seed;
 	int num_expand_tests = 2;
-	struct test test_cases[MAX_TEST];
+	struct test test_cases[MAX_TEST] = {};
 	struct test perf_test_cases[MAX_PERF_TEST];
 	int page_size;
 	time_t t;
@@ -510,6 +552,11 @@ int main(int argc, char **argv)
 	test_cases[13] = MAKE_TEST(_1MB, _1MB, _5MB, NON_OVERLAPPING, EXPECT_SUCCESS,
 				  "5MB mremap - Source 1MB-aligned, Destination 1MB-aligned");
 
+	/* Src and Dest addr 1MB aligned. 5MB mremap. */
+	test_cases[14] = MAKE_TEST(_1MB, _1MB, _5MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "5MB mremap - Source 1MB-aligned, Dest 1MB-aligned with 40MB Preamble");
+	test_cases[14].config.dest_preamble_size = 10 * _4MB;
+
 	perf_test_cases[0] =  MAKE_TEST(page_size, page_size, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
 					"1GB mremap - Source PTE-aligned, Destination PTE-aligned");
 	/*
-- 
2.42.0.rc1.204.g551eb34607-goog



* [PATCH v5 6/7] selftests: mm: Add a test for remapping within a range
  2023-08-22  1:54 [PATCH v5 0/7] Optimize mremap during mutual alignment within PMD Joel Fernandes (Google)
                   ` (4 preceding siblings ...)
  2023-08-22  1:54 ` [PATCH v5 5/7] selftests: mm: Add a test for remapping to area immediately after existing mapping Joel Fernandes (Google)
@ 2023-08-22  1:54 ` Joel Fernandes (Google)
  2023-08-27  9:57   ` Lorenzo Stoakes
  2023-08-22  1:55 ` [PATCH v5 7/7] selftests: mm: Add a test for moving from an offset from start of mapping Joel Fernandes (Google)
  6 siblings, 1 reply; 25+ messages in thread
From: Joel Fernandes (Google) @ 2023-08-22  1:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	linux-kselftest, linux-mm, Shuah Khan, Vlastimil Babka,
	Michal Hocko, Linus Torvalds, Lorenzo Stoakes, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

Move a block of memory within a memory range. Any alignment optimization
on the source address may cause corruption. Verify using kselftest that
it works. I have also verified with tracing that such optimization does
not happen due to this check in can_align_down():

if (!for_stack && vma->vm_start != addr_to_align)
	return false;

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 tools/testing/selftests/mm/mremap_test.c | 79 +++++++++++++++++++++++-
 1 file changed, 78 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
index d7366074e2a8..f45d1abedc9c 100644
--- a/tools/testing/selftests/mm/mremap_test.c
+++ b/tools/testing/selftests/mm/mremap_test.c
@@ -23,6 +23,7 @@
 #define VALIDATION_NO_THRESHOLD 0	/* Verify the entire region */
 
 #define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
+#define SIZE_MB(m) ((size_t)m * (1024 * 1024))
 
 struct config {
 	unsigned long long src_alignment;
@@ -226,6 +227,79 @@ static void mremap_expand_merge_offset(FILE *maps_fp, unsigned long page_size)
 		ksft_test_result_fail("%s\n", test_name);
 }
 
+/*
+ * Verify that an mremap within a range does not cause corruption
+ * of unrelated part of range.
+ *
+ * Consider the following range which is 2MB aligned and is
+ * a part of a larger 10MB range which is not shown. Each
+ * character is 256KB below making the source and destination
+ * 2MB each. The lower case letters are moved (s to d) and the
+ * upper case letters are not moved. The below test verifies
+ * that the upper case S letters are not corrupted by the
+ * adjacent mremap.
+ *
+ * |DDDDddddSSSSssss|
+ */
+static void mremap_move_within_range(char pattern_seed)
+{
+	char *test_name = "mremap mremap move within range";
+	void *src, *dest;
+	int i, success = 1;
+
+	size_t size = SIZE_MB(20);
+	void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
+			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (ptr == MAP_FAILED) {
+		perror("mmap");
+		success = 0;
+		goto out;
+	}
+	memset(ptr, 0, size);
+
+	src = ptr + SIZE_MB(6);
+	src = (void *)((unsigned long)src & ~(SIZE_MB(2) - 1));
+
+	/* Set byte pattern for source block. */
+	srand(pattern_seed);
+	for (i = 0; i < SIZE_MB(2); i++) {
+		((char *)src)[i] = (char) rand();
+	}
+
+	dest = src - SIZE_MB(2);
+
+	void *new_ptr = mremap(src + SIZE_MB(1), SIZE_MB(1), SIZE_MB(1),
+						   MREMAP_MAYMOVE | MREMAP_FIXED, dest + SIZE_MB(1));
+	if (new_ptr == MAP_FAILED) {
+		perror("mremap");
+		success = 0;
+		goto out;
+	}
+
+	/* Verify byte pattern after remapping */
+	srand(pattern_seed);
+	for (i = 0; i < SIZE_MB(1); i++) {
+		char c = (char) rand();
+
+		if (((char *)src)[i] != c) {
+			ksft_print_msg("Data at src at %d got corrupted due to unrelated mremap\n",
+				       i);
+			ksft_print_msg("Expected: %#x\t Got: %#x\n", c & 0xff,
+					((char *) src)[i] & 0xff);
+			success = 0;
+		}
+	}
+
+out:
+	if (munmap(ptr, size) == -1)
+		perror("munmap");
+
+	if (success)
+		ksft_test_result_pass("%s\n", test_name);
+	else
+		ksft_test_result_fail("%s\n", test_name);
+}
+
 /*
  * Returns the start address of the mapping on success, else returns
  * NULL on failure.
@@ -491,6 +565,7 @@ int main(int argc, char **argv)
 	unsigned int threshold_mb = VALIDATION_DEFAULT_THRESHOLD;
 	unsigned int pattern_seed;
 	int num_expand_tests = 2;
+	int num_misc_tests = 1;
 	struct test test_cases[MAX_TEST] = {};
 	struct test perf_test_cases[MAX_PERF_TEST];
 	int page_size;
@@ -572,7 +647,7 @@ int main(int argc, char **argv)
 				(threshold_mb * _1MB >= _1GB);
 
 	ksft_set_plan(ARRAY_SIZE(test_cases) + (run_perf_tests ?
-		      ARRAY_SIZE(perf_test_cases) : 0) + num_expand_tests);
+		      ARRAY_SIZE(perf_test_cases) : 0) + num_expand_tests + num_misc_tests);
 
 	for (i = 0; i < ARRAY_SIZE(test_cases); i++)
 		run_mremap_test_case(test_cases[i], &failures, threshold_mb,
@@ -590,6 +665,8 @@ int main(int argc, char **argv)
 
 	fclose(maps_fp);
 
+	mremap_move_within_range(pattern_seed);
+
 	if (run_perf_tests) {
 		ksft_print_msg("\n%s\n",
 		 "mremap HAVE_MOVE_PMD/PUD optimization time comparison for 1GB region:");
-- 
2.42.0.rc1.204.g551eb34607-goog



* [PATCH v5 7/7] selftests: mm: Add a test for moving from an offset from start of mapping
  2023-08-22  1:54 [PATCH v5 0/7] Optimize mremap during mutual alignment within PMD Joel Fernandes (Google)
                   ` (5 preceding siblings ...)
  2023-08-22  1:54 ` [PATCH v5 6/7] selftests: mm: Add a test for remapping within a range Joel Fernandes (Google)
@ 2023-08-22  1:55 ` Joel Fernandes (Google)
  2023-08-27 10:12   ` Lorenzo Stoakes
  6 siblings, 1 reply; 25+ messages in thread
From: Joel Fernandes (Google) @ 2023-08-22  1:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Linus Torvalds, Lorenzo Stoakes,
	Kirill A Shutemov, Liam R. Howlett, Paul E. McKenney,
	Suren Baghdasaryan, Kalesh Singh, Lokesh Gidra

From: Joel Fernandes <joel@joelfernandes.org>

It is possible that the aligned address falls on no existing mapping;
however, that does not mean we can simply align down to that address.
This test verifies that the "vma->vm_start != addr_to_align" check in
can_align_down() prevents disastrous results if aligning down when
source and dest are mutually aligned within a PMD but the source/dest
addresses requested are not at the beginning of the respective mapping
containing these addresses.
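
To make the failure mode concrete, the test below sets up roughly this
(a sketch; see the actual test for details):

	src:  6MB source mapping whose start is not PMD-aligned
	dest: mapping whose first 1MB is unmapped, so the aligned-down
	      destination falls on no existing mapping

	mremap(src + 1MB, 1MB, 1MB,
	       MREMAP_MAYMOVE | MREMAP_FIXED, dest + 1MB);

Aligning src + 1MB down would land inside [src, src + 1MB), whose
contents are live, and moving page tables from there would destroy the
beginning of src. The check rejects this because src + 1MB is not the
start of the source VMA.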

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 tools/testing/selftests/mm/mremap_test.c | 189 ++++++++++++++++-------
 1 file changed, 134 insertions(+), 55 deletions(-)

diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
index f45d1abedc9c..c71ac8306c89 100644
--- a/tools/testing/selftests/mm/mremap_test.c
+++ b/tools/testing/selftests/mm/mremap_test.c
@@ -24,6 +24,7 @@
 
 #define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
 #define SIZE_MB(m) ((size_t)m * (1024 * 1024))
+#define SIZE_KB(k) ((size_t)k * 1024)
 
 struct config {
 	unsigned long long src_alignment;
@@ -148,6 +149,60 @@ static bool is_range_mapped(FILE *maps_fp, void *start, void *end)
 	return success;
 }
 
+/*
+ * Returns the start address of the mapping on success, else returns
+ * NULL on failure.
+ */
+static void *get_source_mapping(struct config c)
+{
+	unsigned long long addr = 0ULL;
+	void *src_addr = NULL;
+	unsigned long long mmap_min_addr;
+
+	mmap_min_addr = get_mmap_min_addr();
+	/*
+	 * For some tests, we need to not have any mappings below the
+	 * source mapping. Add some headroom to mmap_min_addr for this.
+	 */
+	mmap_min_addr += 10 * _4MB;
+
+retry:
+	addr += c.src_alignment;
+	if (addr < mmap_min_addr)
+		goto retry;
+
+	src_addr = mmap((void *) addr, c.region_size, PROT_READ | PROT_WRITE,
+					MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED,
+					-1, 0);
+	if (src_addr == MAP_FAILED) {
+		if (errno == EPERM || errno == EEXIST)
+			goto retry;
+		goto error;
+	}
+	/*
+	 * Check that the address is aligned to the specified alignment.
+	 * Addresses which have alignments that are multiples of that
+	 * specified are not considered valid. For instance, 1GB address is
+	 * 2MB-aligned, however it will not be considered valid for a
+	 * requested alignment of 2MB. This is done to reduce coincidental
+	 * alignment in the tests.
+	 */
+	if (((unsigned long long) src_addr & (c.src_alignment - 1)) ||
+			!((unsigned long long) src_addr & c.src_alignment)) {
+		munmap(src_addr, c.region_size);
+		goto retry;
+	}
+
+	if (!src_addr)
+		goto error;
+
+	return src_addr;
+error:
+	ksft_print_msg("Failed to map source region: %s\n",
+			strerror(errno));
+	return NULL;
+}
+
 /*
  * This test validates that merge is called when expanding a mapping.
  * Mapping containing three pages is created, middle page is unmapped
@@ -300,60 +355,6 @@ static void mremap_move_within_range(char pattern_seed)
 		ksft_test_result_fail("%s\n", test_name);
 }
 
-/*
- * Returns the start address of the mapping on success, else returns
- * NULL on failure.
- */
-static void *get_source_mapping(struct config c)
-{
-	unsigned long long addr = 0ULL;
-	void *src_addr = NULL;
-	unsigned long long mmap_min_addr;
-
-	mmap_min_addr = get_mmap_min_addr();
-	/*
-	 * For some tests, we need to not have any mappings below the
-	 * source mapping. Add some headroom to mmap_min_addr for this.
-	 */
-	mmap_min_addr += 10 * _4MB;
-
-retry:
-	addr += c.src_alignment;
-	if (addr < mmap_min_addr)
-		goto retry;
-
-	src_addr = mmap((void *) addr, c.region_size, PROT_READ | PROT_WRITE,
-					MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED,
-					-1, 0);
-	if (src_addr == MAP_FAILED) {
-		if (errno == EPERM || errno == EEXIST)
-			goto retry;
-		goto error;
-	}
-	/*
-	 * Check that the address is aligned to the specified alignment.
-	 * Addresses which have alignments that are multiples of that
-	 * specified are not considered valid. For instance, 1GB address is
-	 * 2MB-aligned, however it will not be considered valid for a
-	 * requested alignment of 2MB. This is done to reduce coincidental
-	 * alignment in the tests.
-	 */
-	if (((unsigned long long) src_addr & (c.src_alignment - 1)) ||
-			!((unsigned long long) src_addr & c.src_alignment)) {
-		munmap(src_addr, c.region_size);
-		goto retry;
-	}
-
-	if (!src_addr)
-		goto error;
-
-	return src_addr;
-error:
-	ksft_print_msg("Failed to map source region: %s\n",
-			strerror(errno));
-	return NULL;
-}
-
 /* Returns the time taken for the remap on success else returns -1. */
 static long long remap_region(struct config c, unsigned int threshold_mb,
 			      char pattern_seed)
@@ -487,6 +488,83 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 	return ret;
 }
 
+/*
+ * Verify that an mremap aligning down does not destroy
+ * the beginning of the mapping just because the aligned
+ * down address landed on a mapping that maybe does not exist.
+ */
+static void mremap_move_1mb_from_start(char pattern_seed)
+{
+	char *test_name = "mremap move 1mb from start at 2MB+256KB aligned src";
+	void *src = NULL, *dest = NULL;
+	int i, success = 1;
+
+	/* Config to reuse get_source_mapping() to do an aligned mmap. */
+	struct config c = {
+		.src_alignment = SIZE_MB(1) + SIZE_KB(256),
+		.region_size = SIZE_MB(6)
+	};
+
+	src = get_source_mapping(c);
+	if (!src) {
+		success = 0;
+		goto out;
+	}
+
+	c.src_alignment = SIZE_MB(1) + SIZE_KB(256);
+	dest = get_source_mapping(c);
+	if (!dest) {
+		success = 0;
+		goto out;
+	}
+
+	/* Set byte pattern for source block. */
+	srand(pattern_seed);
+	for (i = 0; i < SIZE_MB(2); i++) {
+		((char *)src)[i] = (char) rand();
+	}
+
+	/*
+	 * Unmap the beginning of dest so that the aligned address
+	 * falls on no mapping.
+	 */
+	munmap(dest, SIZE_MB(1));
+
+	void *new_ptr = mremap(src + SIZE_MB(1), SIZE_MB(1), SIZE_MB(1),
+						   MREMAP_MAYMOVE | MREMAP_FIXED, dest + SIZE_MB(1));
+	if (new_ptr == MAP_FAILED) {
+		perror("mremap");
+		success = 0;
+		goto out;
+	}
+
+	/* Verify byte pattern after remapping */
+	srand(pattern_seed);
+	for (i = 0; i < SIZE_MB(1); i++) {
+		char c = (char) rand();
+
+		if (((char *)src)[i] != c) {
+			ksft_print_msg("Data at src at %d got corrupted due to unrelated mremap\n",
+				       i);
+			ksft_print_msg("Expected: %#x\t Got: %#x\n", c & 0xff,
+					((char *) src)[i] & 0xff);
+			success = 0;
+		}
+	}
+
+out:
+	if (src && munmap(src, c.region_size) == -1)
+		perror("munmap src");
+
+	if (dest && munmap(dest, c.region_size) == -1)
+		perror("munmap dest");
+
+	if (success)
+		ksft_test_result_pass("%s\n", test_name);
+	else
+		ksft_test_result_fail("%s\n", test_name);
+}
+
 static void run_mremap_test_case(struct test test_case, int *failures,
 				 unsigned int threshold_mb,
 				 unsigned int pattern_seed)
@@ -565,7 +643,7 @@ int main(int argc, char **argv)
 	unsigned int threshold_mb = VALIDATION_DEFAULT_THRESHOLD;
 	unsigned int pattern_seed;
 	int num_expand_tests = 2;
-	int num_misc_tests = 1;
+	int num_misc_tests = 2;
 	struct test test_cases[MAX_TEST] = {};
 	struct test perf_test_cases[MAX_PERF_TEST];
 	int page_size;
@@ -666,6 +744,7 @@ int main(int argc, char **argv)
 	fclose(maps_fp);
 
 	mremap_move_within_range(pattern_seed);
+	mremap_move_1mb_from_start(pattern_seed);
 
 	if (run_perf_tests) {
 		ksft_print_msg("\n%s\n",
-- 
2.42.0.rc1.204.g551eb34607-goog



* Re: [PATCH v5 1/7] mm/mremap: Optimize the start addresses in move_page_tables()
  2023-08-22  1:54 ` [PATCH v5 1/7] mm/mremap: Optimize the start addresses in move_page_tables() Joel Fernandes (Google)
@ 2023-08-27  9:09   ` Lorenzo Stoakes
  0 siblings, 0 replies; 25+ messages in thread
From: Lorenzo Stoakes @ 2023-08-27  9:09 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Linus Torvalds, linux-kselftest, linux-mm,
	Shuah Khan, Vlastimil Babka, Michal Hocko, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

On Tue, Aug 22, 2023 at 01:54:54AM +0000, Joel Fernandes (Google) wrote:
> Recently, we have seen reports [1] of a warning that triggers due to
> move_page_tables() doing a downward and overlapping move on a
> mutually-aligned offset within a PMD. By mutual alignment, I
> mean the source and destination addresses of the mremap are at
> the same offset within a PMD.
>
> This mutual alignment along with the fact that the move is downward is
> sufficient to cause a warning related to having an allocated PMD that
> does not have PTEs in it.
>
> This warning will only trigger when there is mutual alignment in the
> move operation. A solution, as suggested by Linus Torvalds [2], is to
> initiate the copy process at the PMD level whenever such alignment is
> present. Implementing this approach will not only prevent the warning
> from being triggered, but it will also optimize the operation as this
> method should enhance the speed of the copy process whenever there's a
> possibility to start copying at the PMD level.
>
> Some more points:
> a. The optimization can be done only when both the source and
> destination of the mremap do not have anything mapped below them up to a
> PMD boundary. I add support to detect that.
>
> b. Point (a) is not a problem for the call to move_page_tables() from
> exec.c as nothing is expected to be mapped below the source. However, for
> non-overlapping mutually aligned moves as triggered by mremap(2), I
> added support for checking such cases.
>
> c. I currently only optimize for PMD moves; in the future, we can build
> on this work and do PUD moves as well if there is a need for this. But I
> want to take it one step at a time.
>
> d. We need to be careful about mremap of ranges within the VMA itself.
> For this purpose, I added checks to determine if the address after
> alignment falls within the same VMA.
>
> [1] https://lore.kernel.org/all/ZB2GTBD%2FLWTrkOiO@dhcp22.suse.cz/
> [2] https://lore.kernel.org/all/CAHk-=whd7msp8reJPfeGNyt0LiySMT0egExx3TVZSX3Ok6X=9g@mail.gmail.com/
>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  mm/mremap.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 62 insertions(+)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 11e06e4ab33b..035fbf542a8f 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -489,6 +489,53 @@ static bool move_pgt_entry(enum pgt_entry entry, struct vm_area_struct *vma,
>  	return moved;
>  }
>
> +/*
> + * A helper to check if a previous mapping exists. Required for
> + * move_page_tables() and realign_addr() to determine if a previous mapping
> + * exists before we can do realignment optimizations.
> + */
> +static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_align,
> +			       unsigned long mask)
> +{
> +	unsigned long addr_masked = addr_to_align & mask;
> +
> +	/*
> +	 * If @addr_to_align of either source or destination is not the beginning
> +	 * of the corresponding VMA, we can't align down or we will destroy part
> +	 * of the current mapping.
> +	 */
> +	if (vma->vm_start != addr_to_align)
> +		return false;
> +
> +	/*
> +	 * Make sure the realignment doesn't cause the address to fall on an
> +	 * existing mapping.
> +	 */
> +	return find_vma_intersection(vma->vm_mm, addr_masked, addr_to_align) == NULL;
> +}
> +
> +/* Opportunistically realign to specified boundary for faster copy. */
> +static void try_realign_addr(unsigned long *old_addr, struct vm_area_struct *old_vma,
> +			     unsigned long *new_addr, struct vm_area_struct *new_vma,
> +			     unsigned long mask)
> +{
> +	/* Skip if the addresses are already aligned. */
> +	if ((*old_addr & ~mask) == 0)
> +		return;
> +
> +	/* Only realign if the new and old addresses are mutually aligned. */
> +	if ((*old_addr & ~mask) != (*new_addr & ~mask))
> +		return;
> +
> +	/* Ensure realignment doesn't cause overlap with existing mappings. */
> +	if (!can_align_down(old_vma, *old_addr, mask) ||
> +	    !can_align_down(new_vma, *new_addr, mask))
> +		return;
> +
> +	*old_addr = *old_addr & mask;
> +	*new_addr = *new_addr & mask;
> +}
> +
>  unsigned long move_page_tables(struct vm_area_struct *vma,
>  		unsigned long old_addr, struct vm_area_struct *new_vma,
>  		unsigned long new_addr, unsigned long len,
> @@ -508,6 +555,14 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>  		return move_hugetlb_page_tables(vma, new_vma, old_addr,
>  						new_addr, len);
>
> +	/*
> +	 * If possible, realign addresses to PMD boundary for faster copy.
> +	 * Only realign if the mremap copying hits a PMD boundary.
> +	 */
> +	if ((vma != new_vma)
> +		&& (len >= PMD_SIZE - (old_addr & ~PMD_MASK)))
> +		try_realign_addr(&old_addr, vma, &new_addr, new_vma, PMD_MASK);
> +
>  	flush_cache_range(vma, old_addr, old_end);
>  	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma->vm_mm,
>  				old_addr, old_end);
> @@ -577,6 +632,13 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>
>  	mmu_notifier_invalidate_range_end(&range);
>
> +	/*
> +	 * Prevent negative return values when {old,new}_addr was realigned
> +	 * but we broke out of the above loop for the first PMD itself.
> +	 */
> +	if (len + old_addr < old_end)
> +		return 0;
> +
>  	return len + old_addr - old_end;	/* how much done */
>  }
>
> --
> 2.42.0.rc1.204.g551eb34607-goog
>

Looks good to me! Thanks for the changes :)

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>


* Re: [PATCH v5 2/7] mm/mremap: Allow moves within the same VMA
  2023-08-22  1:54 ` [PATCH v5 2/7] mm/mremap: Allow moves within the same VMA Joel Fernandes (Google)
@ 2023-08-27  9:21   ` Lorenzo Stoakes
  2023-08-28 18:32     ` Joel Fernandes
  0 siblings, 1 reply; 25+ messages in thread
From: Lorenzo Stoakes @ 2023-08-27  9:21 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Linus Torvalds, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

On Tue, Aug 22, 2023 at 01:54:55AM +0000, Joel Fernandes (Google) wrote:
> For the stack move happening in shift_arg_pages(), the move takes place
> within the same VMA, which spans both the old and new ranges.
>
> In case the aligned address happens to fall within that VMA, allow such
> moves and don't abort the optimization.
>
> In the mremap case, we cannot allow any such moves, as they will end up
> destroying some part of the mapping (either the source of the move, or
> part of the existing mapping). So just avoid it for mremap.
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  fs/exec.c          |  2 +-
>  include/linux/mm.h |  2 +-
>  mm/mremap.c        | 29 +++++++++++++++--------------
>  3 files changed, 17 insertions(+), 16 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 1a827d55ba94..244925307958 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -712,7 +712,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
>  	 * process cleanup to remove whatever mess we made.
>  	 */
>  	if (length != move_page_tables(vma, old_start,
> -				       vma, new_start, length, false))
> +				       vma, new_start, length, false, true))
>  		return -ENOMEM;
>
>  	lru_add_drain();
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 406ab9ea818f..e635d1fc73b6 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2458,7 +2458,7 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen);
>  extern unsigned long move_page_tables(struct vm_area_struct *vma,
>  		unsigned long old_addr, struct vm_area_struct *new_vma,
>  		unsigned long new_addr, unsigned long len,
> -		bool need_rmap_locks);
> +		bool need_rmap_locks, bool for_stack);

It's a nit, but it'd be nice to not have 'mystery meat' booleans; foo(bar,
baz, true, false, true) always ends up being a pain to track down.

However I think probably something better than that (flags or wrapper
functions) might be too much noise here so perhaps we can live with this!
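
For the record, the kind of wrapper I mean would be something like this
(rough, untested sketch; the name is made up):

	static inline unsigned long
	move_page_tables_for_stack(struct vm_area_struct *vma,
				   unsigned long old_addr,
				   struct vm_area_struct *new_vma,
				   unsigned long new_addr,
				   unsigned long len)
	{
		return move_page_tables(vma, old_addr, new_vma, new_addr,
					len, /* need_rmap_locks = */ false,
					/* for_stack = */ true);
	}

That keeps the bare booleans out of the callers, but as I say, probably
not worth the churn here.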

>
>  /*
>   * Flags used by change_protection().  For now we make it a bitmap so
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 035fbf542a8f..06baa13bd2c8 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -490,12 +490,13 @@ static bool move_pgt_entry(enum pgt_entry entry, struct vm_area_struct *vma,
>  }
>
>  /*
> - * A helper to check if a previous mapping exists. Required for
> - * move_page_tables() and realign_addr() to determine if a previous mapping
> - * exists before we can do realignment optimizations.
> + * A helper to check if aligning down is OK. The aligned address should fall
> + * on *no mapping*. For the stack moving down, that's a special move within
> + * the VMA that is created to span the source and destination of the move,
> + * so we make an exception for it.
>   */
>  static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_align,
> -			       unsigned long mask)
> +			    unsigned long mask, bool for_stack)
>  {
>  	unsigned long addr_masked = addr_to_align & mask;
>
> @@ -504,7 +505,7 @@ static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_ali
>  	 * of the corresponding VMA, we can't align down or we will destroy part
>  	 * of the current mapping.
>  	 */
> -	if (vma->vm_start != addr_to_align)
> +	if (!for_stack && vma->vm_start != addr_to_align)
>  		return false;

I'm a little confused by this exception: is it very specifically for the
shift_arg_pages() case, where we can assume we are safe to just discard
the lower portion of the stack?

Wouldn't the find_vma_intersection() line below fail in this case? I may be
missing something here :)

>
>  	/*
> @@ -517,7 +518,7 @@ static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_ali
>  /* Opportunistically realign to specified boundary for faster copy. */
>  static void try_realign_addr(unsigned long *old_addr, struct vm_area_struct *old_vma,
>  			     unsigned long *new_addr, struct vm_area_struct *new_vma,
> -			     unsigned long mask)
> +			     unsigned long mask, bool for_stack)
>  {
>  	/* Skip if the addresses are already aligned. */
>  	if ((*old_addr & ~mask) == 0)
> @@ -528,8 +529,8 @@ static void try_realign_addr(unsigned long *old_addr, struct vm_area_struct *old
>  		return;
>
>  	/* Ensure realignment doesn't cause overlap with existing mappings. */
> -	if (!can_align_down(old_vma, *old_addr, mask) ||
> -	    !can_align_down(new_vma, *new_addr, mask))
> +	if (!can_align_down(old_vma, *old_addr, mask, for_stack) ||
> +	    !can_align_down(new_vma, *new_addr, mask, for_stack))
>  		return;
>
>  	*old_addr = *old_addr & mask;
> @@ -539,7 +540,7 @@ static void try_realign_addr(unsigned long *old_addr, struct vm_area_struct *old
>  unsigned long move_page_tables(struct vm_area_struct *vma,
>  		unsigned long old_addr, struct vm_area_struct *new_vma,
>  		unsigned long new_addr, unsigned long len,
> -		bool need_rmap_locks)
> +		bool need_rmap_locks, bool for_stack)
>  {
>  	unsigned long extent, old_end;
>  	struct mmu_notifier_range range;
> @@ -559,9 +560,9 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>  	 * If possible, realign addresses to PMD boundary for faster copy.
>  	 * Only realign if the mremap copying hits a PMD boundary.
>  	 */
> -	if ((vma != new_vma)
> -		&& (len >= PMD_SIZE - (old_addr & ~PMD_MASK)))
> -		try_realign_addr(&old_addr, vma, &new_addr, new_vma, PMD_MASK);
> +	if (len >= PMD_SIZE - (old_addr & ~PMD_MASK))
> +		try_realign_addr(&old_addr, vma, &new_addr, new_vma, PMD_MASK,
> +				 for_stack);
>
>  	flush_cache_range(vma, old_addr, old_end);
>  	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma->vm_mm,
> @@ -708,7 +709,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
>  	}
>
>  	moved_len = move_page_tables(vma, old_addr, new_vma, new_addr, old_len,
> -				     need_rmap_locks);
> +				     need_rmap_locks, false);
>  	if (moved_len < old_len) {
>  		err = -ENOMEM;
>  	} else if (vma->vm_ops && vma->vm_ops->mremap) {
> @@ -722,7 +723,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
>  		 * and then proceed to unmap new area instead of old.
>  		 */
>  		move_page_tables(new_vma, new_addr, vma, old_addr, moved_len,
> -				 true);
> +				 true, false);
>  		vma = new_vma;
>  		old_len = new_len;
>  		old_addr = new_addr;
> --
> 2.42.0.rc1.204.g551eb34607-goog
>


* Re: [PATCH v5 3/7] selftests: mm: Fix failure case when new remap region was not found
  2023-08-22  1:54 ` [PATCH v5 3/7] selftests: mm: Fix failure case when new remap region was not found Joel Fernandes (Google)
@ 2023-08-27  9:22   ` Lorenzo Stoakes
  0 siblings, 0 replies; 25+ messages in thread
From: Lorenzo Stoakes @ 2023-08-27  9:22 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Linus Torvalds, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

On Tue, Aug 22, 2023 at 01:54:56AM +0000, Joel Fernandes (Google) wrote:
> When a valid remap region could not be found, the source mapping is not
> cleaned up. Fix the goto statement such that the clean up happens.
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  tools/testing/selftests/mm/mremap_test.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
> index 5c3773de9f0f..6822d657f589 100644
> --- a/tools/testing/selftests/mm/mremap_test.c
> +++ b/tools/testing/selftests/mm/mremap_test.c
> @@ -316,7 +316,7 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
>  		if (addr + c.dest_alignment < addr) {
>  			ksft_print_msg("Couldn't find a valid region to remap to\n");
>  			ret = -1;
> -			goto out;
> +			goto clean_up_src;
>  		}
>  		addr += c.dest_alignment;
>  	}
> --
> 2.42.0.rc1.204.g551eb34607-goog
>

Nice spot!

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>


* Re: [PATCH v5 4/7] selftests: mm: Add a test for mutually aligned moves > PMD size
  2023-08-22  1:54 ` [PATCH v5 4/7] selftests: mm: Add a test for mutually aligned moves > PMD size Joel Fernandes (Google)
@ 2023-08-27  9:36   ` Lorenzo Stoakes
  0 siblings, 0 replies; 25+ messages in thread
From: Lorenzo Stoakes @ 2023-08-27  9:36 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Linus Torvalds, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

On Tue, Aug 22, 2023 at 01:54:57AM +0000, Joel Fernandes (Google) wrote:
> This patch adds a test case to check if a PMD-alignment optimization
> successfully happens.
>
> I add support to make sure there is some room before the source mapping,
> otherwise the optimization to trigger PMD-aligned move will be disabled
> as the kernel will detect that a mapping before the source exists and
> such optimization becomes impossible.
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  tools/testing/selftests/mm/mremap_test.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
> index 6822d657f589..6304eb0947a3 100644
> --- a/tools/testing/selftests/mm/mremap_test.c
> +++ b/tools/testing/selftests/mm/mremap_test.c
> @@ -44,6 +44,7 @@ enum {
>  	_1MB = 1ULL << 20,
>  	_2MB = 2ULL << 20,
>  	_4MB = 4ULL << 20,
> +	_5MB = 5ULL << 20,
>  	_1GB = 1ULL << 30,
>  	_2GB = 2ULL << 30,
>  	PMD = _2MB,
> @@ -235,6 +236,11 @@ static void *get_source_mapping(struct config c)
>  	unsigned long long mmap_min_addr;
>
>  	mmap_min_addr = get_mmap_min_addr();
> +	/*
> +	 * For some tests, we need to not have any mappings below the
> +	 * source mapping. Add some headroom to mmap_min_addr for this.
> +	 */
> +	mmap_min_addr += 10 * _4MB;

To be super nitty, you _in theory_ can't necessarily rely on arbitrary VAs
being available, even ones at a location that is the very opposite of where
mappings will go by default.

I guess the _ideal_ solution would be to PROT_NONE map a range to ensure
it's there, then munmap() the bits you don't want to exist, but that'd
involve reworking this whole test and yeah, not worth it.
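
Were it ever to become worth it, a rough sketch of what I mean (entirely
untested, reusing the test's 'c' and 'src_addr', with 'headroom' standing
in for however much space must stay empty below the source):

	/* Reserve the whole range up front so the VAs are known to be ours. */
	char *base = mmap(NULL, headroom + c.region_size, PROT_NONE,
			  MAP_ANONYMOUS | MAP_SHARED, -1, 0);
	if (base == MAP_FAILED)
		goto error;

	/* Make the tail of the reservation the actual source mapping... */
	if (mprotect(base + headroom, c.region_size, PROT_READ | PROT_WRITE))
		goto error;

	/*
	 * ...and drop the headroom so nothing is mapped below the source.
	 * There is still a window in which something else could be mapped
	 * there, but at least the VAs were demonstrably free.
	 */
	munmap(base, headroom);
	src_addr = base + headroom;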

>
>  retry:
>  	addr += c.src_alignment;
> @@ -434,7 +440,7 @@ static int parse_args(int argc, char **argv, unsigned int *threshold_mb,
>  	return 0;
>  }
>
> -#define MAX_TEST 13
> +#define MAX_TEST 14
>  #define MAX_PERF_TEST 3
>  int main(int argc, char **argv)
>  {
> @@ -500,6 +506,10 @@ int main(int argc, char **argv)
>  	test_cases[12] = MAKE_TEST(PUD, PUD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
>  				   "2GB mremap - Source PUD-aligned, Destination PUD-aligned");
>
> +	/* Src and Dest addr 1MB aligned. 5MB mremap. */
> +	test_cases[13] = MAKE_TEST(_1MB, _1MB, _5MB, NON_OVERLAPPING, EXPECT_SUCCESS,
> +				  "5MB mremap - Source 1MB-aligned, Destination 1MB-aligned");
> +
>  	perf_test_cases[0] =  MAKE_TEST(page_size, page_size, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
>  					"1GB mremap - Source PTE-aligned, Destination PTE-aligned");
>  	/*
> --
> 2.42.0.rc1.204.g551eb34607-goog
>

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>


* Re: [PATCH v5 5/7] selftests: mm: Add a test for remapping to area immediately after existing mapping
  2023-08-22  1:54 ` [PATCH v5 5/7] selftests: mm: Add a test for remapping to area immediately after existing mapping Joel Fernandes (Google)
@ 2023-08-27  9:42   ` Lorenzo Stoakes
  2023-08-28 18:36     ` Joel Fernandes
  0 siblings, 1 reply; 25+ messages in thread
From: Lorenzo Stoakes @ 2023-08-27  9:42 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Linus Torvalds, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

On Tue, Aug 22, 2023 at 01:54:58AM +0000, Joel Fernandes (Google) wrote:
> This patch adds support for verifying that we correctly handle the
> situation where something is already mapped before the destination of the remap.
>
> Any realignment of destination address and PMD-copy will destroy that
> existing mapping. In such cases, we need to avoid doing the optimization.
>
> To test this, we map an area called the preamble before the remap
> region. Then we verify after the mremap operation that this region did not get
> corrupted.
>
> Putting some prints in the kernel, I verified that we optimize
> correctly in different situations:
>
> Optimize when there is alignment and no previous mapping (this is tested
> by previous patch).
> <prints>
> can_align_down(old_vma->vm_start=2900000, old_addr=2900000, mask=-2097152): 0
> can_align_down(new_vma->vm_start=2f00000, new_addr=2f00000, mask=-2097152): 0
> === Starting move_page_tables ===
> Doing PUD move for 2800000 -> 2e00000 of extent=200000 <-- Optimization
> Doing PUD move for 2a00000 -> 3000000 of extent=200000
> Doing PUD move for 2c00000 -> 3200000 of extent=200000
> </prints>
>
> Don't optimize when there is alignment but there is previous mapping
> (this is tested by this patch).
> Notice that can_align_down() returns 1 for the destination mapping
> as we detected there is something there.
> <prints>
> can_align_down(old_vma->vm_start=2900000, old_addr=2900000, mask=-2097152): 0
> can_align_down(new_vma->vm_start=5700000, new_addr=5700000, mask=-2097152): 1
> === Starting move_page_tables ===
> Doing move_ptes for 2900000 -> 5700000 of extent=100000 <-- Unoptimized
> Doing PUD move for 2a00000 -> 5800000 of extent=200000
> Doing PUD move for 2c00000 -> 5a00000 of extent=200000
> </prints>
>

Have you additionally tested this by changing the code to be intentionally
broken then running the test and observing it fail?

> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  tools/testing/selftests/mm/mremap_test.c | 57 +++++++++++++++++++++---
>  1 file changed, 52 insertions(+), 5 deletions(-)
>
> diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
> index 6304eb0947a3..d7366074e2a8 100644
> --- a/tools/testing/selftests/mm/mremap_test.c
> +++ b/tools/testing/selftests/mm/mremap_test.c
> @@ -29,6 +29,7 @@ struct config {
>  	unsigned long long dest_alignment;
>  	unsigned long long region_size;
>  	int overlapping;
> +	int dest_preamble_size;
>  };
>
>  struct test {
> @@ -283,7 +284,7 @@ static void *get_source_mapping(struct config c)
>  static long long remap_region(struct config c, unsigned int threshold_mb,
>  			      char pattern_seed)
>  {
> -	void *addr, *src_addr, *dest_addr;
> +	void *addr, *src_addr, *dest_addr, *dest_preamble_addr;
>  	unsigned long long i;
>  	struct timespec t_start = {0, 0}, t_end = {0, 0};
>  	long long  start_ns, end_ns, align_mask, ret, offset;
> @@ -300,7 +301,7 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
>  		goto out;
>  	}
>
> -	/* Set byte pattern */
> +	/* Set byte pattern for source block. */
>  	srand(pattern_seed);
>  	for (i = 0; i < threshold; i++)
>  		memset((char *) src_addr + i, (char) rand(), 1);
> @@ -312,6 +313,9 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
>  	addr = (void *) (((unsigned long long) src_addr + c.region_size
>  			  + offset) & align_mask);
>
> +	/* Remap after the destination block preamble. */
> +	addr += c.dest_preamble_size;
> +
>  	/* See comment in get_source_mapping() */
>  	if (!((unsigned long long) addr & c.dest_alignment))
>  		addr = (void *) ((unsigned long long) addr | c.dest_alignment);
> @@ -327,6 +331,24 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
>  		addr += c.dest_alignment;
>  	}
>
> +	if (c.dest_preamble_size) {
> +		dest_preamble_addr = mmap((void *) addr - c.dest_preamble_size, c.dest_preamble_size,
> +					  PROT_READ | PROT_WRITE,
> +					  MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED,
> +							-1, 0);
> +		if (dest_preamble_addr == MAP_FAILED) {
> +			ksft_print_msg("Failed to map dest preamble region: %s\n",
> +					strerror(errno));
> +			ret = -1;
> +			goto clean_up_src;
> +		}
> +
> +		/* Set byte pattern for the dest preamble block. */
> +		srand(pattern_seed);
> +		for (i = 0; i < c.dest_preamble_size; i++)
> +			memset((char *) dest_preamble_addr + i, (char) rand(), 1);
> +	}
> +
>  	clock_gettime(CLOCK_MONOTONIC, &t_start);
>  	dest_addr = mremap(src_addr, c.region_size, c.region_size,
>  					  MREMAP_MAYMOVE|MREMAP_FIXED, (char *) addr);
> @@ -335,7 +357,7 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
>  	if (dest_addr == MAP_FAILED) {
>  		ksft_print_msg("mremap failed: %s\n", strerror(errno));
>  		ret = -1;
> -		goto clean_up_src;
> +		goto clean_up_dest_preamble;
>  	}
>
>  	/* Verify byte pattern after remapping */
> @@ -353,6 +375,23 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
>  		}
>  	}
>
> +	/* Verify the dest preamble byte pattern after remapping */
> +	if (c.dest_preamble_size) {
> +		srand(pattern_seed);
> +		for (i = 0; i < c.dest_preamble_size; i++) {
> +			char c = (char) rand();
> +
> +			if (((char *) dest_preamble_addr)[i] != c) {
> +				ksft_print_msg("Preamble data after remap doesn't match at offset %d\n",
> +					       i);
> +				ksft_print_msg("Expected: %#x\t Got: %#x\n", c & 0xff,
> +					       ((char *) dest_preamble_addr)[i] & 0xff);
> +				ret = -1;
> +				goto clean_up_dest;
> +			}
> +		}
> +	}
> +
>  	start_ns = t_start.tv_sec * NS_PER_SEC + t_start.tv_nsec;
>  	end_ns = t_end.tv_sec * NS_PER_SEC + t_end.tv_nsec;
>  	ret = end_ns - start_ns;
> @@ -365,6 +404,9 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
>   */
>  clean_up_dest:
>  	munmap(dest_addr, c.region_size);
> +clean_up_dest_preamble:
> +	if (c.dest_preamble_size && dest_preamble_addr)
> +		munmap(dest_preamble_addr, c.dest_preamble_size);
>  clean_up_src:
>  	munmap(src_addr, c.region_size);
>  out:
> @@ -440,7 +482,7 @@ static int parse_args(int argc, char **argv, unsigned int *threshold_mb,
>  	return 0;
>  }
>
> -#define MAX_TEST 14
> +#define MAX_TEST 15
>  #define MAX_PERF_TEST 3
>  int main(int argc, char **argv)
>  {
> @@ -449,7 +491,7 @@ int main(int argc, char **argv)
>  	unsigned int threshold_mb = VALIDATION_DEFAULT_THRESHOLD;
>  	unsigned int pattern_seed;
>  	int num_expand_tests = 2;
> -	struct test test_cases[MAX_TEST];
> +	struct test test_cases[MAX_TEST] = {};
>  	struct test perf_test_cases[MAX_PERF_TEST];
>  	int page_size;
>  	time_t t;
> @@ -510,6 +552,11 @@ int main(int argc, char **argv)
>  	test_cases[13] = MAKE_TEST(_1MB, _1MB, _5MB, NON_OVERLAPPING, EXPECT_SUCCESS,
>  				  "5MB mremap - Source 1MB-aligned, Destination 1MB-aligned");
>
> +	/* Src and Dest addr 1MB aligned. 5MB mremap. */
> +	test_cases[14] = MAKE_TEST(_1MB, _1MB, _5MB, NON_OVERLAPPING, EXPECT_SUCCESS,
> +				  "5MB mremap - Source 1MB-aligned, Dest 1MB-aligned with 40MB Preamble");
> +	test_cases[14].config.dest_preamble_size = 10 * _4MB;
> +
>  	perf_test_cases[0] =  MAKE_TEST(page_size, page_size, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
>  					"1GB mremap - Source PTE-aligned, Destination PTE-aligned");
>  	/*
> --
> 2.42.0.rc1.204.g551eb34607-goog
>

Looks good to me,
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>


* Re: [PATCH v5 6/7] selftests: mm: Add a test for remapping within a range
  2023-08-22  1:54 ` [PATCH v5 6/7] selftests: mm: Add a test for remapping within a range Joel Fernandes (Google)
@ 2023-08-27  9:57   ` Lorenzo Stoakes
  2023-08-27 10:15     ` Lorenzo Stoakes
  2023-08-28 18:37     ` Joel Fernandes
  0 siblings, 2 replies; 25+ messages in thread
From: Lorenzo Stoakes @ 2023-08-27  9:57 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Linus Torvalds, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

On Tue, Aug 22, 2023 at 01:54:59AM +0000, Joel Fernandes (Google) wrote:
> Move a block of memory within a memory range. Any alignment optimization
> on the source address may cause corruption. Verify using kselftest that
> it works. I have also verified with tracing that such optimization does
> not happen due to this check in can_align_down():
>
> if (!for_stack && vma->vm_start <= addr_masked)
> 	return false;
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  tools/testing/selftests/mm/mremap_test.c | 79 +++++++++++++++++++++++-
>  1 file changed, 78 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
> index d7366074e2a8..f45d1abedc9c 100644
> --- a/tools/testing/selftests/mm/mremap_test.c
> +++ b/tools/testing/selftests/mm/mremap_test.c
> @@ -23,6 +23,7 @@
>  #define VALIDATION_NO_THRESHOLD 0	/* Verify the entire region */
>
>  #define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
> +#define SIZE_MB(m) ((size_t)m * (1024 * 1024))

Nit in this instance since you always just use an integer, but generally
for good practice's sake - shouldn't we place m in parens e.g. (size_t)(m)
to avoid broken macro expansion?
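
Concretely, with a hypothetical compound argument:

	/* With the current definition: */
	#define SIZE_MB(m) ((size_t)m * (1024 * 1024))

	size_t sz = SIZE_MB(1 + 4);
	/*
	 * This expands to ((size_t)1 + 4 * (1024 * 1024)) == 4194305,
	 * not the 5 MiB (5242880) the caller meant; with (size_t)(m)
	 * the cast and the multiplication apply to the whole expression.
	 */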

>
>  struct config {
>  	unsigned long long src_alignment;
> @@ -226,6 +227,79 @@ static void mremap_expand_merge_offset(FILE *maps_fp, unsigned long page_size)
>  		ksft_test_result_fail("%s\n", test_name);
>  }
>
> +/*
> + * Verify that an mremap within a range does not cause corruption
> + * of unrelated part of range.
> + *
> + * Consider the following range which is 2MB aligned and is
> + * a part of a larger 10MB range which is not shown. Each
> + * character is 256KB below making the source and destination
> + * 2MB each. The lower case letters are moved (s to d) and the
> + * upper case letters are not moved. The below test verifies
> + * that the upper case S letters are not corrupted by the
> + * adjacent mremap.
> + *
> + * |DDDDddddSSSSssss|
> + */
> +static void mremap_move_within_range(char pattern_seed)
> +{
> +	char *test_name = "mremap mremap move within range";
> +	void *src, *dest;
> +	int i, success = 1;
> +
> +	size_t size = SIZE_MB(20);
> +	void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
> +			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +	if (ptr == MAP_FAILED) {
> +		perror("mmap");
> +		success = 0;
> +		goto out;
> +	}
> +	memset(ptr, 0, size);
> +
> +	src = ptr + SIZE_MB(6);
> +	src = (void *)((unsigned long)src & ~(SIZE_MB(2) - 1));
> +

It's nitty I know for a test (sorry!) but it'd be nice to place a bitwise
alignment trick like this into a helper function or macro if one isn't
otherwise available.
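
Something along these lines, say (untested and purely illustrative - the
name is borrowed from the kernel's ALIGN_DOWN(), which the selftest can't
see, and 'a' must be a power of two):

	#define ALIGN_DOWN(x, a) ((unsigned long)(x) & ~((unsigned long)(a) - 1))

	src = ptr + SIZE_MB(6);
	src = (void *)ALIGN_DOWN(src, SIZE_MB(2));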

> +	/* Set byte pattern for source block. */
> +	srand(pattern_seed);
> +	for (i = 0; i < SIZE_MB(2); i++) {
> +		((char *)src)[i] = (char) rand();
> +	}
> +
> +	dest = src - SIZE_MB(2);
> +
> +	void *new_ptr = mremap(src + SIZE_MB(1), SIZE_MB(1), SIZE_MB(1),
> +						   MREMAP_MAYMOVE | MREMAP_FIXED, dest + SIZE_MB(1));
> +	if (new_ptr == MAP_FAILED) {
> +		perror("mremap");
> +		success = 0;
> +		goto out;
> +	}
> +
> +	/* Verify byte pattern after remapping */
> +	srand(pattern_seed);
> +	for (i = 0; i < SIZE_MB(1); i++) {
> +		char c = (char) rand();
> +
> +		if (((char *)src)[i] != c) {
> +			ksft_print_msg("Data at src at %d got corrupted due to unrelated mremap\n",
> +				       i);
> +			ksft_print_msg("Expected: %#x\t Got: %#x\n", c & 0xff,
> +					((char *) src)[i] & 0xff);
> +			success = 0;
> +		}
> +	}
> +
> +out:
> +	if (munmap(ptr, size) == -1)
> +		perror("munmap");
> +
> +	if (success)
> +		ksft_test_result_pass("%s\n", test_name);
> +	else
> +		ksft_test_result_fail("%s\n", test_name);
> +}
> +
>  /*
>   * Returns the start address of the mapping on success, else returns
>   * NULL on failure.
> @@ -491,6 +565,7 @@ int main(int argc, char **argv)
>  	unsigned int threshold_mb = VALIDATION_DEFAULT_THRESHOLD;
>  	unsigned int pattern_seed;
>  	int num_expand_tests = 2;
> +	int num_misc_tests = 1;
>  	struct test test_cases[MAX_TEST] = {};
>  	struct test perf_test_cases[MAX_PERF_TEST];
>  	int page_size;
> @@ -572,7 +647,7 @@ int main(int argc, char **argv)
>  				(threshold_mb * _1MB >= _1GB);
>
>  	ksft_set_plan(ARRAY_SIZE(test_cases) + (run_perf_tests ?
> -		      ARRAY_SIZE(perf_test_cases) : 0) + num_expand_tests);
> +		      ARRAY_SIZE(perf_test_cases) : 0) + num_expand_tests + num_misc_tests);
>
>  	for (i = 0; i < ARRAY_SIZE(test_cases); i++)
>  		run_mremap_test_case(test_cases[i], &failures, threshold_mb,
> @@ -590,6 +665,8 @@ int main(int argc, char **argv)
>
>  	fclose(maps_fp);
>
> +	mremap_move_within_range(pattern_seed);
> +
>  	if (run_perf_tests) {
>  		ksft_print_msg("\n%s\n",
>  		 "mremap HAVE_MOVE_PMD/PUD optimization time comparison for 1GB region:");
> --
> 2.42.0.rc1.204.g551eb34607-goog
>

I drew a little diagram for myself and was thereby convinced this was a good test, therefore,

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>


* Re: [PATCH v5 7/7] selftests: mm: Add a test for moving from an offset from start of mapping
  2023-08-22  1:55 ` [PATCH v5 7/7] selftests: mm: Add a test for moving from an offset from start of mapping Joel Fernandes (Google)
@ 2023-08-27 10:12   ` Lorenzo Stoakes
  2023-08-28 20:17     ` Joel Fernandes
  0 siblings, 1 reply; 25+ messages in thread
From: Lorenzo Stoakes @ 2023-08-27 10:12 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Linus Torvalds, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

On Tue, Aug 22, 2023 at 01:55:00AM +0000, Joel Fernandes (Google) wrote:
> From: Joel Fernandes <joel@joelfernandes.org>
>
> It is possible that the aligned address falls on no existing mapping,
> however that does not mean that we can just align it down to that.
> This test verifies that the "vma->vm_start != addr_to_align" check in
> can_align_down() prevents disastrous results if aligning down when
> source and dest are mutually aligned within a PMD but the source/dest
> addresses requested are not at the beginning of the respective mapping
> containing these addresses.
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  tools/testing/selftests/mm/mremap_test.c | 189 ++++++++++++++++-------
>  1 file changed, 134 insertions(+), 55 deletions(-)
>
> diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
> index f45d1abedc9c..c71ac8306c89 100644
> --- a/tools/testing/selftests/mm/mremap_test.c
> +++ b/tools/testing/selftests/mm/mremap_test.c
> @@ -24,6 +24,7 @@
>
>  #define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
>  #define SIZE_MB(m) ((size_t)m * (1024 * 1024))
> +#define SIZE_KB(k) ((size_t)k * 1024)

Same comment as previous re: wrapping k in parens, it doesn't really matter
here that much but for good practice (and consistency with MIN()) would be
nice to do.

>
>  struct config {
>  	unsigned long long src_alignment;
> @@ -148,6 +149,60 @@ static bool is_range_mapped(FILE *maps_fp, void *start, void *end)
>  	return success;
>  }
>
> +/*
> + * Returns the start address of the mapping on success, else returns
> + * NULL on failure.
> + */
> +static void *get_source_mapping(struct config c)
> +{
> +	unsigned long long addr = 0ULL;
> +	void *src_addr = NULL;
> +	unsigned long long mmap_min_addr;
> +
> +	mmap_min_addr = get_mmap_min_addr();
> +	/*
> +	 * For some tests, we need to not have any mappings below the
> +	 * source mapping. Add some headroom to mmap_min_addr for this.
> +	 */
> +	mmap_min_addr += 10 * _4MB;
> +
> +retry:
> +	addr += c.src_alignment;
> +	if (addr < mmap_min_addr)
> +		goto retry;
> +
> +	src_addr = mmap((void *) addr, c.region_size, PROT_READ | PROT_WRITE,
> +					MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED,
> +					-1, 0);
> +	if (src_addr == MAP_FAILED) {
> +		if (errno == EPERM || errno == EEXIST)
> +			goto retry;
> +		goto error;
> +	}
> +	/*
> +	 * Check that the address is aligned to the specified alignment.
> +	 * Addresses which have alignments that are multiples of that
> +	 * specified are not considered valid. For instance, 1GB address is
> +	 * 2MB-aligned, however it will not be considered valid for a
> +	 * requested alignment of 2MB. This is done to reduce coincidental
> +	 * alignment in the tests.
> +	 */
> +	if (((unsigned long long) src_addr & (c.src_alignment - 1)) ||
> +			!((unsigned long long) src_addr & c.src_alignment)) {
> +		munmap(src_addr, c.region_size);
> +		goto retry;
> +	}
> +
> +	if (!src_addr)
> +		goto error;
> +
> +	return src_addr;
> +error:
> +	ksft_print_msg("Failed to map source region: %s\n",
> +			strerror(errno));
> +	return NULL;
> +}
> +

A meta thing, but it'd be nice to separate out _moving_ functions into
their own patch to make reviewing easier, as here it's not obvious whether
you changed anything or not (I eyeballed it and it seems like you didn't though!)

>  /*
>   * This test validates that merge is called when expanding a mapping.
>   * Mapping containing three pages is created, middle page is unmapped
> @@ -300,60 +355,6 @@ static void mremap_move_within_range(char pattern_seed)
>  		ksft_test_result_fail("%s\n", test_name);
>  }
>

> -/*
> - * Returns the start address of the mapping on success, else returns
> - * NULL on failure.
> - */
> -static void *get_source_mapping(struct config c)
> -{
> -	unsigned long long addr = 0ULL;
> -	void *src_addr = NULL;
> -	unsigned long long mmap_min_addr;
> -
> -	mmap_min_addr = get_mmap_min_addr();
> -	/*
> -	 * For some tests, we need to not have any mappings below the
> -	 * source mapping. Add some headroom to mmap_min_addr for this.
> -	 */
> -	mmap_min_addr += 10 * _4MB;
> -
> -retry:
> -	addr += c.src_alignment;
> -	if (addr < mmap_min_addr)
> -		goto retry;
> -
> -	src_addr = mmap((void *) addr, c.region_size, PROT_READ | PROT_WRITE,
> -					MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED,
> -					-1, 0);
> -	if (src_addr == MAP_FAILED) {
> -		if (errno == EPERM || errno == EEXIST)
> -			goto retry;
> -		goto error;
> -	}
> -	/*
> -	 * Check that the address is aligned to the specified alignment.
> -	 * Addresses which have alignments that are multiples of that
> -	 * specified are not considered valid. For instance, 1GB address is
> -	 * 2MB-aligned, however it will not be considered valid for a
> -	 * requested alignment of 2MB. This is done to reduce coincidental
> -	 * alignment in the tests.
> -	 */
> -	if (((unsigned long long) src_addr & (c.src_alignment - 1)) ||
> -			!((unsigned long long) src_addr & c.src_alignment)) {
> -		munmap(src_addr, c.region_size);
> -		goto retry;
> -	}
> -
> -	if (!src_addr)
> -		goto error;
> -
> -	return src_addr;
> -error:
> -	ksft_print_msg("Failed to map source region: %s\n",
> -			strerror(errno));
> -	return NULL;
> -}
> -
>  /* Returns the time taken for the remap on success else returns -1. */
>  static long long remap_region(struct config c, unsigned int threshold_mb,
>  			      char pattern_seed)
> @@ -487,6 +488,83 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
>  	return ret;
>  }
>
> +/*
> + * Verify that an mremap aligning down does not destroy
> + * the beginning of the mapping just because the aligned
> + * down address landed on a mapping that maybe does not exist.
> + */
> +static void mremap_move_1mb_from_start(char pattern_seed)
> +{
> +	char *test_name = "mremap move 1mb from start at 2MB+256KB aligned src";
> +	void *src = NULL, *dest = NULL;
> +	int i, success = 1;
> +
> +	/* Config to reuse get_source_mapping() to do an aligned mmap. */
> +	struct config c = {
> +		.src_alignment = SIZE_MB(1) + SIZE_KB(256),
> +		.region_size = SIZE_MB(6)
> +	};
> +
> +	src = get_source_mapping(c);
> +	if (!src) {
> +		success = 0;
> +		goto out;
> +	}
> +
> +	c.src_alignment = SIZE_MB(1) + SIZE_KB(256);

Why are you setting this again?

> +	dest = get_source_mapping(c);
> +	if (!dest) {
> +		success = 0;
> +		goto out;
> +	}
> +
> +	/* Set byte pattern for source block. */
> +	srand(pattern_seed);
> +	for (i = 0; i < SIZE_MB(2); i++) {
> +		((char *)src)[i] = (char) rand();
> +	}
> +
> +	/*
> +	 * Unmap the beginning of dest so that the aligned address
> +	 * falls on no mapping.
> +	 */
> +	munmap(dest, SIZE_MB(1));

This actually aligns (no pun intended) with my earlier comments on the min
mmap address + 4 MiB headroom. But I guess for the generalised mremap
test you will still need to have that headroom...

> +
> +	void *new_ptr = mremap(src + SIZE_MB(1), SIZE_MB(1), SIZE_MB(1),
> +						   MREMAP_MAYMOVE | MREMAP_FIXED, dest + SIZE_MB(1));
> +	if (new_ptr == MAP_FAILED) {
> +		perror("mremap");
> +		success = 0;
> +		goto out;
> +	}
> +
> +	/* Verify byte pattern after remapping */
> +	srand(pattern_seed);
> +	for (i = 0; i < SIZE_MB(1); i++) {
> +		char c = (char) rand();
> +
> +		if (((char *)src)[i] != c) {
> +			ksft_print_msg("Data at src at %d got corrupted due to unrelated mremap\n",
> +				       i);
> +			ksft_print_msg("Expected: %#x\t Got: %#x\n", c & 0xff,
> +					((char *) src)[i] & 0xff);
> +			success = 0;
> +		}
> +	}
> +
> +out:
> +	if (src && munmap(src, c.region_size) == -1)
> +		perror("munmap src");
> +
> +	if (dest && munmap(dest, c.region_size) == -1)
> +		perror("munmap dest");
> +
> +	if (success)
> +		ksft_test_result_pass("%s\n", test_name);
> +	else
> +		ksft_test_result_fail("%s\n", test_name);
> +}
> +
>  static void run_mremap_test_case(struct test test_case, int *failures,
>  				 unsigned int threshold_mb,
>  				 unsigned int pattern_seed)
> @@ -565,7 +643,7 @@ int main(int argc, char **argv)
>  	unsigned int threshold_mb = VALIDATION_DEFAULT_THRESHOLD;
>  	unsigned int pattern_seed;
>  	int num_expand_tests = 2;
> -	int num_misc_tests = 1;
> +	int num_misc_tests = 2;
>  	struct test test_cases[MAX_TEST] = {};
>  	struct test perf_test_cases[MAX_PERF_TEST];
>  	int page_size;
> @@ -666,6 +744,7 @@ int main(int argc, char **argv)
>  	fclose(maps_fp);
>
>  	mremap_move_within_range(pattern_seed);
> +	mremap_move_1mb_from_start(pattern_seed);
>
>  	if (run_perf_tests) {
>  		ksft_print_msg("\n%s\n",
> --
> 2.42.0.rc1.204.g551eb34607-goog
>

Have you verified this test fails if you eliminate the vma->vm_start !=
addr_to_align check?

Unrelated to this patch, but I guess it might be tricky to come up with a
test for the shift_arg_pages() stack case?

Irrespective of the above, this looks correct to me so,

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>

Thanks for putting in so much effort on the testing side too! :)


* Re: [PATCH v5 6/7] selftests: mm: Add a test for remapping within a range
  2023-08-27  9:57   ` Lorenzo Stoakes
@ 2023-08-27 10:15     ` Lorenzo Stoakes
  2023-08-28 18:59       ` Joel Fernandes
  2023-08-28 18:37     ` Joel Fernandes
  1 sibling, 1 reply; 25+ messages in thread
From: Lorenzo Stoakes @ 2023-08-27 10:15 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Linus Torvalds, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

On Sun, Aug 27, 2023 at 10:57:59AM +0100, Lorenzo Stoakes wrote:
[snip]

> > +/*
> > + * Verify that an mremap within a range does not cause corruption
> > + * of unrelated part of range.
> > + *
> > + * Consider the following range which is 2MB aligned and is
> > + * a part of a larger 10MB range which is not shown. Each
> > + * character is 256KB below making the source and destination

Just noticed, I think you misspeak here, as this test doesn't seem to
offset by 256 KiB? That is the strategy for mremap_move_1mb_from_start()
rather than this test, so perhaps the comment needs to be moved around?

 * 2MB each. The lower case letters are moved (s to d) and the
 * upper case letters are not moved. The below test verifies
 * that the upper case S letters are not corrupted by the
 * adjacent mremap.
 *
 * |DDDDddddSSSSssss|
 */
 static void mremap_move_within_range(char pattern_seed)

[snip]


* Re: [PATCH v5 2/7] mm/mremap: Allow moves within the same VMA
  2023-08-27  9:21   ` Lorenzo Stoakes
@ 2023-08-28 18:32     ` Joel Fernandes
  2023-08-28 19:00       ` Lorenzo Stoakes
  0 siblings, 1 reply; 25+ messages in thread
From: Joel Fernandes @ 2023-08-28 18:32 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Linus Torvalds, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

On Sun, Aug 27, 2023 at 10:21:14AM +0100, Lorenzo Stoakes wrote:
[..] 
> >
> >  /*
> >   * Flags used by change_protection().  For now we make it a bitmap so
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index 035fbf542a8f..06baa13bd2c8 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -490,12 +490,13 @@ static bool move_pgt_entry(enum pgt_entry entry, struct vm_area_struct *vma,
> >  }
> >
> >  /*
> > - * A helper to check if a previous mapping exists. Required for
> > - * move_page_tables() and realign_addr() to determine if a previous mapping
> > - * exists before we can do realignment optimizations.
> > + * A helper to check if aligning down is OK. The aligned address should fall
> > + * on *no mapping*. For the stack moving down, that's a special move within
> > + * the VMA that is created to span the source and destination of the move,
> > + * so we make an exception for it.
> >   */
> >  static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_align,
> > -			       unsigned long mask)
> > +			    unsigned long mask, bool for_stack)
> >  {
> >  	unsigned long addr_masked = addr_to_align & mask;
> >
> > @@ -504,7 +505,7 @@ static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_ali
> >  	 * of the corresponding VMA, we can't align down or we will destroy part
> >  	 * of the current mapping.
> >  	 */
> > -	if (vma->vm_start != addr_to_align)
> > +	if (!for_stack && vma->vm_start != addr_to_align)
> >  		return false;
> 
> I'm a little confused by this exception, is it very specifically for the
> shift_arg_pages() case where can assume we are safe to just discard the
> lower portion of the stack?
> 
> Wouldn't the find_vma_intersection() line below fail in this case? I may be
> missing something here :)

I think you are right. In v4, this was not an issue as we did this:


+	if (!for_stack && vma->vm_start != addr_to_align)
+		return false;
+
+	cur = find_vma_prev(vma->vm_mm, vma->vm_start, &prev);
+	if (WARN_ON_ONCE(cur != vma))
+		return false;

Which essentially means this patch is a NOOP in v5 for the stack case.

So what we really want is the VMA previous to @vma and whether that subsumes
the masked address.

Should I just change it back to the v4 version then as above for both patches
1 and 2 and carry your review tags?

This is also hard to test as it requires triggering the execve stack move
case. Though it is not a bug (as it is essentially a NOOP), it still would be
nice to test it. This is also complicated by the fact that mremap(2) itself
does not allow overlapping moves. I could try to hardcode the unfavorable
situation as I have done in the past to force that mremap warning.

thanks,

 - Joel



* Re: [PATCH v5 5/7] selftests: mm: Add a test for remapping to area immediately after existing mapping
  2023-08-27  9:42   ` Lorenzo Stoakes
@ 2023-08-28 18:36     ` Joel Fernandes
  0 siblings, 0 replies; 25+ messages in thread
From: Joel Fernandes @ 2023-08-28 18:36 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Linus Torvalds, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

On Sun, Aug 27, 2023 at 10:42:44AM +0100, Lorenzo Stoakes wrote:
> On Tue, Aug 22, 2023 at 01:54:58AM +0000, Joel Fernandes (Google) wrote:
> > This patch adds support for verifying that we correctly handle the
> > situation where something is already mapped before the destination of the remap.
> >
> > Any realignment of destination address and PMD-copy will destroy that
> > existing mapping. In such cases, we need to avoid doing the optimization.
> >
> > To test this, we map an area called the preamble before the remap
> > region. Then we verify after the mremap operation that this region did not get
> > corrupted.
> >
> > Putting some prints in the kernel, I verified that we optimize
> > correctly in different situations:
> >
> > Optimize when there is alignment and no previous mapping (this is tested
> > by previous patch).
> > <prints>
> > can_align_down(old_vma->vm_start=2900000, old_addr=2900000, mask=-2097152): 0
> > can_align_down(new_vma->vm_start=2f00000, new_addr=2f00000, mask=-2097152): 0
> > === Starting move_page_tables ===
> > Doing PUD move for 2800000 -> 2e00000 of extent=200000 <-- Optimization
> > Doing PUD move for 2a00000 -> 3000000 of extent=200000
> > Doing PUD move for 2c00000 -> 3200000 of extent=200000
> > </prints>
> >
> > Don't optimize when there is alignment but there is previous mapping
> > (this is tested by this patch).
> > Notice that can_align_down() returns 1 for the destination mapping
> > as we detected there is something there.
> > <prints>
> > can_align_down(old_vma->vm_start=2900000, old_addr=2900000, mask=-2097152): 0
> > can_align_down(new_vma->vm_start=5700000, new_addr=5700000, mask=-2097152): 1
> > === Starting move_page_tables ===
> > Doing move_ptes for 2900000 -> 5700000 of extent=100000 <-- Unoptimized
> > Doing PUD move for 2a00000 -> 5800000 of extent=200000
> > Doing PUD move for 2c00000 -> 5a00000 of extent=200000
> > </prints>
> >
> 
> Have you additionally tested this by changing the code to be intentionally
> broken then running the test and observing it fail?

Yes I did! Because while developing the patch, it was broken many times and
the test failed during those times. ;-)

> Looks good to me,
> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>

Thanks!

 - Joel



* Re: [PATCH v5 6/7] selftests: mm: Add a test for remapping within a range
  2023-08-27  9:57   ` Lorenzo Stoakes
  2023-08-27 10:15     ` Lorenzo Stoakes
@ 2023-08-28 18:37     ` Joel Fernandes
  1 sibling, 0 replies; 25+ messages in thread
From: Joel Fernandes @ 2023-08-28 18:37 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Linus Torvalds, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

On Sun, Aug 27, 2023 at 10:57:59AM +0100, Lorenzo Stoakes wrote:
> On Tue, Aug 22, 2023 at 01:54:59AM +0000, Joel Fernandes (Google) wrote:
> > Move a block of memory within a memory range. Any alignment optimization
> > on the source address may cause corruption. Verify using kselftest that
> > it works. I have also verified with tracing that such optimization does
> > not happen due to this check in can_align_down():
> >
> > if (!for_stack && vma->vm_start <= addr_masked)
> > 	return false;
> >
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  tools/testing/selftests/mm/mremap_test.c | 79 +++++++++++++++++++++++-
> >  1 file changed, 78 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
> > index d7366074e2a8..f45d1abedc9c 100644
> > --- a/tools/testing/selftests/mm/mremap_test.c
> > +++ b/tools/testing/selftests/mm/mremap_test.c
> > @@ -23,6 +23,7 @@
> >  #define VALIDATION_NO_THRESHOLD 0	/* Verify the entire region */
> >
> >  #define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
> > +#define SIZE_MB(m) ((size_t)m * (1024 * 1024))
> 
> Nit in this instance since you always just use an integer, but generally
> for good practice's sake - shouldn't we place m in parens e.g. (size_t)(m)
> to avoid broken macro expansion?

Sure, I'll do that. Thanks.

> I drew a little diagram for myself and was thereby convinced this was a good test, therefore,
> 
> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>

Thanks a lot!

 - Joel



* Re: [PATCH v5 6/7] selftests: mm: Add a test for remapping within a range
  2023-08-27 10:15     ` Lorenzo Stoakes
@ 2023-08-28 18:59       ` Joel Fernandes
  2023-08-28 19:38         ` Lorenzo Stoakes
  0 siblings, 1 reply; 25+ messages in thread
From: Joel Fernandes @ 2023-08-28 18:59 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Linus Torvalds, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

On Sun, Aug 27, 2023 at 11:15:20AM +0100, Lorenzo Stoakes wrote:
> On Sun, Aug 27, 2023 at 10:57:59AM +0100, Lorenzo Stoakes wrote:
> [snip]
> 
> > > +/*
> > > + * Verify that an mremap within a range does not cause corruption
> > > + * of unrelated part of range.
> > > + *
> > > + * Consider the following range which is 2MB aligned and is
> > > + * a part of a larger 10MB range which is not shown. Each
> > > + * character is 256KB below making the source and destination
> 
> Just noticed, I think you misspeak here, as this test doesn't seem to
> offset by 256 KiB? That is the strategy for mremap_move_1mb_from_start()
> rather than this test, so perhaps the comment needs to be moved around?
> 
>  * 2MB each. The lower case letters are moved (s to d) and the
>  * upper case letters are not moved. The below test verifies
>  * that the upper case S letters are not corrupted by the
>  * adjacent mremap.
>  *
>  * |DDDDddddSSSSssss|
>  */
>  static void mremap_move_within_range(char pattern_seed)

Here we are moving 1MB within a 4MB zone of a large mapping. Each character
's' or 'd' is 256KB. The 256KB there is just for illustration and not really
significant as such. The 'ssss' is moved to 'dddd' 1MB each. Here we make
sure that this move did not accidentally corrupt 'SSSS' and 'DDDD' due to
alignment optimization. Basically to protect from this, we check in the code
that the source address is the beginning of the VMA:
+	if (vma->vm_start != addr_to_align)
+		return false;

But you did point out an issue, which is that I need to change the comment
from 'larger 10MB' to 'larger 20MB'.

In the mremap_move_1mb_from_start() test, I request an alignment of
1.25MB so that when I align down, I fall on no mapping. This is to catch a bug
that Linus found, which is that just because an aligned-down address did not
fall on a mapping, that doesn't mean we can just move it at PMD level;
otherwise we destroy the mapping.
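
To make that concrete with numbers (picked purely for illustration):

	src        = 0x640000  (= 5 * 0x140000, so 1.25MB-aligned)
	PMD mask   = ~0x1fffff (2MB)
	src & mask = 0x600000  (= src - 256KB)

The aligned-down address sits 256KB below the start of the mapping, so it
can easily land on unmapped space, which is exactly the situation the test
wants to provoke.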

I do need to update the test name in mremap_move_1mb_from_start() to: "mremap
move 1mb from start at 1MB+256KB aligned src". So thanks for pointing this out!

Would that sort it out or is there still something in the comment I am
missing?

Thanks!

 - Joel

> 
> [snip]


* Re: [PATCH v5 2/7] mm/mremap: Allow moves within the same VMA
  2023-08-28 18:32     ` Joel Fernandes
@ 2023-08-28 19:00       ` Lorenzo Stoakes
  2023-08-28 20:26         ` Joel Fernandes
  0 siblings, 1 reply; 25+ messages in thread
From: Lorenzo Stoakes @ 2023-08-28 19:00 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Linus Torvalds, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

On Mon, Aug 28, 2023 at 06:32:40PM +0000, Joel Fernandes wrote:
> On Sun, Aug 27, 2023 at 10:21:14AM +0100, Lorenzo Stoakes wrote:
> [..]
> > >
> > >  /*
> > >   * Flags used by change_protection().  For now we make it a bitmap so
> > > diff --git a/mm/mremap.c b/mm/mremap.c
> > > index 035fbf542a8f..06baa13bd2c8 100644
> > > --- a/mm/mremap.c
> > > +++ b/mm/mremap.c
> > > @@ -490,12 +490,13 @@ static bool move_pgt_entry(enum pgt_entry entry, struct vm_area_struct *vma,
> > >  }
> > >
> > >  /*
> > > - * A helper to check if a previous mapping exists. Required for
> > > - * move_page_tables() and realign_addr() to determine if a previous mapping
> > > - * exists before we can do realignment optimizations.
> > > + * A helper to check if aligning down is OK. The aligned address should fall
> > > + * on *no mapping*. For the stack moving down, that's a special move within
> > > + * the VMA that is created to span the source and destination of the move,
> > > + * so we make an exception for it.
> > >   */
> > >  static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_align,
> > > -			       unsigned long mask)
> > > +			    unsigned long mask, bool for_stack)
> > >  {
> > >  	unsigned long addr_masked = addr_to_align & mask;
> > >
> > > @@ -504,7 +505,7 @@ static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_ali
> > >  	 * of the corresponding VMA, we can't align down or we will destroy part
> > >  	 * of the current mapping.
> > >  	 */
> > > -	if (vma->vm_start != addr_to_align)
> > > +	if (!for_stack && vma->vm_start != addr_to_align)
> > >  		return false;
> >
> > I'm a little confused by this exception, is it very specifically for the
> > shift_arg_pages() case where can assume we are safe to just discard the
> > lower portion of the stack?
> >
> > Wouldn't the find_vma_intersection() line below fail in this case? I may be
> > missing something here :)
>
> I think you are right. In v4, this was not an issue as we did this:
>
>
> +	if (!for_stack && vma->vm_start != addr_to_align)
> +		return false;
> +
> +	cur = find_vma_prev(vma->vm_mm, vma->vm_start, &prev);
> +	if (WARN_ON_ONCE(cur != vma))
> +		return false;
>
> Which essentially means this patch is a NOOP in v5 for the stack case.

>
> So what we really want is the VMA previous to @vma and whether that subsumes
> the masked address.
>
> Should I just change it back to the v4 version then as above for both patches
> 1 and 2 and carry your review tags?

You will not be surprised to hear that I'd rather not :) I think if we did
revert to that approach it'd need rework anyway, so I'd ask for a respin w/o
tag if we were to go down that road.

HOWEVER let's first clarify what we want to check.

My understanding (please correct me if mistaken) is that there are two
acceptable cases:-

1. !for_stack

 addr_masked         addr_to_align
 |                   |
 v                   v
 .                   |-----|
 . <-must be empty-> | vma |
 .                   |-----|

2. for_stack

      addr_masked         addr_to_align
      |                   |
      v                   v
 |----.-------------------.-----|
 |    .        vma        .     |
 |----.-------------------.-----|

Meaning that there are only two cases that we should care about:-

1. !for_stack: addr_to_align == vma->vm_start and no other VMA exists
   between this and addr_masked

2. for_stack: addr_masked is in the same VMA as addr_to_align.

In this case, the check can surely be:-

return find_vma_intersection(vma->vm_mm, addr_masked, addr_to_align) ==
	(for_stack ? vma : NULL);

(maybe would be less ugly to actually assign the intersection value to a
local var and check that)
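
Something like this is what I'm picturing (untested, and keeping your v5
signature as-is):

static bool can_align_down(struct vm_area_struct *vma,
			   unsigned long addr_to_align,
			   unsigned long mask, bool for_stack)
{
	unsigned long addr_masked = addr_to_align & mask;
	struct vm_area_struct *intersect;

	if (!for_stack && vma->vm_start != addr_to_align)
		return false;

	/*
	 * [addr_masked, addr_to_align) must be empty in the !for_stack
	 * case, or covered by @vma itself for the stack move.
	 */
	intersect = find_vma_intersection(vma->vm_mm, addr_masked,
					  addr_to_align);
	return intersect == (for_stack ? vma : NULL);
}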

>
> This is also hard to test as it requires triggering the execve stack move
> case. Though it is not a bug (as it is essentially a NOOP), it still would be
> nice to test it. This is also complicated by the fact that mremap(2) itself
> does not allow overlapping moves. I could try to hardcode the unfavorable
> situation as I have done in the past to force that mremap warning.

I find this exception a bit confusing; why are we so adamant about performing
the optimisation in this case when it makes the code uglier and is rather
hard to understand? Does it really matter that much?

I wonder whether it wouldn't be better to just drop that (unless you really
felt strongly about it) for the patch set and then perhaps address it in a
follow up?

This may entirely be a product of my simply not entirely understanding this
case so do forgive the probing, I just want to make sure we handle it
correctly!

>
> thanks,
>
>  - Joel
>


* Re: [PATCH v5 6/7] selftests: mm: Add a test for remapping within a range
  2023-08-28 18:59       ` Joel Fernandes
@ 2023-08-28 19:38         ` Lorenzo Stoakes
  2023-08-28 20:10           ` Joel Fernandes
  0 siblings, 1 reply; 25+ messages in thread
From: Lorenzo Stoakes @ 2023-08-28 19:38 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Linus Torvalds, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

On Mon, Aug 28, 2023 at 06:59:12PM +0000, Joel Fernandes wrote:
> On Sun, Aug 27, 2023 at 11:15:20AM +0100, Lorenzo Stoakes wrote:
> > On Sun, Aug 27, 2023 at 10:57:59AM +0100, Lorenzo Stoakes wrote:
> > [snip]
> >
> > > > +/*
> > > > + * Verify that an mremap within a range does not cause corruption
> > > > + * of unrelated part of range.
> > > > + *
> > > > + * Consider the following range which is 2MB aligned and is
> > > > + * a part of a larger 10MB range which is not shown. Each
> > > > + * character is 256KB below making the source and destination
> >
> > Just noticed, I think you misspeak here, as this test doesn't seem to
> > offset by 256 KiB? That is the strategy for mremap_move_1mb_from_start()
> > rather than this test, so perhaps the comment needs to be moved around?
> >
> >  * 2MB each. The lower case letters are moved (s to d) and the
> >  * upper case letters are not moved. The below test verifies
> >  * that the upper case S letters are not corrupted by the
> >  * adjacent mremap.
> >  *
> >  * |DDDDddddSSSSssss|
> >  */
> >  static void mremap_move_within_range(char pattern_seed)
>
> Here we are moving 1MB within a 4MB zone of a large mapping. Each character
> 's' or 'd' is 256KB. The 256KB there is just for illustration and not really
> significant as such. The 'ssss' is moved to 'dddd' 1MB each. Here we make

Ahhh I see. I find that a little confusing here, perhaps clearer to say 'each
block of letters is 1 MiB in size' or something?

> sure that this move did not accidentally corrupt 'SSSS' and 'DDDD' due to
> alignment optimization. Basically to protect from this, we check in the code
> that the source address is the beginning of the VMA:
> +	if (vma->vm_start != addr_to_align)
> +		return false;
>
> But you did point out an issue, which is that I need to change the comment
> from 'larger 10MB' to 'larger 20MB'.

Did I? Well happy to take credit ;)

>
> In the mremap_move_1mb_from_start() test, I request an alignment of
> 1.25MB so that when I align down, I fall on no mapping. This is to catch a bug
> that Linus found, which is that just because an aligned-down address did not
> fall on a mapping, that doesn't mean we can just move it at PMD level;
> otherwise we destroy the mapping.

Yeah that case it all makes sense in, just wondered if the comment belonged
there but just me not reading your comment carefully!

>
> I do need to update the test name in mremap_move_1mb_from_start() to: "mremap
> move 1mb from start at 1MB+256KB aligned src". So thanks for pointing this out!
>
> Would that sort it out or is there still something in the comment I am
> missing?

Just the suggestion above re: clarification, however it's not a show
stopper more of a small nit.

>
> Thanks!
>
>  - Joel

Cheers!

>
> >
> > [snip]


* Re: [PATCH v5 6/7] selftests: mm: Add a test for remapping within a range
  2023-08-28 19:38         ` Lorenzo Stoakes
@ 2023-08-28 20:10           ` Joel Fernandes
  0 siblings, 0 replies; 25+ messages in thread
From: Joel Fernandes @ 2023-08-28 20:10 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Linus Torvalds, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

On Mon, Aug 28, 2023 at 08:38:17PM +0100, Lorenzo Stoakes wrote:
> On Mon, Aug 28, 2023 at 06:59:12PM +0000, Joel Fernandes wrote:
> > On Sun, Aug 27, 2023 at 11:15:20AM +0100, Lorenzo Stoakes wrote:
> > > On Sun, Aug 27, 2023 at 10:57:59AM +0100, Lorenzo Stoakes wrote:
> > > [snip]
> > >
> > > > > +/*
> > > > > + * Verify that an mremap within a range does not cause corruption
> > > > > + * of unrelated part of range.
> > > > > + *
> > > > > + * Consider the following range which is 2MB aligned and is
> > > > > + * a part of a larger 10MB range which is not shown. Each
> > > > > + * character is 256KB below making the source and destination
> > >
> > > Just noticed, I think you misspeak here, as this test doesn't seem to
> > > offset by 256 KiB? That is the strategy for mremap_move_1mb_from_start()
> > > rather than this test, so perhaps the comment needs to be moved around?
> > >
> > >  * 2MB each. The lower case letters are moved (s to d) and the
> > >  * upper case letters are not moved. The below test verifies
> > >  * that the upper case S letters are not corrupted by the
> > >  * adjacent mremap.
> > >  *
> > >  * |DDDDddddSSSSssss|
> > >  */
> > >  static void mremap_move_within_range(char pattern_seed)
> >
> > Here we are moving 1MB within a 4MB zone of a large mapping. Each character
> > 's' or 'd' is 256KB. The 256KB there is just for illustration and not really
> > significant as such. The 'ssss' is moved to 'dddd' 1MB each. Here we make
> 
> Ahhh I see. I find that a little confusing here, perhaps clearer to say 'each
> block of letters is 1 MiB in size' or something?

Sure, I'll do that.
 
> Cheers!

Cheers. And thanks for the reviews!

 - Joel


* Re: [PATCH v5 7/7] selftests: mm: Add a test for moving from an offset from start of mapping
  2023-08-27 10:12   ` Lorenzo Stoakes
@ 2023-08-28 20:17     ` Joel Fernandes
  0 siblings, 0 replies; 25+ messages in thread
From: Joel Fernandes @ 2023-08-28 20:17 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Linus Torvalds, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

On Sun, Aug 27, 2023 at 11:12:14AM +0100, Lorenzo Stoakes wrote:
> On Tue, Aug 22, 2023 at 01:55:00AM +0000, Joel Fernandes (Google) wrote:
> > From: Joel Fernandes <joel@joelfernandes.org>
> >
> > It is possible that the aligned address falls on no existing mapping,
> > however that does not mean that we can just align it down to that.
> > This test verifies that the "vma->vm_start != addr_to_align" check in
> > can_align_down() prevents disastrous results if aligning down when
> > source and dest are mutually aligned within a PMD but the source/dest
> > addresses requested are not at the beginning of the respective mapping
> > containing these addresses.
> >
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  tools/testing/selftests/mm/mremap_test.c | 189 ++++++++++++++++-------
> >  1 file changed, 134 insertions(+), 55 deletions(-)
> >
> > diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
> > index f45d1abedc9c..c71ac8306c89 100644
> > --- a/tools/testing/selftests/mm/mremap_test.c
> > +++ b/tools/testing/selftests/mm/mremap_test.c
> > @@ -24,6 +24,7 @@
> >
> >  #define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
> >  #define SIZE_MB(m) ((size_t)m * (1024 * 1024))
> > +#define SIZE_KB(k) ((size_t)k * 1024)
> 
> Same comment as previous re: wrapping k in parens, it doesn't really matter
> here that much but for good practice (and consistency with MIN()) would be
> nice to do.

Ack.

> >
> >  struct config {
> >  	unsigned long long src_alignment;
> > @@ -148,6 +149,60 @@ static bool is_range_mapped(FILE *maps_fp, void *start, void *end)
> >  	return success;
> >  }
> >
> > +/*
> > + * Returns the start address of the mapping on success, else returns
> > + * NULL on failure.
> > + */
> > +static void *get_source_mapping(struct config c)
> > +{
> > +	unsigned long long addr = 0ULL;
> > +	void *src_addr = NULL;
> > +	unsigned long long mmap_min_addr;
> > +
> > +	mmap_min_addr = get_mmap_min_addr();
> > +	/*
> > +	 * For some tests, we need to not have any mappings below the
> > +	 * source mapping. Add some headroom to mmap_min_addr for this.
> > +	 */
> > +	mmap_min_addr += 10 * _4MB;
> > +
> > +retry:
> > +	addr += c.src_alignment;
> > +	if (addr < mmap_min_addr)
> > +		goto retry;
> > +
> > +	src_addr = mmap((void *) addr, c.region_size, PROT_READ | PROT_WRITE,
> > +					MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED,
> > +					-1, 0);
> > +	if (src_addr == MAP_FAILED) {
> > +		if (errno == EPERM || errno == EEXIST)
> > +			goto retry;
> > +		goto error;
> > +	}
> > +	/*
> > +	 * Check that the address is aligned to the specified alignment.
> > +	 * Addresses which have alignments that are multiples of that
> > +	 * specified are not considered valid. For instance, 1GB address is
> > +	 * 2MB-aligned, however it will not be considered valid for a
> > +	 * requested alignment of 2MB. This is done to reduce coincidental
> > +	 * alignment in the tests.
> > +	 */
> > +	if (((unsigned long long) src_addr & (c.src_alignment - 1)) ||
> > +			!((unsigned long long) src_addr & c.src_alignment)) {
> > +		munmap(src_addr, c.region_size);
> > +		goto retry;
> > +	}
> > +
> > +	if (!src_addr)
> > +		goto error;
> > +
> > +	return src_addr;
> > +error:
> > +	ksft_print_msg("Failed to map source region: %s\n",
> > +			strerror(errno));
> > +	return NULL;
> > +}
> > +
> 
> A meta thing, but it'd be nice to separate out _moving_ functions into
> their own patch to make reviewing easier as here it's not obvious whether

Sure, good point and I'll give that a shot.

[..]
> > +/*
> > + * Verify that an mremap aligning down does not destroy
> > + * the beginning of the mapping just because the aligned
> > + * down address landed on a mapping that maybe does not exist.
> > + */
> > +static void mremap_move_1mb_from_start(char pattern_seed)
> > +{
> > +	char *test_name = "mremap move 1mb from start at 2MB+256KB aligned src";
> > +	void *src = NULL, *dest = NULL;
> > +	int i, success = 1;
> > +
> > +	/* Config to reuse get_source_mapping() to do an aligned mmap. */
> > +	struct config c = {
> > +		.src_alignment = SIZE_MB(1) + SIZE_KB(256),
> > +		.region_size = SIZE_MB(6)
> > +	};
> > +
> > +	src = get_source_mapping(c);
> > +	if (!src) {
> > +		success = 0;
> > +		goto out;
> > +	}
> > +
> > +	c.src_alignment = SIZE_MB(1) + SIZE_KB(256);
> 
> Why are you setting this again?

Yes, I may not need to, but since 'c' is passed to a function, I just do it
for robustness. If you'd like, I can drop it.

> > +	dest = get_source_mapping(c);
> > +	if (!dest) {
> > +		success = 0;
> > +		goto out;
> > +	}
> > +
> > +	/* Set byte pattern for source block. */
> > +	srand(pattern_seed);
> > +	for (i = 0; i < SIZE_MB(2); i++) {
> > +		((char *)src)[i] = (char) rand();
> > +	}
> > +
> > +	/*
> > +	 * Unmap the beginning of dest so that the aligned address
> > +	 * falls on no mapping.
> > +	 */
> > +	munmap(dest, SIZE_MB(1));
> 
> This actually aligns (no pun intended) with my previous comments on the min
> mmap address + 4 MiB headroom. But I guess for the generalised mremap test
> you will still need to have that headroom...

Yes, I guess the situation you are validly concerned about is that something
else could get mapped there just after I unmap. However, as this is test code,
it should probably be OK?
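(If we wanted to harden the test against that race, one option is to re-check
the hole right before the mremap. A sketch, reusing is_range_mapped() from
this file and assuming the maps file pointer were plumbed into this test:)

	/* Verify the hole we punched in dest is still unmapped. */
	if (is_range_mapped(maps_fp, dest, (char *)dest + SIZE_MB(1))) {
		ksft_print_msg("Hole at dest was unexpectedly mapped\n");
		success = 0;
		goto out;
	}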

> > +
> > +	void *new_ptr = mremap(src + SIZE_MB(1), SIZE_MB(1), SIZE_MB(1),
> > +						   MREMAP_MAYMOVE | MREMAP_FIXED, dest + SIZE_MB(1));
> > +	if (new_ptr == MAP_FAILED) {
> > +		perror("mremap");
> > +		success = 0;
> > +		goto out;
> > +	}
> > +
> > +	/* Verify byte pattern after remapping */
> > +	srand(pattern_seed);
> > +	for (i = 0; i < SIZE_MB(1); i++) {
> > +		char c = (char) rand();
> > +
> > +		if (((char *)src)[i] != c) {
> > +			ksft_print_msg("Data at src at %d got corrupted due to unrelated mremap\n",
> > +				       i);
> > +			ksft_print_msg("Expected: %#x\t Got: %#x\n", c & 0xff,
> > +					((char *) src)[i] & 0xff);
> > +			success = 0;
> > +		}
> > +	}
> > +
> > +out:
> > +	if (src && munmap(src, c.region_size) == -1)
> > +		perror("munmap src");
> > +
> > +	if (dest && munmap(dest, c.region_size) == -1)
> > +		perror("munmap dest");
> > +
> > +	if (success)
> > +		ksft_test_result_pass("%s\n", test_name);
> > +	else
> > +		ksft_test_result_fail("%s\n", test_name);
> > +}
> > +
> >  static void run_mremap_test_case(struct test test_case, int *failures,
> >  				 unsigned int threshold_mb,
> >  				 unsigned int pattern_seed)
> > @@ -565,7 +643,7 @@ int main(int argc, char **argv)
> >  	unsigned int threshold_mb = VALIDATION_DEFAULT_THRESHOLD;
> >  	unsigned int pattern_seed;
> >  	int num_expand_tests = 2;
> > -	int num_misc_tests = 1;
> > +	int num_misc_tests = 2;
> >  	struct test test_cases[MAX_TEST] = {};
> >  	struct test perf_test_cases[MAX_PERF_TEST];
> >  	int page_size;
> > @@ -666,6 +744,7 @@ int main(int argc, char **argv)
> >  	fclose(maps_fp);
> >
> >  	mremap_move_within_range(pattern_seed);
> > +	mremap_move_1mb_from_start(pattern_seed);
> >
> >  	if (run_perf_tests) {
> >  		ksft_print_msg("\n%s\n",
> > --
> > 2.42.0.rc1.204.g551eb34607-goog
> >
> 
> Have you verified this test fails if you eliminate the vma->vm_start !=
> addr_to_align check?

Yes. When Linus found the issue, I wrote this very test to ensure that
we caught it and that the fix made the test pass.

> Unrelated to this patch, but I guess it might be tricky to come up with a
> test for the shift_arg_pages() stack case?

Yes, I think so. At the moment I am resorting to manually hacking the kernel
to test that, which is not ideal, but...

> Irrespective of the above, this looks correct to me so,
> 
> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
> Thanks for putting in so much effort on the testing side too! :)

My pleasure, and thanks for the awesome review! I'll get to work on it and
post a new version soon.

thanks,

 - Joel




* Re: [PATCH v5 2/7] mm/mremap: Allow moves within the same VMA
  2023-08-28 19:00       ` Lorenzo Stoakes
@ 2023-08-28 20:26         ` Joel Fernandes
  0 siblings, 0 replies; 25+ messages in thread
From: Joel Fernandes @ 2023-08-28 20:26 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-kernel, linux-kselftest, linux-mm, Shuah Khan,
	Vlastimil Babka, Michal Hocko, Linus Torvalds, Kirill A Shutemov,
	Liam R. Howlett, Paul E. McKenney, Suren Baghdasaryan,
	Kalesh Singh, Lokesh Gidra

On Mon, Aug 28, 2023 at 08:00:18PM +0100, Lorenzo Stoakes wrote:
> On Mon, Aug 28, 2023 at 06:32:40PM +0000, Joel Fernandes wrote:
> > On Sun, Aug 27, 2023 at 10:21:14AM +0100, Lorenzo Stoakes wrote:
> > [..]
> > > >
> > > >  /*
> > > >   * Flags used by change_protection().  For now we make it a bitmap so
> > > > diff --git a/mm/mremap.c b/mm/mremap.c
> > > > index 035fbf542a8f..06baa13bd2c8 100644
> > > > --- a/mm/mremap.c
> > > > +++ b/mm/mremap.c
> > > > @@ -490,12 +490,13 @@ static bool move_pgt_entry(enum pgt_entry entry, struct vm_area_struct *vma,
> > > >  }
> > > >
> > > >  /*
> > > > - * A helper to check if a previous mapping exists. Required for
> > > > - * move_page_tables() and realign_addr() to determine if a previous mapping
> > > > - * exists before we can do realignment optimizations.
> > > > + * A helper to check if aligning down is OK. The aligned address should fall
> > > > + * on *no mapping*. For the stack moving down, that's a special move within
> > > > + * the VMA that is created to span the source and destination of the move,
> > > > + * so we make an exception for it.
> > > >   */
> > > >  static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_align,
> > > > -			       unsigned long mask)
> > > > +			    unsigned long mask, bool for_stack)
> > > >  {
> > > >  	unsigned long addr_masked = addr_to_align & mask;
> > > >
> > > > @@ -504,7 +505,7 @@ static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_ali
> > > >  	 * of the corresponding VMA, we can't align down or we will destroy part
> > > >  	 * of the current mapping.
> > > >  	 */
> > > > -	if (vma->vm_start != addr_to_align)
> > > > +	if (!for_stack && vma->vm_start != addr_to_align)
> > > >  		return false;
> > >
> > > I'm a little confused by this exception: is it very specifically for the
> > > shift_arg_pages() case, where we can assume we are safe to just discard
> > > the lower portion of the stack?
> > >
> > > Wouldn't the find_vma_intersection() line below fail in this case? I may be
> > > missing something here :)
> >
> > I think you are right. In v4, this was not an issue as we did this:
> >
> >
> > +	if (!for_stack && vma->vm_start != addr_to_align)
> > +		return false;
> > +
> > +	cur = find_vma_prev(vma->vm_mm, vma->vm_start, &prev);
> > +	if (WARN_ON_ONCE(cur != vma))
> > +		return false;
> >
> > Which essentially means this patch is a NOOP in v5 for the stack case.
> 
> >
> > So what we really want is the VMA previous to @vma and whether that
> > subsumes the masked address.
> >
> > Should I just change it back to the v4 version then as above for both patch 1
> > and 2 and carry your review tags?
> 
> You will not be surprised to hear that I'd rather not :) I think if we did
> revert to that approach it'd need rework anyway, so I'd ask for a respin w/o
> tag if we were to go down that road.
> 
> HOWEVER let's first clarify what we want to check.
> 
> My understanding (please correct me if I'm mistaken) is that there are two
> acceptable cases:-
> 
> 1. !for_stack
> 
>  addr_masked         addr_to_align
>  |                   |
>  v                   v
>  .                   |-----|
>  . <-must be empty-> | vma |
>  .                   |-----|
> 
> 2. for_stack
> 
>       addr_masked         addr_to_align
>       |                   |
>       v                   v
>  |----.-------------------.-----|
>  |    .        vma        .     |
>  |----.-------------------.-----|
> 
> Meaning that there are only two cases that we should care about:-
> 
> 1. !for_stack: addr_to_align == vma->vm_start and no other VMA exists
>    between this and addr_masked
> 
> 2. for_stack: addr_masked is in the same VMA as addr_to_align.
> 
> In this case, the check can surely be:-
> 
> return find_vma_intersection(vma->vm_mm, addr_masked, addr_to_align) ==
> 	(for_stack ? vma : NULL);
> 
> (maybe would be less ugly to actually assign the intersection value to a
> local var and check that)
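(For the record, the local-variable form of that check would read something
like the sketch below -- same semantics as the one-liner, keeping the
vm_start check for the non-stack case:)

	static bool can_align_down(struct vm_area_struct *vma,
				   unsigned long addr_to_align,
				   unsigned long mask, bool for_stack)
	{
		unsigned long addr_masked = addr_to_align & mask;
		struct vm_area_struct *overlap;

		/* Outside the stack case, only align down from the VMA start. */
		if (!for_stack && vma->vm_start != addr_to_align)
			return false;

		/*
		 * The realigned range may intersect nothing (normal case) or
		 * only @vma itself (stack case).
		 */
		overlap = find_vma_intersection(vma->vm_mm, addr_masked, addr_to_align);
		return overlap == (for_stack ? vma : NULL);
	}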

For completeness: Lorenzo made some valid points on IRC, and we'll do this
patch (2/7) as follows for v6 after sufficient testing.

static bool can_align_down(struct vm_area_struct *vma, unsigned long addr_to_align,
			   unsigned long mask, bool for_stack)
{
	unsigned long addr_masked = addr_to_align & mask;

	/*
	 * If @addr_to_align of either source or destination is not the
	 * beginning of the corresponding VMA, we can't align down or we will
	 * destroy part of the current mapping for cases other than the stack.
	 */
	if (!for_stack && vma->vm_start != addr_to_align)
		return false;

	/* In the stack case we explicitly permit in-VMA alignment. */
	if (for_stack && addr_masked >= vma->vm_start)
		return true;

	/*
	 * Make sure the realignment doesn't cause the address to fall on an
	 * existing mapping.
	 */
	return find_vma_intersection(vma->vm_mm, addr_masked, vma->vm_start) == NULL;
}
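(And for context, the caller would gate the realignment on this check for
both the source and destination addresses. Roughly, following the helper
introduced in patch 1/7 -- a sketch, exact names and signature may differ:)

	static void try_realign_addr(unsigned long *old_addr, struct vm_area_struct *old_vma,
				     unsigned long *new_addr, struct vm_area_struct *new_vma,
				     unsigned long mask, bool for_stack)
	{
		/* Skip if the addresses are already aligned. */
		if ((*old_addr & ~mask) == 0)
			return;

		/* Only realign if the old and new addresses are mutually aligned. */
		if ((*old_addr & ~mask) != (*new_addr & ~mask))
			return;

		/* Only realign if both sides can be aligned down safely. */
		if (!can_align_down(old_vma, *old_addr, mask, for_stack) ||
		    !can_align_down(new_vma, *new_addr, mask, for_stack))
			return;

		*old_addr = *old_addr & mask;
		*new_addr = *new_addr & mask;
	}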

Thanks Lorenzo for the suggestion!

> >
> > This is also hard to test as it requires triggering the execve stack move
> > case. Though it is not a bug (as it is essentially a NOOP), it still would be
> > nice to test it. This is also complicated by the fact that mremap(2) itself
> > does not allow overlapping moves. I could try to hardcode the unfavorable
> > situation as I have done in the past to force that mremap warning.
> 
> I find this exception a bit confusing: why are we so adamant about performing
> the optimisation in this case when it makes the code uglier and is rather
> hard to understand? Does it really matter that much?

Let me know if you still feel it makes the code uglier, but it amounts to just
one more if() condition. And who knows, maybe in the future we will want to do
such overlapping moves for other cases? ;)

> I wonder whether it wouldn't be better to just drop that (unless you really
> feel strongly about it) for this patch set and then perhaps address it in a
> follow-up?
> This may well be a product of my simply not understanding this case, so do
> forgive the probing; I just want to make sure we handle it correctly!

It was just to suppress that false-positive warning: by aligning down the
stack move we avoid the warning about zeroed PMDs. We could certainly do it in
this series or as a follow-up, but since we came up with the above snippet, I
will keep it in this series for now, and hopefully you are OK with that.

thanks,

 - Joel


