* [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust()
@ 2023-01-05 19:15 Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 01/44] maple_tree: Add mas_init() function Liam Howlett
                   ` (44 more replies)
  0 siblings, 45 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton; +Cc: Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@oracle.com>

Andrew,

This patch set does two things: 1. Cleans up the code, including removal
of __vma_adjust(), and 2. Extends the VMA iterator API to provide type
safety to the VMA operations using the maple tree, as requested by Linus
[1].

It also addresses another usability issue brought up by Linus: needing to
modify the maple state within the loops.  The maple state has been
replaced by the VMA iterator, and the iterator is now updated within the
MM code itself, so the caller no longer needs to do that work when tree
modifications occur.
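
As a rough before/after sketch (illustrative only, not lifted from any
single patch), a loop that previously had to manage the maple state by
hand:

	MA_STATE(mas, &mm->mm_mt, addr, addr);
	mas_for_each(&mas, vma, ULONG_MAX) {
		/* a tree modification here forced a manual mas_pause() */
	}

becomes a loop where the helpers keep the iterator consistent:

	VMA_ITERATOR(vmi, mm, addr);
	for_each_vma(vmi, vma) {
		/* modifications go through the vma_iter_*() helpers,
		 * which update or invalidate the iterator as needed */
	}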

This exposed a potential inconsistency between the iterator state and
what the user expects, which is resolved so that the VMA iterator remains
safe to use after looping over a VMA range.  This is addressed in patch 3
("maple_tree: Reduce user error potential") and patch 4 ("test_maple_tree:
Test modifications while iterating").

While cleaning up the state handling, the duplicate locking code in
mm/mmap.c introduced by the maple tree has been addressed by abstracting
it into two functions: vma_prepare() and vma_complete().  These
abstractions allowed for a much simpler __vma_adjust(), which eventually
leads to removing __vma_adjust() entirely by moving its logic into
vma_merge() itself.
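
The resulting pattern looks roughly like the following sketch; the
argument lists here are placeholders, with the real signatures introduced
by the later patches in the series:

	struct vma_prepare vp;

	init_vma_prep(&vp, vma);	/* helper introduced later */
	vma_prepare(&vp);		/* take file/anon_vma/rmap locks */
	/* ... adjust the tree and the VMA boundaries ... */
	vma_complete(&vp, vmi, mm);	/* drop locks and do the fixups */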

1. https://lore.kernel.org/linux-mm/CAHk-=wg9WQXBGkNdKD2bqocnN73rDswuWsavBB7T-tekykEn_A@mail.gmail.com/

Changes since v1:
 - Changed the subject to better highlight the removal of __vma_adjust()
 - Converted damon test code to use the maple tree functions as opposed
   to vma_mas_store().  This added an extra patch to the series
 - Wrap debug output in vma_iter_store() with DEBUG_VM_MAPLE_TREE config
   option
 - Fix comment in mm/rmap.c referencing __vma_adjust()

v1: https://lore.kernel.org/linux-mm/20221129164352.3374638-1-Liam.Howlett@oracle.com/

Liam R. Howlett (44):
  maple_tree: Add mas_init() function
  maple_tree: Fix potential rcu issue
  maple_tree: Reduce user error potential
  test_maple_tree: Test modifications while iterating
  mm: Expand vma iterator interface.
  mm/mmap: convert brk to use vma iterator
  kernel/fork: Convert forking to using the vmi iterator
  mmap: Convert vma_link() vma iterator
  mm/mmap: Remove preallocation from do_mas_align_munmap()
  mmap: Change do_mas_munmap and do_mas_aligned_munmap() to use vma
    iterator
  mmap: Convert vma_expand() to use vma iterator
  mm: Add temporary vma iterator versions of vma_merge(), split_vma(),
    and __split_vma()
  ipc/shm: Use the vma iterator for munmap calls
  userfaultfd: Use vma iterator
  mm: Change mprotect_fixup to vma iterator
  mlock: Convert mlock to vma iterator
  coredump: Convert to vma iterator
  mempolicy: Convert to vma iterator
  task_mmu: Convert to vma iterator
  sched: Convert to vma iterator
  madvise: Use vmi iterator for __split_vma() and vma_merge()
  mmap: Pass through vmi iterator to __split_vma()
  mmap: Use vmi version of vma_merge()
  mm/mremap: Use vmi version of vma_merge()
  mm: Switch vma_merge(), split_vma(), and __split_vma to vma iterator
  mm/damon: Stop using vma_mas_store() for maple tree store
  mmap: Convert __vma_adjust() to use vma iterator
  mm: Pass through vma iterator to __vma_adjust()
  madvise: Use split_vma() instead of __split_vma()
  mm: Remove unnecessary write to vma iterator in __vma_adjust()
  mm: Pass vma iterator through to __vma_adjust()
  mm: Add vma iterator to vma_adjust() arguments
  mmap: Clean up mmap_region() unrolling
  mm: Change munmap splitting order and move_vma()
  mm/mmap: move anon_vma setting in __vma_adjust()
  mm/mmap: Refactor locking out of __vma_adjust()
  mm/mmap: Use vma_prepare() and vma_complete() in vma_expand()
  mm/mmap: Introduce init_vma_prep() and init_multi_vma_prep()
  mm: Don't use __vma_adjust() in __split_vma()
  mm/mmap: Don't use __vma_adjust() in shift_arg_pages()
  mm/mmap: Introduce dup_vma_anon() helper
  mm/mmap: Convert do_brk_flags() to use vma_prepare() and
    vma_complete()
  mm/mmap: Remove __vma_adjust()
  vma_merge: Set vma iterator to correct position.

 fs/coredump.c              |    8 +-
 fs/exec.c                  |   16 +-
 fs/proc/task_mmu.c         |   14 +-
 fs/userfaultfd.c           |   88 ++-
 include/linux/maple_tree.h |   11 +
 include/linux/mm.h         |   87 ++-
 include/linux/mm_types.h   |    4 +-
 ipc/shm.c                  |   11 +-
 kernel/events/uprobes.c    |    2 +-
 kernel/fork.c              |   19 +-
 kernel/sched/fair.c        |   14 +-
 lib/maple_tree.c           |   12 +-
 lib/test_maple_tree.c      |   72 +++
 mm/damon/vaddr-test.h      |    6 +-
 mm/filemap.c               |    2 +-
 mm/internal.h              |   13 +
 mm/madvise.c               |   13 +-
 mm/mempolicy.c             |   25 +-
 mm/mlock.c                 |   57 +-
 mm/mmap.c                  | 1076 ++++++++++++++++++------------------
 mm/mprotect.c              |   47 +-
 mm/mremap.c                |   42 +-
 mm/rmap.c                  |   15 +-
 23 files changed, 876 insertions(+), 778 deletions(-)

-- 
2.35.1

* [PATCH v2 01/44] maple_tree: Add mas_init() function
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 02/44] maple_tree: Fix potential rcu issue Liam Howlett
                   ` (43 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Add a function that will zero out the maple state struct and set some
basic defaults.
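
For illustration, a minimal hypothetical caller (the tree and address are
stand-ins):

	struct ma_state mas;

	mas_init(&mas, &mm->mm_mt, addr);
	/* mas is zeroed, spans 0 to ULONG_MAX, and is set to MAS_START,
	 * so the next operation begins a fresh walk at addr */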

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 include/linux/maple_tree.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index e594db58a0f1..3f972602c978 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -433,6 +433,7 @@ struct ma_wr_state {
 		.min = 0,						\
 		.max = ULONG_MAX,					\
 		.alloc = NULL,						\
+		.mas_flags = 0,						\
 	}
 
 #define MA_WR_STATE(name, ma_state, wr_entry)				\
@@ -471,6 +472,16 @@ void *mas_next(struct ma_state *mas, unsigned long max);
 int mas_empty_area(struct ma_state *mas, unsigned long min, unsigned long max,
 		   unsigned long size);
 
+static inline void mas_init(struct ma_state *mas, struct maple_tree *tree,
+			    unsigned long addr)
+{
+	memset(mas, 0, sizeof(struct ma_state));
+	mas->tree = tree;
+	mas->index = mas->last = addr;
+	mas->max = ULONG_MAX;
+	mas->node = MAS_START;
+}
+
 /* Checks if a mas has not found anything */
 static inline bool mas_is_none(struct ma_state *mas)
 {
-- 
2.35.1

* [PATCH v2 02/44] maple_tree: Fix potential rcu issue
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 01/44] maple_tree: Add mas_init() function Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 03/44] maple_tree: Reduce user error potential Liam Howlett
                   ` (42 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Ensure the node isn't dead after reading the node end: read the node end
(ma_data_end()) before the dead-node check, so that a node concurrently
freed under RCU is detected before the stale end value is used.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 lib/maple_tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 26e2045d3cda..f3c5ad9ff57f 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4661,13 +4661,13 @@ static inline void *mas_next_nentry(struct ma_state *mas,
 	pivots = ma_pivots(node, type);
 	slots = ma_slots(node, type);
 	mas->index = mas_safe_min(mas, pivots, mas->offset);
+	count = ma_data_end(node, type, pivots, mas->max);
 	if (ma_dead_node(node))
 		return NULL;
 
 	if (mas->index > max)
 		return NULL;
 
-	count = ma_data_end(node, type, pivots, mas->max);
 	if (mas->offset > count)
 		return NULL;
 
-- 
2.35.1

* [PATCH v2 05/44] mm: Expand vma iterator interface.
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (2 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 03/44] maple_tree: Reduce user error potential Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 04/44] test_maple_tree: Test modifications while iterating Liam Howlett
                   ` (40 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Add wrappers for the maple tree to the vma iterator.  This will provide
type safety at compile time.
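
A sketch of how the wrappers compose (addresses illustrative, not taken
from a particular caller):

	VMA_ITERATOR(vmi, mm, addr);
	struct vm_area_struct *vma;

	for_each_vma_range(vmi, vma, end) {
		/* vma_iter_end() reports the exclusive end address the MM
		 * code expects, hiding the tree's inclusive mas.last */
		unsigned long vend = vma_iter_end(&vmi);
	}
	vma_iter_set(&vmi, addr);	/* restart the walk from addr */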

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 include/linux/mm.h       | 46 +++++++++++++++++++++---
 include/linux/mm_types.h |  4 +--
 mm/mmap.c                | 77 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 120 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f3f196e4d66d..f4b964f96db1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -665,16 +665,16 @@ static inline bool vma_is_accessible(struct vm_area_struct *vma)
 static inline
 struct vm_area_struct *vma_find(struct vma_iterator *vmi, unsigned long max)
 {
-	return mas_find(&vmi->mas, max);
+	return mas_find(&vmi->mas, max - 1);
 }
 
 static inline struct vm_area_struct *vma_next(struct vma_iterator *vmi)
 {
 	/*
-	 * Uses vma_find() to get the first VMA when the iterator starts.
+	 * Uses mas_find() to get the first VMA when the iterator starts.
 	 * Calling mas_next() could skip the first entry.
 	 */
-	return vma_find(vmi, ULONG_MAX);
+	return mas_find(&vmi->mas, ULONG_MAX);
 }
 
 static inline struct vm_area_struct *vma_prev(struct vma_iterator *vmi)
@@ -687,12 +687,50 @@ static inline unsigned long vma_iter_addr(struct vma_iterator *vmi)
 	return vmi->mas.index;
 }
 
+static inline unsigned long vma_iter_end(struct vma_iterator *vmi)
+{
+	return vmi->mas.last + 1;
+}
+
+static inline int vma_iter_bulk_alloc(struct vma_iterator *vmi,
+				      unsigned long count)
+{
+	return mas_expected_entries(&vmi->mas, count);
+}
+
+/* Free any unused preallocations */
+static inline void vma_iter_free(struct vma_iterator *vmi)
+{
+	mas_destroy(&vmi->mas);
+}
+
+static inline int vma_iter_bulk_store(struct vma_iterator *vmi,
+				      struct vm_area_struct *vma)
+{
+	vmi->mas.index = vma->vm_start;
+	vmi->mas.last = vma->vm_end - 1;
+	mas_store(&vmi->mas, vma);
+	if (unlikely(mas_is_err(&vmi->mas)))
+		return -ENOMEM;
+
+	return 0;
+}
+
+static inline void vma_iter_invalidate(struct vma_iterator *vmi)
+{
+	mas_pause(&vmi->mas);
+}
+
+static inline void vma_iter_set(struct vma_iterator *vmi, unsigned long addr)
+{
+	mas_set(&vmi->mas, addr);
+}
+
 #define for_each_vma(__vmi, __vma)					\
 	while (((__vma) = vma_next(&(__vmi))) != NULL)
 
 /* The MM code likes to work with exclusive end addresses */
 #define for_each_vma_range(__vmi, __vma, __end)				\
-	while (((__vma) = vma_find(&(__vmi), (__end) - 1)) != NULL)
+	while (((__vma) = vma_find(&(__vmi), (__end))) != NULL)
 
 #ifdef CONFIG_SHMEM
 /*
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3b8475007734..3cd8b7034c48 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -904,9 +904,7 @@ struct vma_iterator {
 static inline void vma_iter_init(struct vma_iterator *vmi,
 		struct mm_struct *mm, unsigned long addr)
 {
-	vmi->mas.tree = &mm->mm_mt;
-	vmi->mas.index = addr;
-	vmi->mas.node = MAS_START;
+	mas_init(&vmi->mas, &mm->mm_mt, addr);
 }
 
 struct mmu_gather;
diff --git a/mm/mmap.c b/mm/mmap.c
index 87d929316d57..9318f2ac8a6e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -144,6 +144,83 @@ static void remove_vma(struct vm_area_struct *vma)
 	vm_area_free(vma);
 }
 
+static inline struct vm_area_struct *vma_iter_load(struct vma_iterator *vmi)
+{
+	return mas_walk(&vmi->mas);
+}
+
+static inline struct vm_area_struct *vma_prev_limit(struct vma_iterator *vmi,
+						    unsigned long min)
+{
+	return mas_prev(&vmi->mas, min);
+}
+
+static inline int vma_iter_prealloc(struct vma_iterator *vmi,
+				    struct vm_area_struct *vma)
+{
+	return mas_preallocate(&vmi->mas, vma, GFP_KERNEL);
+}
+
+/* Store a VMA with preallocated memory */
+static inline void vma_iter_store(struct vma_iterator *vmi,
+				  struct vm_area_struct *vma)
+{
+
+#if defined(CONFIG_DEBUG_VM_MAPLE_TREE)
+	if (WARN_ON(vmi->mas.node != MAS_START && vmi->mas.index > vma->vm_start)) {
+		printk("%lu > %lu\n", vmi->mas.index, vma->vm_start);
+		printk("store of vma %lu-%lu", vma->vm_start, vma->vm_end);
+		printk("into slot    %lu-%lu", vmi->mas.index, vmi->mas.last);
+		mt_dump(vmi->mas.tree);
+	}
+	if (WARN_ON(vmi->mas.node != MAS_START && vmi->mas.last <  vma->vm_start)) {
+		printk("%lu < %lu\n", vmi->mas.last, vma->vm_start);
+		printk("store of vma %lu-%lu", vma->vm_start, vma->vm_end);
+		printk("into slot    %lu-%lu", vmi->mas.index, vmi->mas.last);
+		mt_dump(vmi->mas.tree);
+	}
+#endif
+
+	if (vmi->mas.node != MAS_START &&
+	    ((vmi->mas.index > vma->vm_start) || (vmi->mas.last < vma->vm_start)))
+		vma_iter_invalidate(vmi);
+
+	vmi->mas.index = vma->vm_start;
+	vmi->mas.last = vma->vm_end - 1;
+	mas_store_prealloc(&vmi->mas, vma);
+}
+
+static inline void vma_iter_clear(struct vma_iterator *vmi,
+				  unsigned long start, unsigned long end)
+{
+	mas_set_range(&vmi->mas, start, end - 1);
+	mas_store_prealloc(&vmi->mas, NULL);
+}
+
+static inline int vma_iter_store_gfp(struct vma_iterator *vmi,
+			struct vm_area_struct *vma, gfp_t gfp)
+{
+	vmi->mas.index = vma->vm_start;
+	vmi->mas.last = vma->vm_end - 1;
+	mas_store_gfp(&vmi->mas, vma, gfp);
+	if (unlikely(mas_is_err(&vmi->mas)))
+		return -ENOMEM;
+
+	return 0;
+}
+
+static inline int vma_iter_clear_gfp(struct vma_iterator *vmi,
+			unsigned long start, unsigned long end, gfp_t gfp)
+{
+	vmi->mas.index = start;
+	vmi->mas.last = end - 1;
+	mas_store_gfp(&vmi->mas, NULL, gfp);
+	if (unlikely(mas_is_err(&vmi->mas)))
+		return -ENOMEM;
+
+	return 0;
+}
+
 /*
  * check_brk_limits() - Use platform specific check of range & verify mlock
  * limits.
-- 
2.35.1

* [PATCH v2 03/44] maple_tree: Reduce user error potential
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 01/44] maple_tree: Add mas_init() function Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 02/44] maple_tree: Fix potential rcu issue Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 05/44] mm: Expand vma iterator interface Liam Howlett
                   ` (41 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

When iterating, a user may operate on the tree and cause the maple state
to be altered and left in an unintuitive state.  Detect this scenario and
correct it by setting the state to the limit and invalidating it.
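
Sketched from the user's point of view (values illustrative):

	void *entry;

	/* a store during the walk pushed mas->index past 'limit' */
	entry = mas_next(&mas, limit);
	/* entry == NULL; mas->index == mas->last == limit and the state
	 * is paused, so the next call revalidates against the tree */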

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 lib/maple_tree.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index f3c5ad9ff57f..14cff87cf058 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4742,6 +4742,11 @@ static inline void *mas_next_entry(struct ma_state *mas, unsigned long limit)
 	unsigned long last;
 	enum maple_type mt;
 
+	if (mas->index > limit) {
+		mas->index = mas->last = limit;
+		mas_pause(mas);
+		return NULL;
+	}
 	last = mas->last;
 retry:
 	offset = mas->offset;
@@ -4848,6 +4853,11 @@ static inline void *mas_prev_entry(struct ma_state *mas, unsigned long min)
 {
 	void *entry;
 
+	if (mas->index < min) {
+		mas->index = mas->last = min;
+		mas_pause(mas);
+		return NULL;
+	}
 retry:
 	while (likely(!mas_is_none(mas))) {
 		entry = mas_prev_nentry(mas, min, mas->index);
-- 
2.35.1

* [PATCH v2 04/44] test_maple_tree: Test modifications while iterating
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (3 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 05/44] mm: Expand vma iterator interface Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 06/44] mm/mmap: convert brk to use vma iterator Liam Howlett
                   ` (39 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Add a testcase to ensure the iterator detects bad states on modifications
and does what the user expects.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 lib/test_maple_tree.c | 72 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/lib/test_maple_tree.c b/lib/test_maple_tree.c
index 497fc93ccf9e..e9fe4f3486e8 100644
--- a/lib/test_maple_tree.c
+++ b/lib/test_maple_tree.c
@@ -1709,6 +1709,74 @@ static noinline void check_forking(struct maple_tree *mt)
 	mtree_destroy(&newmt);
 }
 
+static noinline void check_iteration(struct maple_tree *mt)
+{
+	int i, nr_entries = 125;
+	void *val;
+	MA_STATE(mas, mt, 0, 0);
+
+	for (i = 0; i <= nr_entries; i++)
+		mtree_store_range(mt, i * 10, i * 10 + 9,
+				  xa_mk_value(i), GFP_KERNEL);
+
+	mt_set_non_kernel(99999);
+
+	i = 0;
+	mas_lock(&mas);
+	mas_for_each(&mas, val, 925) {
+		MT_BUG_ON(mt, mas.index != i * 10);
+		MT_BUG_ON(mt, mas.last != i * 10 + 9);
+		/* Overwrite end of entry 92 */
+		if (i == 92) {
+			mas.index = 925;
+			mas.last = 929;
+			mas_store(&mas, val);
+		}
+		i++;
+	}
+	/* Ensure mas_find() gets the next value */
+	val = mas_find(&mas, ULONG_MAX);
+	MT_BUG_ON(mt, val != xa_mk_value(i));
+
+	mas_set(&mas, 0);
+	i = 0;
+	mas_for_each(&mas, val, 785) {
+		MT_BUG_ON(mt, mas.index != i * 10);
+		MT_BUG_ON(mt, mas.last != i * 10 + 9);
+		/* Overwrite start of entry 78 */
+		if (i == 78) {
+			mas.index = 780;
+			mas.last = 785;
+			mas_store(&mas, val);
+		} else {
+			i++;
+		}
+	}
+	val = mas_find(&mas, ULONG_MAX);
+	MT_BUG_ON(mt, val != xa_mk_value(i));
+
+	mas_set(&mas, 0);
+	i = 0;
+	mas_for_each(&mas, val, 765) {
+		MT_BUG_ON(mt, mas.index != i * 10);
+		MT_BUG_ON(mt, mas.last != i * 10 + 9);
+		/* Overwrite end of entry 76 and advance to the end */
+		if (i == 76) {
+			mas.index = 760;
+			mas.last = 765;
+			mas_store(&mas, val);
+			mas_next(&mas, ULONG_MAX);
+		}
+		i++;
+	}
+	/* Make sure the next find returns the one after 765, 766-769 */
+	val = mas_find(&mas, ULONG_MAX);
+	MT_BUG_ON(mt, val != xa_mk_value(76));
+	mas_unlock(&mas);
+	mas_destroy(&mas);
+	mt_set_non_kernel(0);
+}
+
 static noinline void check_mas_store_gfp(struct maple_tree *mt)
 {
 
@@ -2574,6 +2642,10 @@ static int maple_tree_seed(void)
 	goto skip;
 #endif
 
+	mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
+	check_iteration(&tree);
+	mtree_destroy(&tree);
+
 	mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
 	check_forking(&tree);
 	mtree_destroy(&tree);
-- 
2.35.1

* [PATCH v2 06/44] mm/mmap: convert brk to use vma iterator
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (4 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 04/44] test_maple_tree: Test modifications while iterating Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-09 15:10   ` Vernon Yang
  2023-01-05 19:15 ` [PATCH v2 08/44] mmap: Convert vma_link() " Liam Howlett
                   ` (38 subsequent siblings)
  44 siblings, 1 reply; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Use the vma iterator API for the brk() system call.  This will provide
type safety at compile time.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c | 47 +++++++++++++++++++++++------------------------
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 9318f2ac8a6e..4a6f42ab3560 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -239,10 +239,10 @@ static int check_brk_limits(unsigned long addr, unsigned long len)
 
 	return mlock_future_check(current->mm, current->mm->def_flags, len);
 }
-static int do_brk_munmap(struct ma_state *mas, struct vm_area_struct *vma,
+static int do_brk_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 			 unsigned long newbrk, unsigned long oldbrk,
 			 struct list_head *uf);
-static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *brkvma,
+static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *brkvma,
 		unsigned long addr, unsigned long request, unsigned long flags);
 SYSCALL_DEFINE1(brk, unsigned long, brk)
 {
@@ -253,7 +253,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 	bool populate;
 	bool downgraded = false;
 	LIST_HEAD(uf);
-	MA_STATE(mas, &mm->mm_mt, 0, 0);
+	struct vma_iterator vmi;
 
 	if (mmap_write_lock_killable(mm))
 		return -EINTR;
@@ -301,8 +301,8 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 		int ret;
 
 		/* Search one past newbrk */
-		mas_set(&mas, newbrk);
-		brkvma = mas_find(&mas, oldbrk);
+		vma_iter_init(&vmi, mm, newbrk);
+		brkvma = vma_find(&vmi, oldbrk);
 		if (!brkvma || brkvma->vm_start >= oldbrk)
 			goto out; /* mapping intersects with an existing non-brk vma. */
 		/*
@@ -311,7 +311,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 		 * before calling do_brk_munmap().
 		 */
 		mm->brk = brk;
-		ret = do_brk_munmap(&mas, brkvma, newbrk, oldbrk, &uf);
+		ret = do_brk_munmap(&vmi, brkvma, newbrk, oldbrk, &uf);
 		if (ret == 1)  {
 			downgraded = true;
 			goto success;
@@ -329,14 +329,14 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 	 * Only check if the next VMA is within the stack_guard_gap of the
 	 * expansion area
 	 */
-	mas_set(&mas, oldbrk);
-	next = mas_find(&mas, newbrk - 1 + PAGE_SIZE + stack_guard_gap);
+	vma_iter_init(&vmi, mm, oldbrk);
+	next = vma_find(&vmi, newbrk + PAGE_SIZE + stack_guard_gap);
 	if (next && newbrk + PAGE_SIZE > vm_start_gap(next))
 		goto out;
 
-	brkvma = mas_prev(&mas, mm->start_brk);
+	brkvma = vma_prev_limit(&vmi, mm->start_brk);
 	/* Ok, looks good - let it rip. */
-	if (do_brk_flags(&mas, brkvma, oldbrk, newbrk - oldbrk, 0) < 0)
+	if (do_brk_flags(&vmi, brkvma, oldbrk, newbrk - oldbrk, 0) < 0)
 		goto out;
 
 	mm->brk = brk;
@@ -2963,7 +2963,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 
 /*
  * brk_munmap() - Unmap a partial vma.
- * @mas: The maple tree state.
+ * @vmi: The vma iterator
  * @vma: The vma to be modified
  * @newbrk: the start of the address to unmap
  * @oldbrk: The end of the address to unmap
@@ -2973,7 +2973,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
  * unmaps a partial VMA mapping.  Does not handle alignment, downgrades lock if
  * possible.
  */
-static int do_brk_munmap(struct ma_state *mas, struct vm_area_struct *vma,
+static int do_brk_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 			 unsigned long newbrk, unsigned long oldbrk,
 			 struct list_head *uf)
 {
@@ -2981,14 +2981,14 @@ static int do_brk_munmap(struct ma_state *mas, struct vm_area_struct *vma,
 	int ret;
 
 	arch_unmap(mm, newbrk, oldbrk);
-	ret = do_mas_align_munmap(mas, vma, mm, newbrk, oldbrk, uf, true);
+	ret = do_mas_align_munmap(&vmi->mas, vma, mm, newbrk, oldbrk, uf, true);
 	validate_mm_mt(mm);
 	return ret;
 }
 
 /*
  * do_brk_flags() - Increase the brk vma if the flags match.
- * @mas: The maple tree state.
+ * @vmi: The vma iterator
  * @addr: The start address
  * @len: The length of the increase
  * @vma: The vma,
@@ -2998,7 +2998,7 @@ static int do_brk_munmap(struct ma_state *mas, struct vm_area_struct *vma,
  * do not match then create a new anonymous VMA.  Eventually we may be able to
  * do some brk-specific accounting here.
  */
-static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
+static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		unsigned long addr, unsigned long len, unsigned long flags)
 {
 	struct mm_struct *mm = current->mm;
@@ -3025,8 +3025,7 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
 	if (vma && vma->vm_end == addr && !vma_policy(vma) &&
 	    can_vma_merge_after(vma, flags, NULL, NULL,
 				addr >> PAGE_SHIFT, NULL_VM_UFFD_CTX, NULL)) {
-		mas_set_range(mas, vma->vm_start, addr + len - 1);
-		if (mas_preallocate(mas, vma, GFP_KERNEL))
+		if (vma_iter_prealloc(vmi, vma))
 			goto unacct_fail;
 
 		vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
@@ -3036,7 +3035,7 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
 		}
 		vma->vm_end = addr + len;
 		vma->vm_flags |= VM_SOFTDIRTY;
-		mas_store_prealloc(mas, vma);
+		vma_iter_store(vmi, vma);
 
 		if (vma->anon_vma) {
 			anon_vma_interval_tree_post_update_vma(vma);
@@ -3057,8 +3056,8 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
 	vma->vm_pgoff = addr >> PAGE_SHIFT;
 	vma->vm_flags = flags;
 	vma->vm_page_prot = vm_get_page_prot(flags);
-	mas_set_range(mas, vma->vm_start, addr + len - 1);
-	if (mas_store_gfp(mas, vma, GFP_KERNEL))
+	mas_set_range(&vmi->mas, vma->vm_start, vma->vm_end - 1);
+	if (vma_iter_store_gfp(vmi, vma, GFP_KERNEL))
 		goto mas_store_fail;
 
 	mm->map_count++;
@@ -3087,7 +3086,7 @@ int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags)
 	int ret;
 	bool populate;
 	LIST_HEAD(uf);
-	MA_STATE(mas, &mm->mm_mt, addr, addr);
+	VMA_ITERATOR(vmi, mm, addr);
 
 	len = PAGE_ALIGN(request);
 	if (len < request)
@@ -3106,12 +3105,12 @@ int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags)
 	if (ret)
 		goto limits_failed;
 
-	ret = do_mas_munmap(&mas, mm, addr, len, &uf, 0);
+	ret = do_mas_munmap(&vmi.mas, mm, addr, len, &uf, 0);
 	if (ret)
 		goto munmap_failed;
 
-	vma = mas_prev(&mas, 0);
-	ret = do_brk_flags(&mas, vma, addr, len, flags);
+	vma = vma_prev(&vmi);
+	ret = do_brk_flags(&vmi, vma, addr, len, flags);
 	populate = ((mm->def_flags & VM_LOCKED) != 0);
 	mmap_write_unlock(mm);
 	userfaultfd_unmap_complete(mm, &uf);
-- 
2.35.1

* [PATCH v2 08/44] mmap: Convert vma_link() vma iterator
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (5 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 06/44] mm/mmap: convert brk to use vma iterator Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 07/44] kernel/fork: Convert forking to using the vmi iterator Liam Howlett
                   ` (37 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Avoid using the maple tree interface directly.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 4a6f42ab3560..00b839cc499e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -546,10 +546,10 @@ static inline void vma_mas_szero(struct ma_state *mas, unsigned long start,
 
 static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
 {
-	MA_STATE(mas, &mm->mm_mt, 0, 0);
+	VMA_ITERATOR(vmi, mm, 0);
 	struct address_space *mapping = NULL;
 
-	if (mas_preallocate(&mas, vma, GFP_KERNEL))
+	if (vma_iter_prealloc(&vmi, vma))
 		return -ENOMEM;
 
 	if (vma->vm_file) {
@@ -557,7 +557,7 @@ static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
 		i_mmap_lock_write(mapping);
 	}
 
-	vma_mas_store(vma, &mas);
+	vma_iter_store(&vmi, vma);
 
 	if (mapping) {
 		__vma_link_file(vma, mapping);
-- 
2.35.1

* [PATCH v2 07/44] kernel/fork: Convert forking to using the vmi iterator
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (6 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 08/44] mmap: Convert vma_link() " Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 11/44] mmap: Convert vma_expand() to use vma iterator Liam Howlett
                   ` (36 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Avoid using the maple tree interface directly.  This gains type safety.
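
In outline, the bulk pattern used below (setup and error paths elided):

	VMA_ITERATOR(old_vmi, oldmm, 0);
	VMA_ITERATOR(vmi, mm, 0);

	if (vma_iter_bulk_alloc(&vmi, oldmm->map_count))
		goto out;		/* preallocate all needed nodes */

	for_each_vma(old_vmi, mpnt) {
		/* ... duplicate mpnt into tmp ... */
		if (vma_iter_bulk_store(&vmi, tmp))
			goto fail_nomem_vmi_store;
	}
	vma_iter_free(&vmi);		/* free unused preallocations */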

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 kernel/fork.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 9f7fe3541897..441dcec60aae 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -585,8 +585,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	int retval;
 	unsigned long charge = 0;
 	LIST_HEAD(uf);
-	MA_STATE(old_mas, &oldmm->mm_mt, 0, 0);
-	MA_STATE(mas, &mm->mm_mt, 0, 0);
+	VMA_ITERATOR(old_vmi, oldmm, 0);
+	VMA_ITERATOR(vmi, mm, 0);
 
 	uprobe_start_dup_mmap();
 	if (mmap_write_lock_killable(oldmm)) {
@@ -613,11 +613,11 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		goto out;
 	khugepaged_fork(mm, oldmm);
 
-	retval = mas_expected_entries(&mas, oldmm->map_count);
+	retval = vma_iter_bulk_alloc(&vmi, oldmm->map_count);
 	if (retval)
 		goto out;
 
-	mas_for_each(&old_mas, mpnt, ULONG_MAX) {
+	for_each_vma(old_vmi, mpnt) {
 		struct file *file;
 
 		if (mpnt->vm_flags & VM_DONTCOPY) {
@@ -683,11 +683,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 			hugetlb_dup_vma_private(tmp);
 
 		/* Link the vma into the MT */
-		mas.index = tmp->vm_start;
-		mas.last = tmp->vm_end - 1;
-		mas_store(&mas, tmp);
-		if (mas_is_err(&mas))
-			goto fail_nomem_mas_store;
+		if (vma_iter_bulk_store(&vmi, tmp))
+			goto fail_nomem_vmi_store;
 
 		mm->map_count++;
 		if (!(tmp->vm_flags & VM_WIPEONFORK))
@@ -702,7 +699,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	/* a new mm has just been created */
 	retval = arch_dup_mmap(oldmm, mm);
 loop_out:
-	mas_destroy(&mas);
+	vma_iter_free(&vmi);
 out:
 	mmap_write_unlock(mm);
 	flush_tlb_mm(oldmm);
@@ -712,7 +709,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	uprobe_end_dup_mmap();
 	return retval;
 
-fail_nomem_mas_store:
+fail_nomem_vmi_store:
 	unlink_anon_vmas(tmp);
 fail_nomem_anon_vma_fork:
 	mpol_put(vma_policy(tmp));
-- 
2.35.1

* [PATCH v2 09/44] mm/mmap: Remove preallocation from do_mas_align_munmap()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (9 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 10/44] mmap: Change do_mas_munmap and do_mas_aligned_munmap() " Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 15/44] mm: Change mprotect_fixup to vma iterator Liam Howlett
                   ` (33 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

In preparation for passing the vma state through split, the
pre-allocation that occurs before the split has to be moved to after the
split.  Since the preallocation would then live right next to the store,
just call store instead of preallocating.  This effectively restores the
potential error path of splitting and not munmap'ing, which pre-dates the
maple tree.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 00b839cc499e..238b10ca9f9d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2384,9 +2384,6 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
 	mt_init_flags(&mt_detach, MT_FLAGS_LOCK_EXTERN);
 	mt_set_external_lock(&mt_detach, &mm->mmap_lock);
 
-	if (mas_preallocate(mas, vma, GFP_KERNEL))
-		return -ENOMEM;
-
 	mas->last = end - 1;
 	/*
 	 * If we need to split any vma, do it now to save pain later.
@@ -2477,8 +2474,6 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
 			goto userfaultfd_error;
 	}
 
-	/* Point of no return */
-	mas_set_range(mas, start, end - 1);
 #if defined(CONFIG_DEBUG_VM_MAPLE_TREE)
 	/* Make sure no VMAs are about to be lost. */
 	{
@@ -2486,6 +2481,7 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
 		struct vm_area_struct *vma_mas, *vma_test;
 		int test_count = 0;
 
+		mas_set_range(mas, start, end - 1);
 		rcu_read_lock();
 		vma_test = mas_find(&test, end - 1);
 		mas_for_each(mas, vma_mas, end - 1) {
@@ -2495,10 +2491,13 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
 		}
 		rcu_read_unlock();
 		BUG_ON(count != test_count);
-		mas_set_range(mas, start, end - 1);
 	}
 #endif
-	mas_store_prealloc(mas, NULL);
+	/* Point of no return */
+	mas_set_range(mas, start, end - 1);
+	if (mas_store_gfp(mas, NULL, GFP_KERNEL))
+		return -ENOMEM;
+
 	mm->map_count -= count;
 	/*
 	 * Do not downgrade mmap_lock if we are next to VM_GROWSDOWN or
@@ -2530,7 +2529,6 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
 	__mt_destroy(&mt_detach);
 start_split_failed:
 map_count_exceeded:
-	mas_destroy(mas);
 	return error;
 }
 
-- 
2.35.1

* [PATCH v2 11/44] mmap: Convert vma_expand() to use vma iterator
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (7 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 07/44] kernel/fork: Convert forking to using the vmi iterator Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 10/44] mmap: Change do_mas_munmap and do_mas_aligned_munmap() " Liam Howlett
                   ` (35 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Use the vma iterator instead of the maple state for type safety and for
consistency through the mm code.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 41767c585120..8fd48686f708 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -586,7 +586,7 @@ static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
  *
  * Returns: 0 on success
  */
-inline int vma_expand(struct ma_state *mas, struct vm_area_struct *vma,
+inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		      unsigned long start, unsigned long end, pgoff_t pgoff,
 		      struct vm_area_struct *next)
 {
@@ -615,7 +615,7 @@ inline int vma_expand(struct ma_state *mas, struct vm_area_struct *vma,
 	/* Only handles expanding */
 	VM_BUG_ON(vma->vm_start < start || vma->vm_end > end);
 
-	if (mas_preallocate(mas, vma, GFP_KERNEL))
+	if (vma_iter_prealloc(vmi, vma))
 		goto nomem;
 
 	vma_adjust_trans_huge(vma, start, end, 0);
@@ -640,8 +640,7 @@ inline int vma_expand(struct ma_state *mas, struct vm_area_struct *vma,
 	vma->vm_start = start;
 	vma->vm_end = end;
 	vma->vm_pgoff = pgoff;
-	/* Note: mas must be pointing to the expanding VMA */
-	vma_mas_store(vma, mas);
+	vma_iter_store(vmi, vma);
 
 	if (file) {
 		vma_interval_tree_insert(vma, root);
@@ -2655,7 +2654,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 
 	/* Actually expand, if possible */
 	if (vma &&
-	    !vma_expand(&vmi.mas, vma, merge_start, merge_end, vm_pgoff, next)) {
+	    !vma_expand(&vmi, vma, merge_start, merge_end, vm_pgoff, next)) {
 		khugepaged_enter_vma(vma, vm_flags);
 		goto expanded;
 	}
-- 
2.35.1

* [PATCH v2 10/44] mmap: Change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (8 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 11/44] mmap: Convert vma_expand() to use vma iterator Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-10 14:53   ` Sven Schnelle
  2023-01-05 19:15 ` [PATCH v2 09/44] mm/mmap: Remove preallocation from do_mas_align_munmap() Liam Howlett
                   ` (34 subsequent siblings)
  44 siblings, 1 reply; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Start passing the vma iterator through the mm code.  This will allow for
reuse of the state and cleaner invalidation if necessary.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 include/linux/mm.h |  2 +-
 mm/mmap.c          | 77 +++++++++++++++++++++-------------------------
 mm/mremap.c        |  6 ++--
 3 files changed, 39 insertions(+), 46 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f4b964f96db1..126f94b6f434 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2896,7 +2896,7 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr,
 extern unsigned long do_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
 	unsigned long pgoff, unsigned long *populate, struct list_head *uf);
-extern int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
+extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
 			 unsigned long start, size_t len, struct list_head *uf,
 			 bool downgrade);
 extern int do_munmap(struct mm_struct *, unsigned long, size_t,
diff --git a/mm/mmap.c b/mm/mmap.c
index 238b10ca9f9d..41767c585120 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2360,8 +2360,8 @@ static inline int munmap_sidetree(struct vm_area_struct *vma,
 }
 
 /*
- * do_mas_align_munmap() - munmap the aligned region from @start to @end.
- * @mas: The maple_state, ideally set up to alter the correct tree location.
+ * do_vmi_align_munmap() - munmap the aligned region from @start to @end.
+ * @vmi: The vma iterator
  * @vma: The starting vm_area_struct
  * @mm: The mm_struct
  * @start: The aligned start address to munmap.
@@ -2372,7 +2372,7 @@ static inline int munmap_sidetree(struct vm_area_struct *vma,
  * If @downgrade is true, check return code for potential release of the lock.
  */
 static int
-do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
+do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		    struct mm_struct *mm, unsigned long start,
 		    unsigned long end, struct list_head *uf, bool downgrade)
 {
@@ -2384,7 +2384,6 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
 	mt_init_flags(&mt_detach, MT_FLAGS_LOCK_EXTERN);
 	mt_set_external_lock(&mt_detach, &mm->mmap_lock);
 
-	mas->last = end - 1;
 	/*
 	 * If we need to split any vma, do it now to save pain later.
 	 *
@@ -2404,27 +2403,23 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
 		if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
 			goto map_count_exceeded;
 
-		/*
-		 * mas_pause() is not needed since mas->index needs to be set
-		 * differently than vma->vm_end anyways.
-		 */
 		error = __split_vma(mm, vma, start, 0);
 		if (error)
 			goto start_split_failed;
 
-		mas_set(mas, start);
-		vma = mas_walk(mas);
+		vma_iter_set(vmi, start);
+		vma = vma_find(vmi, end);
 	}
 
-	prev = mas_prev(mas, 0);
+	prev = vma_prev(vmi);
 	if (unlikely((!prev)))
-		mas_set(mas, start);
+		vma_iter_set(vmi, start);
 
 	/*
 	 * Detach a range of VMAs from the mm. Using next as a temp variable as
 	 * it is always overwritten.
 	 */
-	mas_for_each(mas, next, end - 1) {
+	for_each_vma_range(*vmi, next, end) {
 		/* Does it split the end? */
 		if (next->vm_end > end) {
 			struct vm_area_struct *split;
@@ -2433,8 +2428,8 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
 			if (error)
 				goto end_split_failed;
 
-			mas_set(mas, end);
-			split = mas_prev(mas, 0);
+			vma_iter_set(vmi, end);
+			split = vma_prev(vmi);
 			error = munmap_sidetree(split, &mas_detach);
 			if (error)
 				goto munmap_sidetree_failed;
@@ -2456,7 +2451,7 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
 	}
 
 	if (!next)
-		next = mas_next(mas, ULONG_MAX);
+		next = vma_next(vmi);
 
 	if (unlikely(uf)) {
 		/*
@@ -2481,10 +2476,10 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
 		struct vm_area_struct *vma_mas, *vma_test;
 		int test_count = 0;
 
-		mas_set_range(mas, start, end - 1);
+		vma_iter_set(vmi, start);
 		rcu_read_lock();
 		vma_test = mas_find(&test, end - 1);
-		mas_for_each(mas, vma_mas, end - 1) {
+		for_each_vma_range(*vmi, vma_mas, end) {
 			BUG_ON(vma_mas != vma_test);
 			test_count++;
 			vma_test = mas_next(&test, end - 1);
@@ -2494,8 +2489,8 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
 	}
 #endif
 	/* Point of no return */
-	mas_set_range(mas, start, end - 1);
-	if (mas_store_gfp(mas, NULL, GFP_KERNEL))
+	vma_iter_set(vmi, start);
+	if (vma_iter_clear_gfp(vmi, start, end, GFP_KERNEL))
 		return -ENOMEM;
 
 	mm->map_count -= count;
@@ -2533,8 +2528,8 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
 }
 
 /*
- * do_mas_munmap() - munmap a given range.
- * @mas: The maple state
+ * do_vmi_munmap() - munmap a given range.
+ * @vmi: The vma iterator
  * @mm: The mm_struct
  * @start: The start address to munmap
  * @len: The length of the range to munmap
@@ -2548,7 +2543,7 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
  *
  * Returns: -EINVAL on failure, 1 on success and unlock, 0 otherwise.
  */
-int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
+int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
 		  unsigned long start, size_t len, struct list_head *uf,
 		  bool downgrade)
 {
@@ -2566,11 +2561,11 @@ int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
 	arch_unmap(mm, start, end);
 
 	/* Find the first overlapping VMA */
-	vma = mas_find(mas, end - 1);
+	vma = vma_find(vmi, end);
 	if (!vma)
 		return 0;
 
-	return do_mas_align_munmap(mas, vma, mm, start, end, uf, downgrade);
+	return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, downgrade);
 }
 
 /* do_munmap() - Wrapper function for non-maple tree aware do_munmap() calls.
@@ -2582,9 +2577,9 @@ int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
 int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
 	      struct list_head *uf)
 {
-	MA_STATE(mas, &mm->mm_mt, start, start);
+	VMA_ITERATOR(vmi, mm, start);
 
-	return do_mas_munmap(&mas, mm, start, len, uf, false);
+	return do_vmi_munmap(&vmi, mm, start, len, uf, false);
 }
 
 unsigned long mmap_region(struct file *file, unsigned long addr,
@@ -2600,7 +2595,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	unsigned long merge_start = addr, merge_end = end;
 	pgoff_t vm_pgoff;
 	int error;
-	MA_STATE(mas, &mm->mm_mt, addr, end - 1);
+	VMA_ITERATOR(vmi, mm, addr);
 
 	/* Check against address space limit. */
 	if (!may_expand_vm(mm, vm_flags, len >> PAGE_SHIFT)) {
@@ -2618,7 +2613,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	}
 
 	/* Unmap any existing mapping in the area */
-	if (do_mas_munmap(&mas, mm, addr, len, uf, false))
+	if (do_vmi_munmap(&vmi, mm, addr, len, uf, false))
 		return -ENOMEM;
 
 	/*
@@ -2631,8 +2626,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		vm_flags |= VM_ACCOUNT;
 	}
 
-	next = mas_next(&mas, ULONG_MAX);
-	prev = mas_prev(&mas, 0);
+	next = vma_next(&vmi);
+	prev = vma_prev(&vmi);
 	if (vm_flags & VM_SPECIAL)
 		goto cannot_expand;
 
@@ -2660,13 +2655,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 
 	/* Actually expand, if possible */
 	if (vma &&
-	    !vma_expand(&mas, vma, merge_start, merge_end, vm_pgoff, next)) {
+	    !vma_expand(&vmi.mas, vma, merge_start, merge_end, vm_pgoff, next)) {
 		khugepaged_enter_vma(vma, vm_flags);
 		goto expanded;
 	}
 
-	mas.index = addr;
-	mas.last = end - 1;
 cannot_expand:
 	/*
 	 * Determine the object being mapped and call the appropriate
@@ -2705,7 +2698,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 			error = -EINVAL;
 			goto close_and_free_vma;
 		}
-		mas_reset(&mas);
+		vma_iter_set(&vmi, addr);
 
 		/*
 		 * If vm_flags changed after call_mmap(), we should try merge
@@ -2751,7 +2744,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 			goto free_vma;
 	}
 
-	if (mas_preallocate(&mas, vma, GFP_KERNEL)) {
+	if (vma_iter_prealloc(&vmi, vma)) {
 		error = -ENOMEM;
 		if (file)
 			goto close_and_free_vma;
@@ -2764,7 +2757,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	if (vma->vm_file)
 		i_mmap_lock_write(vma->vm_file->f_mapping);
 
-	vma_mas_store(vma, &mas);
+	vma_iter_store(&vmi, vma);
 	mm->map_count++;
 	if (vma->vm_file) {
 		if (vma->vm_flags & VM_SHARED)
@@ -2825,7 +2818,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	vma->vm_file = NULL;
 
 	/* Undo any partial mapping done by a device driver. */
-	unmap_region(mm, mas.tree, vma, prev, next, vma->vm_start, vma->vm_end);
+	unmap_region(mm, &mm->mm_mt, vma, prev, next, vma->vm_start, vma->vm_end);
 	if (file && (vm_flags & VM_SHARED))
 		mapping_unmap_writable(file->f_mapping);
 free_vma:
@@ -2842,12 +2835,12 @@ static int __vm_munmap(unsigned long start, size_t len, bool downgrade)
 	int ret;
 	struct mm_struct *mm = current->mm;
 	LIST_HEAD(uf);
-	MA_STATE(mas, &mm->mm_mt, start, start);
+	VMA_ITERATOR(vmi, mm, start);
 
 	if (mmap_write_lock_killable(mm))
 		return -EINTR;
 
-	ret = do_mas_munmap(&mas, mm, start, len, &uf, downgrade);
+	ret = do_vmi_munmap(&vmi, mm, start, len, &uf, downgrade);
 	/*
 	 * Returning 1 indicates mmap_lock is downgraded.
 	 * But 1 is not legal return value of vm_munmap() and munmap(), reset
@@ -2979,7 +2972,7 @@ static int do_brk_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	int ret;
 
 	arch_unmap(mm, newbrk, oldbrk);
-	ret = do_mas_align_munmap(&vmi->mas, vma, mm, newbrk, oldbrk, uf, true);
+	ret = do_vmi_align_munmap(vmi, vma, mm, newbrk, oldbrk, uf, true);
 	validate_mm_mt(mm);
 	return ret;
 }
@@ -3103,7 +3096,7 @@ int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags)
 	if (ret)
 		goto limits_failed;
 
-	ret = do_mas_munmap(&vmi.mas, mm, addr, len, &uf, 0);
+	ret = do_vmi_munmap(&vmi, mm, addr, len, &uf, 0);
 	if (ret)
 		goto munmap_failed;
 
diff --git a/mm/mremap.c b/mm/mremap.c
index fe587c5d6591..94d2590f0871 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -978,14 +978,14 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	/*
 	 * Always allow a shrinking remap: that just unmaps
 	 * the unnecessary pages..
-	 * do_mas_munmap does all the needed commit accounting, and
+	 * do_vmi_munmap does all the needed commit accounting, and
 	 * downgrades mmap_lock to read if so directed.
 	 */
 	if (old_len >= new_len) {
 		int retval;
-		MA_STATE(mas, &mm->mm_mt, addr + new_len, addr + new_len);
+		VMA_ITERATOR(vmi, mm, addr + new_len);
 
-		retval = do_mas_munmap(&mas, mm, addr + new_len,
+		retval = do_vmi_munmap(&vmi, mm, addr + new_len,
 				       old_len - new_len, &uf_unmap, true);
 		/* Returning 1 indicates mmap_lock is downgraded to read. */
 		if (retval == 1) {
-- 
2.35.1

* [PATCH v2 12/44] mm: Add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (11 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 15/44] mm: Change mprotect_fixup to vma iterator Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 14/44] userfaultfd: Use vma iterator Liam Howlett
                   ` (31 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

These wrappers are short-lived in this patch set so that each user can
be converted on its own.  In the end, these functions are renamed in one
commit.
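
For example, a converted caller no longer needs a manual mas_pause() after
a successful merge, since the wrapper repositions the iterator; roughly
(mirroring the userfaultfd conversion later in this series):

	prev = vmi_vma_merge(&vmi, mm, prev, vma->vm_start, vma->vm_end,
			     new_flags, vma->anon_vma, vma->vm_file,
			     vma->vm_pgoff, vma_policy(vma),
			     NULL_VM_UFFD_CTX, anon_vma_name(vma));
	if (prev)
		vma = prev;	/* vmi now points at 'end' */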

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 include/linux/mm.h | 11 +++++++++--
 mm/mmap.c          | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 126f94b6f434..9c790c88f691 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2834,11 +2834,18 @@ extern struct vm_area_struct *vma_merge(struct mm_struct *,
 	struct vm_area_struct *prev, unsigned long addr, unsigned long end,
 	unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
 	struct mempolicy *, struct vm_userfaultfd_ctx, struct anon_vma_name *);
+extern struct vm_area_struct *vmi_vma_merge(struct vma_iterator *vmi,
+	struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
+	unsigned long end, unsigned long vm_flags, struct anon_vma *,
+	struct file *, pgoff_t, struct mempolicy *, struct vm_userfaultfd_ctx,
+	struct anon_vma_name *);
 extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
-extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
-	unsigned long addr, int new_below);
+extern int vmi__split_vma(struct vma_iterator *vmi, struct mm_struct *,
+	struct vm_area_struct *, unsigned long addr, int new_below);
 extern int split_vma(struct mm_struct *, struct vm_area_struct *,
 	unsigned long addr, int new_below);
+extern int vmi_split_vma(struct vma_iterator *vmi, struct mm_struct *,
+		struct vm_area_struct *, unsigned long addr, int new_below);
 extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
 extern void unlink_file_vma(struct vm_area_struct *);
 extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
diff --git a/mm/mmap.c b/mm/mmap.c
index 8fd48686f708..4dd7e48a312f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1150,6 +1150,25 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 	return res;
 }
 
+struct vm_area_struct *vmi_vma_merge(struct vma_iterator *vmi,
+			struct mm_struct *mm,
+			struct vm_area_struct *prev, unsigned long addr,
+			unsigned long end, unsigned long vm_flags,
+			struct anon_vma *anon_vma, struct file *file,
+			pgoff_t pgoff, struct mempolicy *policy,
+			struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+			struct anon_vma_name *anon_name)
+{
+	struct vm_area_struct *tmp;
+
+	tmp = vma_merge(mm, prev, addr, end, vm_flags, anon_vma, file, pgoff,
+			policy, vm_userfaultfd_ctx, anon_name);
+	if (tmp)
+		vma_iter_set(vmi, end);
+
+	return tmp;
+}
+
 /*
  * Rough compatibility check to quickly see if it's even worth looking
  * at sharing an anon_vma.
@@ -2331,6 +2350,18 @@ int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
 	validate_mm_mt(mm);
 	return err;
 }
+int vmi__split_vma(struct vma_iterator *vmi, struct mm_struct *mm,
+		   struct vm_area_struct *vma, unsigned long addr, int new_below)
+{
+	int ret;
+	unsigned long end = vma->vm_end;
+
+	ret = __split_vma(mm, vma, addr, new_below);
+	if (!ret)
+		vma_iter_set(vmi, end);
+
+	return ret;
+}
 
 /*
  * Split a vma into two pieces at address 'addr', a new vma is allocated
@@ -2345,6 +2376,19 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
 	return __split_vma(mm, vma, addr, new_below);
 }
 
+int vmi_split_vma(struct vma_iterator *vmi, struct mm_struct *mm,
+		  struct vm_area_struct *vma, unsigned long addr, int new_below)
+{
+	int ret;
+	unsigned long end = vma->vm_end;
+
+	ret = split_vma(mm, vma, addr, new_below);
+	if (!ret)
+		vma_iter_set(vmi, end);
+
+	return ret;
+}
+
 static inline int munmap_sidetree(struct vm_area_struct *vma,
 				   struct ma_state *mas_detach)
 {
-- 
2.35.1

* [PATCH v2 13/44] ipc/shm: Use the vma iterator for munmap calls
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (13 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 14/44] userfaultfd: Use vma iterator Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 16/44] mlock: Convert mlock to vma iterator Liam Howlett
                   ` (29 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Pass through the vma iterator to do_vmi_munmap() to handle the iterator
state internally.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 ipc/shm.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/ipc/shm.c b/ipc/shm.c
index bd2fcc4d454e..1c6a6b319a49 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1786,8 +1786,8 @@ long ksys_shmdt(char __user *shmaddr)
 			 */
 			file = vma->vm_file;
 			size = i_size_read(file_inode(vma->vm_file));
-			do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start, NULL);
-			mas_pause(&vmi.mas);
+			do_vmi_munmap(&vmi, mm, vma->vm_start,
+			      vma->vm_end - vma->vm_start, NULL, false);
 			/*
 			 * We discovered the size of the shm segment, so
 			 * break out of here and fall through to the next
@@ -1810,10 +1810,9 @@ long ksys_shmdt(char __user *shmaddr)
 		/* finding a matching vma now does not alter retval */
 		if ((vma->vm_ops == &shm_vm_ops) &&
 		    ((vma->vm_start - addr)/PAGE_SIZE == vma->vm_pgoff) &&
-		    (vma->vm_file == file)) {
-			do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start, NULL);
-			mas_pause(&vmi.mas);
-		}
+		    (vma->vm_file == file))
+			do_vmi_munmap(&vmi, mm, vma->vm_start,
+			      vma->vm_end - vma->vm_start, NULL, false);
 
 		vma = vma_next(&vmi);
 	}
-- 
2.35.1

* [PATCH v2 14/44] userfaultfd: Use vma iterator
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (12 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 12/44] mm: Add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma() Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 13/44] ipc/shm: Use the vma iterator for munmap calls Liam Howlett
                   ` (30 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Use the vma iterator so that the iterator can be invalidated or updated
to avoid each caller doing so.
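
The converted loops lean on two idioms from this series: vma_prev()
positions the iterator just before 'start' so the merge code sees the
correct predecessor, and the do-block form relies on for_each_vma_range()
expanding to a while loop, so the VMA already located by vma_find() is
visited before the iterator advances.  A condensed sketch, assuming those
macro semantics (validation body elided):

	vma_iter_init(&vmi, mm, start);
	vma = vma_find(&vmi, end);		/* first vma in the range */
	if (!vma)
		goto out_unlock;

	cur = vma;
	do {
		/* validate cur */
	} for_each_vma_range(vmi, cur, end);	/* fetch the following vmas */

	vma_iter_set(&vmi, start);
	prev = vma_prev(&vmi);			/* vma before start, if any */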

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 fs/userfaultfd.c | 88 +++++++++++++++++++-----------------------------
 1 file changed, 34 insertions(+), 54 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 98ac37e34e3d..b3249388696a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -857,7 +857,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 	/* len == 0 means wake all */
 	struct userfaultfd_wake_range range = { .len = 0, };
 	unsigned long new_flags;
-	MA_STATE(mas, &mm->mm_mt, 0, 0);
+	VMA_ITERATOR(vmi, mm, 0);
 
 	WRITE_ONCE(ctx->released, true);
 
@@ -874,7 +874,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 	 */
 	mmap_write_lock(mm);
 	prev = NULL;
-	mas_for_each(&mas, vma, ULONG_MAX) {
+	for_each_vma(vmi, vma) {
 		cond_resched();
 		BUG_ON(!!vma->vm_userfaultfd_ctx.ctx ^
 		       !!(vma->vm_flags & __VM_UFFD_FLAGS));
@@ -883,13 +883,12 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 			continue;
 		}
 		new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
-		prev = vma_merge(mm, prev, vma->vm_start, vma->vm_end,
+		prev = vmi_vma_merge(&vmi, mm, prev, vma->vm_start, vma->vm_end,
 				 new_flags, vma->anon_vma,
 				 vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
 				 NULL_VM_UFFD_CTX, anon_vma_name(vma));
 		if (prev) {
-			mas_pause(&mas);
 			vma = prev;
 		} else {
 			prev = vma;
@@ -1276,7 +1275,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	bool found;
 	bool basic_ioctls;
 	unsigned long start, end, vma_end;
-	MA_STATE(mas, &mm->mm_mt, 0, 0);
+	struct vma_iterator vmi;
 
 	user_uffdio_register = (struct uffdio_register __user *) arg;
 
@@ -1318,17 +1317,13 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	if (!mmget_not_zero(mm))
 		goto out;
 
+	ret = -EINVAL;
 	mmap_write_lock(mm);
-	mas_set(&mas, start);
-	vma = mas_find(&mas, ULONG_MAX);
+	vma_iter_init(&vmi, mm, start);
+	vma = vma_find(&vmi, end);
 	if (!vma)
 		goto out_unlock;
 
-	/* check that there's at least one vma in the range */
-	ret = -EINVAL;
-	if (vma->vm_start >= end)
-		goto out_unlock;
-
 	/*
 	 * If the first vma contains huge pages, make sure start address
 	 * is aligned to huge page size.
@@ -1345,7 +1340,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	 */
 	found = false;
 	basic_ioctls = false;
-	for (cur = vma; cur; cur = mas_next(&mas, end - 1)) {
+	cur = vma;
+	do {
 		cond_resched();
 
 		BUG_ON(!!cur->vm_userfaultfd_ctx.ctx ^
@@ -1402,16 +1398,14 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 			basic_ioctls = true;
 
 		found = true;
-	}
+	} for_each_vma_range(vmi, cur, end);
 	BUG_ON(!found);
 
-	mas_set(&mas, start);
-	prev = mas_prev(&mas, 0);
-	if (prev != vma)
-		mas_next(&mas, ULONG_MAX);
+	vma_iter_set(&vmi, start);
+	prev = vma_prev(&vmi);
 
 	ret = 0;
-	do {
+	for_each_vma_range(vmi, vma, end) {
 		cond_resched();
 
 		BUG_ON(!vma_can_userfault(vma, vm_flags));
@@ -1432,30 +1426,25 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		vma_end = min(end, vma->vm_end);
 
 		new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
-		prev = vma_merge(mm, prev, start, vma_end, new_flags,
+		prev = vmi_vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
 				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
 				 ((struct vm_userfaultfd_ctx){ ctx }),
 				 anon_vma_name(vma));
 		if (prev) {
 			/* vma_merge() invalidated the mas */
-			mas_pause(&mas);
 			vma = prev;
 			goto next;
 		}
 		if (vma->vm_start < start) {
-			ret = split_vma(mm, vma, start, 1);
+			ret = vmi_split_vma(&vmi, mm, vma, start, 1);
 			if (ret)
 				break;
-			/* split_vma() invalidated the mas */
-			mas_pause(&mas);
 		}
 		if (vma->vm_end > end) {
-			ret = split_vma(mm, vma, end, 0);
+			ret = vmi_split_vma(&vmi, mm, vma, end, 0);
 			if (ret)
 				break;
-			/* split_vma() invalidated the mas */
-			mas_pause(&mas);
 		}
 	next:
 		/*
@@ -1472,8 +1461,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	skip:
 		prev = vma;
 		start = vma->vm_end;
-		vma = mas_next(&mas, end - 1);
-	} while (vma);
+	}
+
 out_unlock:
 	mmap_write_unlock(mm);
 	mmput(mm);
@@ -1517,7 +1506,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 	bool found;
 	unsigned long start, end, vma_end;
 	const void __user *buf = (void __user *)arg;
-	MA_STATE(mas, &mm->mm_mt, 0, 0);
+	struct vma_iterator vmi;
 
 	ret = -EFAULT;
 	if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
@@ -1536,14 +1525,10 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 		goto out;
 
 	mmap_write_lock(mm);
-	mas_set(&mas, start);
-	vma = mas_find(&mas, ULONG_MAX);
-	if (!vma)
-		goto out_unlock;
-
-	/* check that there's at least one vma in the range */
 	ret = -EINVAL;
-	if (vma->vm_start >= end)
+	vma_iter_init(&vmi, mm, start);
+	vma = vma_find(&vmi, end);
+	if (!vma)
 		goto out_unlock;
 
 	/*
@@ -1561,8 +1546,8 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 	 * Search for not compatible vmas.
 	 */
 	found = false;
-	ret = -EINVAL;
-	for (cur = vma; cur; cur = mas_next(&mas, end - 1)) {
+	cur = vma;
+	do {
 		cond_resched();
 
 		BUG_ON(!!cur->vm_userfaultfd_ctx.ctx ^
@@ -1579,16 +1564,13 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 			goto out_unlock;
 
 		found = true;
-	}
+	} for_each_vma_range(vmi, cur, end);
 	BUG_ON(!found);
 
-	mas_set(&mas, start);
-	prev = mas_prev(&mas, 0);
-	if (prev != vma)
-		mas_next(&mas, ULONG_MAX);
-
+	vma_iter_set(&vmi, start);
+	prev = vma_prev(&vmi);
 	ret = 0;
-	do {
+	for_each_vma_range(vmi, vma, end) {
 		cond_resched();
 
 		BUG_ON(!vma_can_userfault(vma, vma->vm_flags));
@@ -1624,26 +1606,24 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 			uffd_wp_range(mm, vma, start, vma_end - start, false);
 
 		new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
-		prev = vma_merge(mm, prev, start, vma_end, new_flags,
+		prev = vmi_vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
 				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
 				 NULL_VM_UFFD_CTX, anon_vma_name(vma));
 		if (prev) {
 			vma = prev;
-			mas_pause(&mas);
 			goto next;
 		}
 		if (vma->vm_start < start) {
-			ret = split_vma(mm, vma, start, 1);
+			ret = vmi_split_vma(&vmi, mm, vma, start, 1);
 			if (ret)
 				break;
-			mas_pause(&mas);
 		}
 		if (vma->vm_end > end) {
-			ret = split_vma(mm, vma, end, 0);
+			vma_iter_set(&vmi, vma->vm_end);
+			ret = vmi_split_vma(&vmi, mm, vma, end, 0);
 			if (ret)
 				break;
-			mas_pause(&mas);
 		}
 	next:
 		/*
@@ -1657,8 +1637,8 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 	skip:
 		prev = vma;
 		start = vma->vm_end;
-		vma = mas_next(&mas, end - 1);
-	} while (vma);
+	}
+
 out_unlock:
 	mmap_write_unlock(mm);
 	mmput(mm);
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 15/44] mm: Change mprotect_fixup to vma iterator
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (10 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 09/44] mm/mmap: Remove preallocation from do_mas_align_munmap() Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 12/44] mm: Add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma() Liam Howlett
                   ` (32 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Use the vma iterator so that the iterator can be invalidated or updated
to avoid each caller doing so.
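
The rewritten loop detects holes in the range in two places: a contiguity
check at the top of each iteration and a final check that the walk
actually reached 'end'.  A condensed sketch, assuming vma_iter_end()
reports how far the iterator got:

	tmp = vma->vm_start;
	for_each_vma_range(vmi, vma, end) {
		if (vma->vm_start != tmp) {
			error = -ENOMEM;	/* hole before this vma */
			break;
		}
		tmp = vma->vm_end > end ? end : vma->vm_end;
		/* ... mprotect_fixup(&vmi, &tlb, vma, &prev, nstart, tmp,
		 *		      newflags) ... */
		nstart = tmp;
	}
	if (vma_iter_end(&vmi) < end)
		error = -ENOMEM;		/* hole after the last vma */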

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 fs/exec.c          |  5 ++++-
 include/linux/mm.h |  6 +++---
 mm/mprotect.c      | 47 ++++++++++++++++++++++------------------------
 3 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index ab913243a367..b98647eeae9f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -758,6 +758,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	unsigned long stack_expand;
 	unsigned long rlim_stack;
 	struct mmu_gather tlb;
+	struct vma_iterator vmi;
 
 #ifdef CONFIG_STACK_GROWSUP
 	/* Limit stack size */
@@ -812,8 +813,10 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	vm_flags |= mm->def_flags;
 	vm_flags |= VM_STACK_INCOMPLETE_SETUP;
 
+	vma_iter_init(&vmi, mm, vma->vm_start);
+
 	tlb_gather_mmu(&tlb, mm);
-	ret = mprotect_fixup(&tlb, vma, &prev, vma->vm_start, vma->vm_end,
+	ret = mprotect_fixup(&vmi, &tlb, vma, &prev, vma->vm_start, vma->vm_end,
 			vm_flags);
 	tlb_finish_mmu(&tlb);
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9c790c88f691..98c91a25d257 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2188,9 +2188,9 @@ extern unsigned long change_protection(struct mmu_gather *tlb,
 			      struct vm_area_struct *vma, unsigned long start,
 			      unsigned long end, pgprot_t newprot,
 			      unsigned long cp_flags);
-extern int mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
-			  struct vm_area_struct **pprev, unsigned long start,
-			  unsigned long end, unsigned long newflags);
+extern int mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
+	  struct vm_area_struct *vma, struct vm_area_struct **pprev,
+	  unsigned long start, unsigned long end, unsigned long newflags);
 
 /*
  * doesn't attempt to fault and will return short.
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 908df12caa26..7e6cb2165000 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -548,9 +548,9 @@ static const struct mm_walk_ops prot_none_walk_ops = {
 };
 
 int
-mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
-	       struct vm_area_struct **pprev, unsigned long start,
-	       unsigned long end, unsigned long newflags)
+mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
+	       struct vm_area_struct *vma, struct vm_area_struct **pprev,
+	       unsigned long start, unsigned long end, unsigned long newflags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long oldflags = vma->vm_flags;
@@ -605,7 +605,7 @@ mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	 * First try to merge with previous and/or next vma.
 	 */
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
-	*pprev = vma_merge(mm, *pprev, start, end, newflags,
+	*pprev = vmi_vma_merge(vmi, mm, *pprev, start, end, newflags,
 			   vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
 			   vma->vm_userfaultfd_ctx, anon_vma_name(vma));
 	if (*pprev) {
@@ -617,13 +617,13 @@ mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	*pprev = vma;
 
 	if (start != vma->vm_start) {
-		error = split_vma(mm, vma, start, 1);
+		error = vmi_split_vma(vmi, mm, vma, start, 1);
 		if (error)
 			goto fail;
 	}
 
 	if (end != vma->vm_end) {
-		error = split_vma(mm, vma, end, 0);
+		error = vmi_split_vma(vmi, mm, vma, end, 0);
 		if (error)
 			goto fail;
 	}
@@ -672,7 +672,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 	const bool rier = (current->personality & READ_IMPLIES_EXEC) &&
 				(prot & PROT_READ);
 	struct mmu_gather tlb;
-	MA_STATE(mas, &current->mm->mm_mt, 0, 0);
+	struct vma_iterator vmi;
 
 	start = untagged_addr(start);
 
@@ -704,8 +704,8 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 	if ((pkey != -1) && !mm_pkey_is_allocated(current->mm, pkey))
 		goto out;
 
-	mas_set(&mas, start);
-	vma = mas_find(&mas, ULONG_MAX);
+	vma_iter_init(&vmi, current->mm, start);
+	vma = vma_find(&vmi, end);
 	error = -ENOMEM;
 	if (!vma)
 		goto out;
@@ -728,18 +728,22 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 		}
 	}
 
+	prev = vma_prev(&vmi);
 	if (start > vma->vm_start)
 		prev = vma;
-	else
-		prev = mas_prev(&mas, 0);
 
 	tlb_gather_mmu(&tlb, current->mm);
-	for (nstart = start ; ; ) {
+	nstart = start;
+	tmp = vma->vm_start;
+	for_each_vma_range(vmi, vma, end) {
 		unsigned long mask_off_old_flags;
 		unsigned long newflags;
 		int new_vma_pkey;
 
-		/* Here we know that vma->vm_start <= nstart < vma->vm_end. */
+		if (vma->vm_start != tmp) {
+			error = -ENOMEM;
+			break;
+		}
 
 		/* Does the application expect PROT_READ to imply PROT_EXEC */
 		if (rier && (vma->vm_flags & VM_MAYEXEC))
@@ -782,25 +786,18 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 				break;
 		}
 
-		error = mprotect_fixup(&tlb, vma, &prev, nstart, tmp, newflags);
+		error = mprotect_fixup(&vmi, &tlb, vma, &prev, nstart, tmp, newflags);
 		if (error)
 			break;
 
 		nstart = tmp;
-
-		if (nstart < prev->vm_end)
-			nstart = prev->vm_end;
-		if (nstart >= end)
-			break;
-
-		vma = find_vma(current->mm, prev->vm_end);
-		if (!vma || vma->vm_start != nstart) {
-			error = -ENOMEM;
-			break;
-		}
 		prot = reqprot;
 	}
 	tlb_finish_mmu(&tlb);
+
+	if (vma_iter_end(&vmi) < end)
+		error = -ENOMEM;
+
 out:
 	mmap_write_unlock(current->mm);
 	return error;
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 16/44] mlock: Convert mlock to vma iterator
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (14 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 13/44] ipc/shm: Use the vma iterator for munmap calls Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 17/44] coredump: Convert " Liam Howlett
                   ` (28 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Use the vma iterator so that the iterator can be invalidated or updated
to avoid each caller doing so.
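
apply_mlockall_flags() shows the simplest shape of the conversion: since
mlock_fixup() now keeps the iterator valid across merges and splits, the
mas_pause() between iterations disappears.  A condensed sketch:

	VMA_ITERATOR(vmi, current->mm, 0);

	for_each_vma(vmi, vma) {
		newflags = (vma->vm_flags & VM_LOCKED_CLEAR_MASK) | to_add;
		/* errors are ignored here, as before */
		mlock_fixup(&vmi, vma, &prev, vma->vm_start, vma->vm_end,
			    newflags);
		cond_resched();
	}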

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mlock.c | 57 +++++++++++++++++++++++++++---------------------------
 1 file changed, 28 insertions(+), 29 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index 7032f6dd0ce1..f06b02b631b5 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -401,8 +401,9 @@ static void mlock_vma_pages_range(struct vm_area_struct *vma,
  *
  * For vmas that pass the filters, merge/split as appropriate.
  */
-static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
-	unsigned long start, unsigned long end, vm_flags_t newflags)
+static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
+	       struct vm_area_struct **prev, unsigned long start,
+	       unsigned long end, vm_flags_t newflags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	pgoff_t pgoff;
@@ -417,22 +418,22 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		goto out;
 
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
-	*prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
-			  vma->vm_file, pgoff, vma_policy(vma),
-			  vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+	*prev = vmi_vma_merge(vmi, mm, *prev, start, end, newflags,
+			vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
+			vma->vm_userfaultfd_ctx, anon_vma_name(vma));
 	if (*prev) {
 		vma = *prev;
 		goto success;
 	}
 
 	if (start != vma->vm_start) {
-		ret = split_vma(mm, vma, start, 1);
+		ret = vmi_split_vma(vmi, mm, vma, start, 1);
 		if (ret)
 			goto out;
 	}
 
 	if (end != vma->vm_end) {
-		ret = split_vma(mm, vma, end, 0);
+		ret = vmi_split_vma(vmi, mm, vma, end, 0);
 		if (ret)
 			goto out;
 	}
@@ -471,7 +472,7 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
 	unsigned long nstart, end, tmp;
 	struct vm_area_struct *vma, *prev;
 	int error;
-	MA_STATE(mas, &current->mm->mm_mt, start, start);
+	VMA_ITERATOR(vmi, current->mm, start);
 
 	VM_BUG_ON(offset_in_page(start));
 	VM_BUG_ON(len != PAGE_ALIGN(len));
@@ -480,39 +481,37 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
 		return -EINVAL;
 	if (end == start)
 		return 0;
-	vma = mas_walk(&mas);
+	vma = vma_find(&vmi, end);
 	if (!vma)
 		return -ENOMEM;
 
+	prev = vma_prev(&vmi);
 	if (start > vma->vm_start)
 		prev = vma;
-	else
-		prev = mas_prev(&mas, 0);
 
-	for (nstart = start ; ; ) {
-		vm_flags_t newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
+	nstart = start;
+	tmp = vma->vm_start;
+	for_each_vma_range(vmi, vma, end) {
+		vm_flags_t newflags;
 
-		newflags |= flags;
+		if (vma->vm_start != tmp)
+			return -ENOMEM;
 
+		newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
+		newflags |= flags;
 		/* Here we know that  vma->vm_start <= nstart < vma->vm_end. */
 		tmp = vma->vm_end;
 		if (tmp > end)
 			tmp = end;
-		error = mlock_fixup(vma, &prev, nstart, tmp, newflags);
+		error = mlock_fixup(&vmi, vma, &prev, nstart, tmp, newflags);
 		if (error)
 			break;
 		nstart = tmp;
-		if (nstart < prev->vm_end)
-			nstart = prev->vm_end;
-		if (nstart >= end)
-			break;
-
-		vma = find_vma(prev->vm_mm, prev->vm_end);
-		if (!vma || vma->vm_start != nstart) {
-			error = -ENOMEM;
-			break;
-		}
 	}
+
+	if (vma_iter_end(&vmi) < end)
+		return -ENOMEM;
+
 	return error;
 }
 
@@ -658,7 +657,7 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
  */
 static int apply_mlockall_flags(int flags)
 {
-	MA_STATE(mas, &current->mm->mm_mt, 0, 0);
+	VMA_ITERATOR(vmi, current->mm, 0);
 	struct vm_area_struct *vma, *prev = NULL;
 	vm_flags_t to_add = 0;
 
@@ -679,15 +678,15 @@ static int apply_mlockall_flags(int flags)
 			to_add |= VM_LOCKONFAULT;
 	}
 
-	mas_for_each(&mas, vma, ULONG_MAX) {
+	for_each_vma(vmi, vma) {
 		vm_flags_t newflags;
 
 		newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
 		newflags |= to_add;
 
 		/* Ignore errors */
-		mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
-		mas_pause(&mas);
+		mlock_fixup(&vmi, vma, &prev, vma->vm_start, vma->vm_end,
+			    newflags);
 		cond_resched();
 	}
 out:
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 18/44] mempolicy: Convert to vma iterator
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (16 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 17/44] coredump: Convert " Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 22/44] mmap: Pass through vmi iterator to __split_vma() Liam Howlett
                   ` (26 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Use the vma iterator so that the iterator can be invalidated or updated
to avoid each caller doing so.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mempolicy.c | 25 ++++++++-----------------
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 02c8a712282f..6f41a30c24d5 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -787,24 +787,21 @@ static int vma_replace_policy(struct vm_area_struct *vma,
 static int mbind_range(struct mm_struct *mm, unsigned long start,
 		       unsigned long end, struct mempolicy *new_pol)
 {
-	MA_STATE(mas, &mm->mm_mt, start, start);
+	VMA_ITERATOR(vmi, mm, start);
 	struct vm_area_struct *prev;
 	struct vm_area_struct *vma;
 	int err = 0;
 	pgoff_t pgoff;
 
-	prev = mas_prev(&mas, 0);
-	if (unlikely(!prev))
-		mas_set(&mas, start);
-
-	vma = mas_find(&mas, end - 1);
+	prev = vma_prev(&vmi);
+	vma = vma_find(&vmi, end);
 	if (WARN_ON(!vma))
 		return 0;
 
 	if (start > vma->vm_start)
 		prev = vma;
 
-	for (; vma; vma = mas_next(&mas, end - 1)) {
+	do {
 		unsigned long vmstart = max(start, vma->vm_start);
 		unsigned long vmend = min(end, vma->vm_end);
 
@@ -813,29 +810,23 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 
 		pgoff = vma->vm_pgoff +
 			((vmstart - vma->vm_start) >> PAGE_SHIFT);
-		prev = vma_merge(mm, prev, vmstart, vmend, vma->vm_flags,
+		prev = vmi_vma_merge(&vmi, mm, prev, vmstart, vmend, vma->vm_flags,
 				 vma->anon_vma, vma->vm_file, pgoff,
 				 new_pol, vma->vm_userfaultfd_ctx,
 				 anon_vma_name(vma));
 		if (prev) {
-			/* vma_merge() invalidated the mas */
-			mas_pause(&mas);
 			vma = prev;
 			goto replace;
 		}
 		if (vma->vm_start != vmstart) {
-			err = split_vma(vma->vm_mm, vma, vmstart, 1);
+			err = vmi_split_vma(&vmi, vma->vm_mm, vma, vmstart, 1);
 			if (err)
 				goto out;
-			/* split_vma() invalidated the mas */
-			mas_pause(&mas);
 		}
 		if (vma->vm_end != vmend) {
-			err = split_vma(vma->vm_mm, vma, vmend, 0);
+			err = vmi_split_vma(&vmi, vma->vm_mm, vma, vmend, 0);
 			if (err)
 				goto out;
-			/* split_vma() invalidated the mas */
-			mas_pause(&mas);
 		}
 replace:
 		err = vma_replace_policy(vma, new_pol);
@@ -843,7 +834,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 			goto out;
 next:
 		prev = vma;
-	}
+	} for_each_vma_range(vmi, vma, end);
 
 out:
 	return err;
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 17/44] coredump: Convert to vma iterator
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (15 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 16/44] mlock: Convert mlock to vma iterator Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 18/44] mempolicy: " Liam Howlett
                   ` (27 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Use the vma iterator so that the iterator can be invalidated or updated
to avoid each caller doing so.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 fs/coredump.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index de78bde2991b..f27d734f3102 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -1111,14 +1111,14 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
  * Helper function for iterating across a vma list.  It ensures that the caller
  * will visit `gate_vma' prior to terminating the search.
  */
-static struct vm_area_struct *coredump_next_vma(struct ma_state *mas,
+static struct vm_area_struct *coredump_next_vma(struct vma_iterator *vmi,
 				       struct vm_area_struct *vma,
 				       struct vm_area_struct *gate_vma)
 {
 	if (gate_vma && (vma == gate_vma))
 		return NULL;
 
-	vma = mas_next(mas, ULONG_MAX);
+	vma = vma_next(vmi);
 	if (vma)
 		return vma;
 	return gate_vma;
@@ -1146,7 +1146,7 @@ static bool dump_vma_snapshot(struct coredump_params *cprm)
 {
 	struct vm_area_struct *gate_vma, *vma = NULL;
 	struct mm_struct *mm = current->mm;
-	MA_STATE(mas, &mm->mm_mt, 0, 0);
+	VMA_ITERATOR(vmi, mm, 0);
 	int i = 0;
 
 	/*
@@ -1167,7 +1167,7 @@ static bool dump_vma_snapshot(struct coredump_params *cprm)
 		return false;
 	}
 
-	while ((vma = coredump_next_vma(&mas, vma, gate_vma)) != NULL) {
+	while ((vma = coredump_next_vma(&vmi, vma, gate_vma)) != NULL) {
 		struct core_vma_metadata *m = cprm->vma_meta + i;
 
 		m->start = vma->vm_start;
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 21/44] madvise: Use vmi iterator for __split_vma() and vma_merge()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (19 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 19/44] task_mmu: Convert to vma iterator Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 20/44] sched: Convert to vma iterator Liam Howlett
                   ` (23 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Use the vma iterator so that the iterator can be invalidated or updated
to avoid each caller doing so.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/madvise.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index a56a6d17e201..4ee85b85806a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -142,6 +142,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 	int error;
 	pgoff_t pgoff;
+	VMA_ITERATOR(vmi, mm, 0);
 
 	if (new_flags == vma->vm_flags && anon_vma_name_eq(anon_vma_name(vma), anon_name)) {
 		*prev = vma;
@@ -149,8 +150,8 @@ static int madvise_update_vma(struct vm_area_struct *vma,
 	}
 
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
-	*prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
-			  vma->vm_file, pgoff, vma_policy(vma),
+	*prev = vmi_vma_merge(&vmi, mm, *prev, start, end, new_flags,
+			  vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
 			  vma->vm_userfaultfd_ctx, anon_name);
 	if (*prev) {
 		vma = *prev;
@@ -162,7 +163,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
 	if (start != vma->vm_start) {
 		if (unlikely(mm->map_count >= sysctl_max_map_count))
 			return -ENOMEM;
-		error = __split_vma(mm, vma, start, 1);
+		error = vmi__split_vma(&vmi, mm, vma, start, 1);
 		if (error)
 			return error;
 	}
@@ -170,7 +171,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
 	if (end != vma->vm_end) {
 		if (unlikely(mm->map_count >= sysctl_max_map_count))
 			return -ENOMEM;
-		error = __split_vma(mm, vma, end, 0);
+		error = vmi__split_vma(&vmi, mm, vma, end, 0);
 		if (error)
 			return error;
 	}
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 19/44] task_mmu: Convert to vma iterator
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (18 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 22/44] mmap: Pass through vmi iterator to __split_vma() Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 21/44] madvise: Use vmi iterator for __split_vma() and vma_merge() Liam Howlett
                   ` (24 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Use the vma iterator so that the iterator can be invalidated or updated
to avoid each caller doing so.
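
The one non-mechanical change is in the lock-drop path: where the maple
state was paused, the iterator is now explicitly repositioned to the end
of the current VMA before mmap_lock is released, so the walk resumes at
the next VMA once the lock is retaken.  A condensed sketch of that
pattern:

	if (mmap_lock_is_contended(mm)) {
		vma_iter_set(&vmi, vma->vm_end);	/* resume point */
		mmap_read_unlock(mm);
		ret = mmap_read_lock_killable(mm);
		if (ret)
			goto out;
		/* the tree may have changed; the next lookup re-walks */
	}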

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 fs/proc/task_mmu.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e35a0398db63..2bae7c80d502 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -892,7 +892,7 @@ static int show_smaps_rollup(struct seq_file *m, void *v)
 	struct vm_area_struct *vma;
 	unsigned long vma_start = 0, last_vma_end = 0;
 	int ret = 0;
-	MA_STATE(mas, &mm->mm_mt, 0, 0);
+	VMA_ITERATOR(vmi, mm, 0);
 
 	priv->task = get_proc_task(priv->inode);
 	if (!priv->task)
@@ -910,7 +910,7 @@ static int show_smaps_rollup(struct seq_file *m, void *v)
 		goto out_put_mm;
 
 	hold_task_mempolicy(priv);
-	vma = mas_find(&mas, ULONG_MAX);
+	vma = vma_next(&vmi);
 
 	if (unlikely(!vma))
 		goto empty_set;
@@ -925,7 +925,7 @@ static int show_smaps_rollup(struct seq_file *m, void *v)
 		 * access it for write request.
 		 */
 		if (mmap_lock_is_contended(mm)) {
-			mas_pause(&mas);
+			vma_iter_set(&vmi, vma->vm_end);
 			mmap_read_unlock(mm);
 			ret = mmap_read_lock_killable(mm);
 			if (ret) {
@@ -969,7 +969,7 @@ static int show_smaps_rollup(struct seq_file *m, void *v)
 			 *    contains last_vma_end.
 			 *    Iterate VMA' from last_vma_end.
 			 */
-			vma = mas_find(&mas, ULONG_MAX);
+			vma = vma_next(&vmi);
 			/* Case 3 above */
 			if (!vma)
 				break;
@@ -983,7 +983,7 @@ static int show_smaps_rollup(struct seq_file *m, void *v)
 				smap_gather_stats(vma, &mss, last_vma_end);
 		}
 		/* Case 2 above */
-	} while ((vma = mas_find(&mas, ULONG_MAX)) != NULL);
+	} for_each_vma(vmi, vma);
 
 empty_set:
 	show_vma_header_prefix(m, vma_start, last_vma_end, 0, 0, 0, 0);
@@ -1279,7 +1279,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 		return -ESRCH;
 	mm = get_task_mm(task);
 	if (mm) {
-		MA_STATE(mas, &mm->mm_mt, 0, 0);
+		VMA_ITERATOR(vmi, mm, 0);
 		struct mmu_notifier_range range;
 		struct clear_refs_private cp = {
 			.type = type,
@@ -1299,7 +1299,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 		}
 
 		if (type == CLEAR_REFS_SOFT_DIRTY) {
-			mas_for_each(&mas, vma, ULONG_MAX) {
+			for_each_vma(vmi, vma) {
 				if (!(vma->vm_flags & VM_SOFTDIRTY))
 					continue;
 				vma->vm_flags &= ~VM_SOFTDIRTY;
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 20/44] sched: Convert to vma iterator
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (20 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 21/44] madvise: Use vmi iterator for __split_vma() and vma_merge() Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 23/44] mmap: Use vmi version of vma_merge() Liam Howlett
                   ` (22 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Use the vma iterator so that the iterator can be invalidated or updated
to avoid each caller doing so.
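
task_numa_work() also shows the wrap-around idiom: if the scan start is
past the last VMA, the iterator is reset to 0 and the scan restarts from
the beginning.  A condensed sketch:

	vma_iter_init(&vmi, mm, start);
	vma = vma_next(&vmi);
	if (!vma) {				/* ran off the end */
		reset_ptenuma_scan(p);
		vma_iter_set(&vmi, 0);
		vma = vma_next(&vmi);		/* restart from 0 */
	}

	do {
		/* scan vma */
	} for_each_vma(vmi, vma);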

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 kernel/sched/fair.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c36aa54ae071..9c9950249d7b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2938,11 +2938,11 @@ static void task_numa_work(struct callback_head *work)
 	struct task_struct *p = current;
 	struct mm_struct *mm = p->mm;
 	u64 runtime = p->se.sum_exec_runtime;
-	MA_STATE(mas, &mm->mm_mt, 0, 0);
 	struct vm_area_struct *vma;
 	unsigned long start, end;
 	unsigned long nr_pte_updates = 0;
 	long pages, virtpages;
+	struct vma_iterator vmi;
 
 	SCHED_WARN_ON(p != container_of(work, struct task_struct, numa_work));
 
@@ -2995,16 +2995,16 @@ static void task_numa_work(struct callback_head *work)
 
 	if (!mmap_read_trylock(mm))
 		return;
-	mas_set(&mas, start);
-	vma = mas_find(&mas, ULONG_MAX);
+	vma_iter_init(&vmi, mm, start);
+	vma = vma_next(&vmi);
 	if (!vma) {
 		reset_ptenuma_scan(p);
 		start = 0;
-		mas_set(&mas, start);
-		vma = mas_find(&mas, ULONG_MAX);
+		vma_iter_set(&vmi, start);
+		vma = vma_next(&vmi);
 	}
 
-	for (; vma; vma = mas_find(&mas, ULONG_MAX)) {
+	do {
 		if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
 			is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
 			continue;
@@ -3051,7 +3051,7 @@ static void task_numa_work(struct callback_head *work)
 
 			cond_resched();
 		} while (end != vma->vm_end);
-	}
+	} for_each_vma(vmi, vma);
 
 out:
 	/*
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 22/44] mmap: Pass through vmi iterator to __split_vma()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (17 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 18/44] mempolicy: " Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-07  2:01   ` SeongJae Park
  2023-01-05 19:15 ` [PATCH v2 19/44] task_mmu: Convert to vma iterator Liam Howlett
                   ` (25 subsequent siblings)
  44 siblings, 1 reply; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Use the vma iterator so that the iterator can be invalidated or updated
to avoid each caller doing so.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 4dd7e48a312f..80f12fcf158c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2446,7 +2446,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
 			goto map_count_exceeded;
 
-		error = __split_vma(mm, vma, start, 0);
+		error = vmi__split_vma(vmi, mm, vma, start, 0);
 		if (error)
 			goto start_split_failed;
 
@@ -2467,7 +2467,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		if (next->vm_end > end) {
 			struct vm_area_struct *split;
 
-			error = __split_vma(mm, next, end, 1);
+			error = vmi__split_vma(vmi, mm, next, end, 1);
 			if (error)
 				goto end_split_failed;
 
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 23/44] mmap: Use vmi version of vma_merge()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (21 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 20/44] sched: Convert to vma iterator Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:15 ` [PATCH v2 25/44] mm: Switch vma_merge(), split_vma(), and __split_vma to vma iterator Liam Howlett
                   ` (21 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Use the vma iterator so that the iterator can be invalidated or updated
to avoid each caller doing so.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 80f12fcf158c..579d586e4e6a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2748,8 +2748,9 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		 * vma again as we may succeed this time.
 		 */
 		if (unlikely(vm_flags != vma->vm_flags && prev)) {
-			merge = vma_merge(mm, prev, vma->vm_start, vma->vm_end, vma->vm_flags,
-				NULL, vma->vm_file, vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
+			merge = vmi_vma_merge(&vmi, mm, prev, vma->vm_start,
+				vma->vm_end, vma->vm_flags, NULL, vma->vm_file,
+				vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
 			if (merge) {
 				/*
 				 * ->mmap() can change vma->vm_file and fput
@@ -3280,6 +3281,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_area_struct *new_vma, *prev;
 	bool faulted_in_anon_vma = true;
+	VMA_ITERATOR(vmi, mm, addr);
 
 	validate_mm_mt(mm);
 	/*
@@ -3295,7 +3297,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 	if (new_vma && new_vma->vm_start < addr + len)
 		return NULL;	/* should never get here */
 
-	new_vma = vma_merge(mm, prev, addr, addr + len, vma->vm_flags,
+	new_vma = vmi_vma_merge(&vmi, mm, prev, addr, addr + len, vma->vm_flags,
 			    vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
 			    vma->vm_userfaultfd_ctx, anon_vma_name(vma));
 	if (new_vma) {
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 25/44] mm: Switch vma_merge(), split_vma(), and __split_vma to vma iterator
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (22 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 23/44] mmap: Use vmi version of vma_merge() Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-06 17:23   ` SeongJae Park
  2023-01-05 19:15 ` [PATCH v2 24/44] mm/mremap: Use vmi version of vma_merge() Liam Howlett
                   ` (20 subsequent siblings)
  44 siblings, 1 reply; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Drop the vmi_* functions and transition all users to use the vma
iterator directly.
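
After this patch the iterator-based versions take over the original
names; a sketch of the resulting declarations, as they appear in the
include/linux/mm.h hunk below:

	struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
		struct mm_struct *, struct vm_area_struct *prev,
		unsigned long addr, unsigned long end, unsigned long vm_flags,
		struct anon_vma *, struct file *, pgoff_t, struct mempolicy *,
		struct vm_userfaultfd_ctx, struct anon_vma_name *);
	int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
			unsigned long addr, int new_below);
	int split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
		      unsigned long addr, int new_below);

Note that the split functions no longer take an mm_struct; it is derived
from vma->vm_mm.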

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 fs/userfaultfd.c   | 14 ++++----
 include/linux/mm.h | 16 +++-------
 mm/madvise.c       |  6 ++--
 mm/mempolicy.c     |  6 ++--
 mm/mlock.c         |  6 ++--
 mm/mmap.c          | 79 +++++++++++++---------------------------------
 mm/mprotect.c      |  6 ++--
 mm/mremap.c        |  2 +-
 8 files changed, 47 insertions(+), 88 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index b3249388696a..e60f86d6b91c 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -883,7 +883,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 			continue;
 		}
 		new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
-		prev = vmi_vma_merge(&vmi, mm, prev, vma->vm_start, vma->vm_end,
+		prev = vma_merge(&vmi, mm, prev, vma->vm_start, vma->vm_end,
 				 new_flags, vma->anon_vma,
 				 vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
@@ -1426,7 +1426,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		vma_end = min(end, vma->vm_end);
 
 		new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
-		prev = vmi_vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
+		prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
 				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
 				 ((struct vm_userfaultfd_ctx){ ctx }),
@@ -1437,12 +1437,12 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 			goto next;
 		}
 		if (vma->vm_start < start) {
-			ret = vmi_split_vma(&vmi, mm, vma, start, 1);
+			ret = split_vma(&vmi, vma, start, 1);
 			if (ret)
 				break;
 		}
 		if (vma->vm_end > end) {
-			ret = vmi_split_vma(&vmi, mm, vma, end, 0);
+			ret = split_vma(&vmi, vma, end, 0);
 			if (ret)
 				break;
 		}
@@ -1606,7 +1606,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 			uffd_wp_range(mm, vma, start, vma_end - start, false);
 
 		new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
-		prev = vmi_vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
+		prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
 				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
 				 NULL_VM_UFFD_CTX, anon_vma_name(vma));
@@ -1615,13 +1615,13 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 			goto next;
 		}
 		if (vma->vm_start < start) {
-			ret = vmi_split_vma(&vmi, mm, vma, start, 1);
+			ret = split_vma(&vmi, vma, start, 1);
 			if (ret)
 				break;
 		}
 		if (vma->vm_end > end) {
 			vma_iter_set(&vmi, vma->vm_end);
-			ret = vmi_split_vma(&vmi, mm, vma, end, 0);
+			ret = split_vma(&vmi, vma, end, 0);
 			if (ret)
 				break;
 		}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 98c91a25d257..71474615b4ab 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2830,22 +2830,16 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
 {
 	return __vma_adjust(vma, start, end, pgoff, insert, NULL);
 }
-extern struct vm_area_struct *vma_merge(struct mm_struct *,
-	struct vm_area_struct *prev, unsigned long addr, unsigned long end,
-	unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
-	struct mempolicy *, struct vm_userfaultfd_ctx, struct anon_vma_name *);
-extern struct vm_area_struct *vmi_vma_merge(struct vma_iterator *vmi,
+extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
 	struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
 	unsigned long end, unsigned long vm_flags, struct anon_vma *,
 	struct file *, pgoff_t, struct mempolicy *, struct vm_userfaultfd_ctx,
 	struct anon_vma_name *);
 extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
-extern int vmi__split_vma(struct vma_iterator *vmi, struct mm_struct *,
-	struct vm_area_struct *, unsigned long addr, int new_below);
-extern int split_vma(struct mm_struct *, struct vm_area_struct *,
-	unsigned long addr, int new_below);
-extern int vmi_split_vma(struct vma_iterator *vmi, struct mm_struct *,
-		struct vm_area_struct *, unsigned long addr, int new_below);
+extern int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
+		       unsigned long addr, int new_below);
+extern int split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
+			 unsigned long addr, int new_below);
 extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
 extern void unlink_file_vma(struct vm_area_struct *);
 extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
diff --git a/mm/madvise.c b/mm/madvise.c
index 4ee85b85806a..4115516f58dd 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -150,7 +150,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
 	}
 
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
-	*prev = vmi_vma_merge(&vmi, mm, *prev, start, end, new_flags,
+	*prev = vma_merge(&vmi, mm, *prev, start, end, new_flags,
 			  vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
 			  vma->vm_userfaultfd_ctx, anon_name);
 	if (*prev) {
@@ -163,7 +163,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
 	if (start != vma->vm_start) {
 		if (unlikely(mm->map_count >= sysctl_max_map_count))
 			return -ENOMEM;
-		error = vmi__split_vma(&vmi, mm, vma, start, 1);
+		error = __split_vma(&vmi, vma, start, 1);
 		if (error)
 			return error;
 	}
@@ -171,7 +171,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
 	if (end != vma->vm_end) {
 		if (unlikely(mm->map_count >= sysctl_max_map_count))
 			return -ENOMEM;
-		error = vmi__split_vma(&vmi, mm, vma, end, 0);
+		error = __split_vma(&vmi, vma, end, 0);
 		if (error)
 			return error;
 	}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 6f41a30c24d5..171525b0c7a8 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -810,7 +810,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 
 		pgoff = vma->vm_pgoff +
 			((vmstart - vma->vm_start) >> PAGE_SHIFT);
-		prev = vmi_vma_merge(&vmi, mm, prev, vmstart, vmend, vma->vm_flags,
+		prev = vma_merge(&vmi, mm, prev, vmstart, vmend, vma->vm_flags,
 				 vma->anon_vma, vma->vm_file, pgoff,
 				 new_pol, vma->vm_userfaultfd_ctx,
 				 anon_vma_name(vma));
@@ -819,12 +819,12 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 			goto replace;
 		}
 		if (vma->vm_start != vmstart) {
-			err = vmi_split_vma(&vmi, vma->vm_mm, vma, vmstart, 1);
+			err = split_vma(&vmi, vma, vmstart, 1);
 			if (err)
 				goto out;
 		}
 		if (vma->vm_end != vmend) {
-			err = vmi_split_vma(&vmi, vma->vm_mm, vma, vmend, 0);
+			err = split_vma(&vmi, vma, vmend, 0);
 			if (err)
 				goto out;
 		}
diff --git a/mm/mlock.c b/mm/mlock.c
index f06b02b631b5..393cddee2f06 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -418,7 +418,7 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		goto out;
 
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
-	*prev = vmi_vma_merge(vmi, mm, *prev, start, end, newflags,
+	*prev = vma_merge(vmi, mm, *prev, start, end, newflags,
 			vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
 			vma->vm_userfaultfd_ctx, anon_vma_name(vma));
 	if (*prev) {
@@ -427,13 +427,13 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	}
 
 	if (start != vma->vm_start) {
-		ret = vmi_split_vma(vmi, mm, vma, start, 1);
+		ret = split_vma(vmi, vma, start, 1);
 		if (ret)
 			goto out;
 	}
 
 	if (end != vma->vm_end) {
-		ret = vmi_split_vma(vmi, mm, vma, end, 0);
+		ret = split_vma(vmi, vma, end, 0);
 		if (ret)
 			goto out;
 	}
diff --git a/mm/mmap.c b/mm/mmap.c
index 579d586e4e6a..8e7f4fc36960 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1072,7 +1072,7 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
  * parameter) may establish ptes with the wrong permissions of NNNN
  * instead of the right permissions of XXXX.
  */
-struct vm_area_struct *vma_merge(struct mm_struct *mm,
+struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
 			struct vm_area_struct *prev, unsigned long addr,
 			unsigned long end, unsigned long vm_flags,
 			struct anon_vma *anon_vma, struct file *file,
@@ -1081,7 +1081,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 			struct anon_vma_name *anon_name)
 {
 	pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
-	struct vm_area_struct *mid, *next, *res;
+	struct vm_area_struct *mid, *next, *res = NULL;
 	int err = -1;
 	bool merge_prev = false;
 	bool merge_next = false;
@@ -1147,26 +1147,11 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 	if (err)
 		return NULL;
 	khugepaged_enter_vma(res, vm_flags);
-	return res;
-}
 
-struct vm_area_struct *vmi_vma_merge(struct vma_iterator *vmi,
-			struct mm_struct *mm,
-			struct vm_area_struct *prev, unsigned long addr,
-			unsigned long end, unsigned long vm_flags,
-			struct anon_vma *anon_vma, struct file *file,
-			pgoff_t pgoff, struct mempolicy *policy,
-			struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
-			struct anon_vma_name *anon_name)
-{
-	struct vm_area_struct *tmp;
-
-	tmp = vma_merge(mm, prev, addr, end, vm_flags, anon_vma, file, pgoff,
-			policy, vm_userfaultfd_ctx, anon_name);
-	if (tmp)
+	if (res)
 		vma_iter_set(vmi, end);
 
-	return tmp;
+	return res;
 }
 
 /*
@@ -2286,12 +2271,14 @@ static void unmap_region(struct mm_struct *mm, struct maple_tree *mt,
  * __split_vma() bypasses sysctl_max_map_count checking.  We use this where it
  * has already been checked or doesn't make sense to fail.
  */
-int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
+int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		unsigned long addr, int new_below)
 {
 	struct vm_area_struct *new;
 	int err;
-	validate_mm_mt(mm);
+	unsigned long end = vma->vm_end;
+
+	validate_mm_mt(vma->vm_mm);
 
 	if (vma->vm_ops && vma->vm_ops->may_split) {
 		err = vma->vm_ops->may_split(vma, addr);
@@ -2331,8 +2318,10 @@ int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
 		err = vma_adjust(vma, vma->vm_start, addr, vma->vm_pgoff, new);
 
 	/* Success. */
-	if (!err)
+	if (!err) {
+		vma_iter_set(vmi, end);
 		return 0;
+	}
 
 	/* Avoid vm accounting in close() operation */
 	new->vm_start = new->vm_end;
@@ -2347,46 +2336,21 @@ int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
 	mpol_put(vma_policy(new));
  out_free_vma:
 	vm_area_free(new);
-	validate_mm_mt(mm);
+	validate_mm_mt(vma->vm_mm);
 	return err;
 }
-int vmi__split_vma(struct vma_iterator *vmi, struct mm_struct *mm,
-		   struct vm_area_struct *vma, unsigned long addr, int new_below)
-{
-	int ret;
-	unsigned long end = vma->vm_end;
-
-	ret = __split_vma(mm, vma, addr, new_below);
-	if (!ret)
-		vma_iter_set(vmi, end);
-
-	return ret;
-}
 
 /*
  * Split a vma into two pieces at address 'addr', a new vma is allocated
  * either for the first part or the tail.
  */
-int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
+int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	      unsigned long addr, int new_below)
 {
-	if (mm->map_count >= sysctl_max_map_count)
+	if (vma->vm_mm->map_count >= sysctl_max_map_count)
 		return -ENOMEM;
 
-	return __split_vma(mm, vma, addr, new_below);
-}
-
-int vmi_split_vma(struct vma_iterator *vmi, struct mm_struct *mm,
-		  struct vm_area_struct *vma, unsigned long addr, int new_below)
-{
-	int ret;
-	unsigned long end = vma->vm_end;
-
-	ret = split_vma(mm, vma, addr, new_below);
-	if (!ret)
-		vma_iter_set(vmi, end);
-
-	return ret;
+	return __split_vma(vmi, vma, addr, new_below);
 }
 
 static inline int munmap_sidetree(struct vm_area_struct *vma,
@@ -2446,7 +2410,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
 			goto map_count_exceeded;
 
-		error = vmi__split_vma(vmi, mm, vma, start, 0);
+		error = __split_vma(vmi, vma, start, 0);
 		if (error)
 			goto start_split_failed;
 
@@ -2467,7 +2431,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		if (next->vm_end > end) {
 			struct vm_area_struct *split;
 
-			error = vmi__split_vma(vmi, mm, next, end, 1);
+			error = __split_vma(vmi, next, end, 1);
 			if (error)
 				goto end_split_failed;
 
@@ -2748,9 +2712,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		 * vma again as we may succeed this time.
 		 */
 		if (unlikely(vm_flags != vma->vm_flags && prev)) {
-			merge = vmi_vma_merge(&vmi, mm, prev, vma->vm_start,
-				vma->vm_end, vma->vm_flags, NULL, vma->vm_file,
-				vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
+			merge = vma_merge(&vmi, mm, prev, vma->vm_start,
+				    vma->vm_end, vma->vm_flags, NULL,
+				    vma->vm_file, vma->vm_pgoff, NULL,
+				    NULL_VM_UFFD_CTX, NULL);
 			if (merge) {
 				/*
 				 * ->mmap() can change vma->vm_file and fput
@@ -3297,7 +3262,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 	if (new_vma && new_vma->vm_start < addr + len)
 		return NULL;	/* should never get here */
 
-	new_vma = vmi_vma_merge(&vmi, mm, prev, addr, addr + len, vma->vm_flags,
+	new_vma = vma_merge(&vmi, mm, prev, addr, addr + len, vma->vm_flags,
 			    vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
 			    vma->vm_userfaultfd_ctx, anon_vma_name(vma));
 	if (new_vma) {
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 7e6cb2165000..057b7e3e93bb 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -605,7 +605,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
 	 * First try to merge with previous and/or next vma.
 	 */
 	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
-	*pprev = vmi_vma_merge(vmi, mm, *pprev, start, end, newflags,
+	*pprev = vma_merge(vmi, mm, *pprev, start, end, newflags,
 			   vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
 			   vma->vm_userfaultfd_ctx, anon_vma_name(vma));
 	if (*pprev) {
@@ -617,13 +617,13 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
 	*pprev = vma;
 
 	if (start != vma->vm_start) {
-		error = vmi_split_vma(vmi, mm, vma, start, 1);
+		error = split_vma(vmi, vma, start, 1);
 		if (error)
 			goto fail;
 	}
 
 	if (end != vma->vm_end) {
-		error = vmi_split_vma(vmi, mm, vma, end, 0);
+		error = split_vma(vmi, vma, end, 0);
 		if (error)
 			goto fail;
 	}
diff --git a/mm/mremap.c b/mm/mremap.c
index 4364daaf0e83..00845aec5441 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1034,7 +1034,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 			 * with the next vma if it becomes adjacent to the expanded vma and
 			 * otherwise compatible.
 			 */
-			vma = vmi_vma_merge(&vmi, mm, vma, extension_start,
+			vma = vma_merge(&vmi, mm, vma, extension_start,
 				extension_end, vma->vm_flags, vma->anon_vma,
 				vma->vm_file, extension_pgoff, vma_policy(vma),
 				vma->vm_userfaultfd_ctx, anon_vma_name(vma));
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 24/44] mm/mremap: Use vmi version of vma_merge()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (23 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 25/44] mm: Switch vma_merge(), split_vma(), and __split_vma to vma iterator Liam Howlett
@ 2023-01-05 19:15 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 26/44] mm/damon: Stop using vma_mas_store() for maple tree store Liam Howlett
                   ` (19 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:15 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Use the vma iterator so that the iterator can be invalidated or updated
to avoid each caller doing so.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mremap.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 94d2590f0871..4364daaf0e83 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1018,6 +1018,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 			unsigned long extension_end = addr + new_len;
 			pgoff_t extension_pgoff = vma->vm_pgoff +
 				((extension_start - vma->vm_start) >> PAGE_SHIFT);
+			VMA_ITERATOR(vmi, mm, extension_start);
 
 			if (vma->vm_flags & VM_ACCOUNT) {
 				if (security_vm_enough_memory_mm(mm, pages)) {
@@ -1033,10 +1034,10 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 			 * with the next vma if it becomes adjacent to the expanded vma and
 			 * otherwise compatible.
 			 */
-			vma = vma_merge(mm, vma, extension_start, extension_end,
-					vma->vm_flags, vma->anon_vma, vma->vm_file,
-					extension_pgoff, vma_policy(vma),
-					vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+			vma = vmi_vma_merge(&vmi, mm, vma, extension_start,
+				extension_end, vma->vm_flags, vma->anon_vma,
+				vma->vm_file, extension_pgoff, vma_policy(vma),
+				vma->vm_userfaultfd_ctx, anon_vma_name(vma));
 			if (!vma) {
 				vm_unacct_memory(pages);
 				ret = -ENOMEM;
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 26/44] mm/damon: Stop using vma_mas_store() for maple tree store
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (24 preceding siblings ...)
  2023-01-05 19:15 ` [PATCH v2 24/44] mm/mremap: Use vmi version of vma_merge() Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:32   ` SeongJae Park
  2023-01-05 19:16 ` [PATCH v2 27/44] mmap: Convert __vma_adjust() to use vma iterator Liam Howlett
                   ` (18 subsequent siblings)
  44 siblings, 1 reply; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, SeongJae Park, damon, kernel test robot

From: "Liam R. Howlett" <Liam.Howlett@oracle.com>

Prepare for the removal of the vma_mas_store() function by open-coding
the maple tree store in this test code.  Set the range of the maple
state and call the store function directly.
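
The maple tree's end index is inclusive, so the open-coded store passes
vm_end - 1.  For a hypothetical VMA spanning [0x1000, 0x2000):

	mas_set_range(&mas, 0x1000, 0x2000 - 1);	/* tree range 0x1000-0x1fff */
	mas_store_gfp(&mas, vma, GFP_KERNEL);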

Cc: SeongJae Park <sj@kernel.org>
Cc: damon@lists.linux.dev
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/damon/vaddr-test.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/damon/vaddr-test.h b/mm/damon/vaddr-test.h
index bce37c487540..41532f7355d0 100644
--- a/mm/damon/vaddr-test.h
+++ b/mm/damon/vaddr-test.h
@@ -24,8 +24,10 @@ static void __link_vmas(struct maple_tree *mt, struct vm_area_struct *vmas,
 		return;
 
 	mas_lock(&mas);
-	for (i = 0; i < nr_vmas; i++)
-		vma_mas_store(&vmas[i], &mas);
+	for (i = 0; i < nr_vmas; i++) {
+		mas_set_range(&mas, vmas[i].vm_start, vmas[i].vm_end - 1);
+		mas_store_gfp(&mas, &vmas[i], GFP_KERNEL);
+	}
 	mas_unlock(&mas);
 }
 
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 27/44] mmap: Convert __vma_adjust() to use vma iterator
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (25 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 26/44] mm/damon: Stop using vma_mas_store() for maple tree store Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 28/44] mm: Pass through vma iterator to __vma_adjust() Liam Howlett
                   ` (17 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Use the vma iterator internally for __vma_adjust() and avoid using the
maple tree interface directly, for the sake of type safety.
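
(Rough before/after pairing of the conversion, based on the hunks
below; a sketch of the mapping, not a complete function.)

	VMA_ITERATOR(vmi, mm, 0);	/* was MA_STATE(mas, &mm->mm_mt, 0, 0) */

	if (vma_iter_prealloc(&vmi, vma))	/* was mas_preallocate() */
		return -ENOMEM;
	vma_iter_clear(&vmi, start, end);	/* was vma_mas_szero() */
	vma_iter_store(&vmi, vma);		/* was vma_mas_store() */
	vma_iter_free(&vmi);			/* was mas_destroy() */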

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 include/linux/mm.h |  3 --
 mm/mmap.c          | 75 ++++++++--------------------------------------
 2 files changed, 13 insertions(+), 65 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 71474615b4ab..28973a3941a4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2847,9 +2847,6 @@ extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
 	bool *need_rmap_locks);
 extern void exit_mmap(struct mm_struct *);
 
-void vma_mas_store(struct vm_area_struct *vma, struct ma_state *mas);
-void vma_mas_remove(struct vm_area_struct *vma, struct ma_state *mas);
-
 static inline int check_data_rlimit(unsigned long rlim,
 				    unsigned long new,
 				    unsigned long start,
diff --git a/mm/mmap.c b/mm/mmap.c
index 8e7f4fc36960..a898ae2a57d5 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -494,56 +494,6 @@ static void __vma_link_file(struct vm_area_struct *vma,
 	flush_dcache_mmap_unlock(mapping);
 }
 
-/*
- * vma_mas_store() - Store a VMA in the maple tree.
- * @vma: The vm_area_struct
- * @mas: The maple state
- *
- * Efficient way to store a VMA in the maple tree when the @mas has already
- * walked to the correct location.
- *
- * Note: the end address is inclusive in the maple tree.
- */
-void vma_mas_store(struct vm_area_struct *vma, struct ma_state *mas)
-{
-	trace_vma_store(mas->tree, vma);
-	mas_set_range(mas, vma->vm_start, vma->vm_end - 1);
-	mas_store_prealloc(mas, vma);
-}
-
-/*
- * vma_mas_remove() - Remove a VMA from the maple tree.
- * @vma: The vm_area_struct
- * @mas: The maple state
- *
- * Efficient way to remove a VMA from the maple tree when the @mas has already
- * been established and points to the correct location.
- * Note: the end address is inclusive in the maple tree.
- */
-void vma_mas_remove(struct vm_area_struct *vma, struct ma_state *mas)
-{
-	trace_vma_mas_szero(mas->tree, vma->vm_start, vma->vm_end - 1);
-	mas->index = vma->vm_start;
-	mas->last = vma->vm_end - 1;
-	mas_store_prealloc(mas, NULL);
-}
-
-/*
- * vma_mas_szero() - Set a given range to zero.  Used when modifying a
- * vm_area_struct start or end.
- *
- * @mas: The maple tree ma_state
- * @start: The start address to zero
- * @end: The end address to zero.
- */
-static inline void vma_mas_szero(struct ma_state *mas, unsigned long start,
-				unsigned long end)
-{
-	trace_vma_mas_szero(mas->tree, start, end - 1);
-	mas_set_range(mas, start, end - 1);
-	mas_store_prealloc(mas, NULL);
-}
-
 static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
 {
 	VMA_ITERATOR(vmi, mm, 0);
@@ -703,7 +653,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 	bool vma_changed = false;
 	long adjust_next = 0;
 	int remove_next = 0;
-	MA_STATE(mas, &mm->mm_mt, 0, 0);
+	VMA_ITERATOR(vmi, mm, 0);
 	struct vm_area_struct *exporter = NULL, *importer = NULL;
 
 	if (next && !insert) {
@@ -788,7 +738,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 		}
 	}
 
-	if (mas_preallocate(&mas, vma, GFP_KERNEL))
+	if (vma_iter_prealloc(&vmi, vma))
 		return -ENOMEM;
 
 	vma_adjust_trans_huge(orig_vma, start, end, adjust_next);
@@ -834,7 +784,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 	if (start != vma->vm_start) {
 		if ((vma->vm_start < start) &&
 		    (!insert || (insert->vm_end != start))) {
-			vma_mas_szero(&mas, vma->vm_start, start);
+			vma_iter_clear(&vmi, vma->vm_start, start);
 			VM_WARN_ON(insert && insert->vm_start > vma->vm_start);
 		} else {
 			vma_changed = true;
@@ -844,8 +794,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 	if (end != vma->vm_end) {
 		if (vma->vm_end > end) {
 			if (!insert || (insert->vm_start != end)) {
-				vma_mas_szero(&mas, end, vma->vm_end);
-				mas_reset(&mas);
+				vma_iter_clear(&vmi, end, vma->vm_end);
+				vma_iter_set(&vmi, vma->vm_end);
 				VM_WARN_ON(insert &&
 					   insert->vm_end < vma->vm_end);
 			}
@@ -856,13 +806,13 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 	}
 
 	if (vma_changed)
-		vma_mas_store(vma, &mas);
+		vma_iter_store(&vmi, vma);
 
 	vma->vm_pgoff = pgoff;
 	if (adjust_next) {
 		next->vm_start += adjust_next;
 		next->vm_pgoff += adjust_next >> PAGE_SHIFT;
-		vma_mas_store(next, &mas);
+		vma_iter_store(&vmi, next);
 	}
 
 	if (file) {
@@ -882,8 +832,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 		 * us to insert it before dropping the locks
 		 * (it may either follow vma or precede it).
 		 */
-		mas_reset(&mas);
-		vma_mas_store(insert, &mas);
+		vma_iter_store(&vmi, insert);
 		mm->map_count++;
 	}
 
@@ -929,7 +878,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 	if (insert && file)
 		uprobe_mmap(insert);
 
-	mas_destroy(&mas);
+	vma_iter_free(&vmi);
 	validate_mm(mm);
 
 	return 0;
@@ -2057,7 +2006,8 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 				anon_vma_interval_tree_pre_update_vma(vma);
 				vma->vm_end = address;
 				/* Overwrite old entry in mtree. */
-				vma_mas_store(vma, &mas);
+				mas_set_range(&mas, vma->vm_start, address - 1);
+				mas_store_prealloc(&mas, vma);
 				anon_vma_interval_tree_post_update_vma(vma);
 				spin_unlock(&mm->page_table_lock);
 
@@ -2139,7 +2089,8 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
 				vma->vm_start = address;
 				vma->vm_pgoff -= grow;
 				/* Overwrite old entry in mtree. */
-				vma_mas_store(vma, &mas);
+				mas_set_range(&mas, address, vma->vm_end - 1);
+				mas_store_prealloc(&mas, vma);
 				anon_vma_interval_tree_post_update_vma(vma);
 				spin_unlock(&mm->page_table_lock);
 
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 28/44] mm: Pass through vma iterator to __vma_adjust()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (26 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 27/44] mmap: Convert __vma_adjust() to use vma iterator Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 29/44] madvise: Use split_vma() instead of __split_vma() Liam Howlett
                   ` (16 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Pass the vma iterator through to __vma_adjust() so the state can be
updated.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 include/linux/mm.h |  6 ++++--
 mm/mmap.c          | 31 +++++++++++++++----------------
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 28973a3941a4..294894969cd9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2822,13 +2822,15 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
 
 /* mmap.c */
 extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
-extern int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
+extern int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long start,
 	unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
 	struct vm_area_struct *expand);
 static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
 	unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert)
 {
-	return __vma_adjust(vma, start, end, pgoff, insert, NULL);
+	VMA_ITERATOR(vmi, vma->vm_mm, start);
+
+	return __vma_adjust(&vmi, vma, start, end, pgoff, insert, NULL);
 }
 extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
 	struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
diff --git a/mm/mmap.c b/mm/mmap.c
index a898ae2a57d5..a4e564163334 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -638,9 +638,9 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
  * are necessary.  The "insert" vma (if any) is to be inserted
  * before we drop the necessary locks.
  */
-int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
-	unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
-	struct vm_area_struct *expand)
+int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
+	unsigned long start, unsigned long end, pgoff_t pgoff,
+	struct vm_area_struct *insert, struct vm_area_struct *expand)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_area_struct *next_next = NULL;	/* uninit var warning */
@@ -653,7 +653,6 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 	bool vma_changed = false;
 	long adjust_next = 0;
 	int remove_next = 0;
-	VMA_ITERATOR(vmi, mm, 0);
 	struct vm_area_struct *exporter = NULL, *importer = NULL;
 
 	if (next && !insert) {
@@ -738,7 +737,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 		}
 	}
 
-	if (vma_iter_prealloc(&vmi, vma))
+	if (vma_iter_prealloc(vmi, vma))
 		return -ENOMEM;
 
 	vma_adjust_trans_huge(orig_vma, start, end, adjust_next);
@@ -784,7 +783,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 	if (start != vma->vm_start) {
 		if ((vma->vm_start < start) &&
 		    (!insert || (insert->vm_end != start))) {
-			vma_iter_clear(&vmi, vma->vm_start, start);
+			vma_iter_clear(vmi, vma->vm_start, start);
 			VM_WARN_ON(insert && insert->vm_start > vma->vm_start);
 		} else {
 			vma_changed = true;
@@ -794,8 +793,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 	if (end != vma->vm_end) {
 		if (vma->vm_end > end) {
 			if (!insert || (insert->vm_start != end)) {
-				vma_iter_clear(&vmi, end, vma->vm_end);
-				vma_iter_set(&vmi, vma->vm_end);
+				vma_iter_clear(vmi, end, vma->vm_end);
+				vma_iter_set(vmi, vma->vm_end);
 				VM_WARN_ON(insert &&
 					   insert->vm_end < vma->vm_end);
 			}
@@ -806,13 +805,13 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 	}
 
 	if (vma_changed)
-		vma_iter_store(&vmi, vma);
+		vma_iter_store(vmi, vma);
 
 	vma->vm_pgoff = pgoff;
 	if (adjust_next) {
 		next->vm_start += adjust_next;
 		next->vm_pgoff += adjust_next >> PAGE_SHIFT;
-		vma_iter_store(&vmi, next);
+		vma_iter_store(vmi, next);
 	}
 
 	if (file) {
@@ -832,7 +831,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 		 * us to insert it before dropping the locks
 		 * (it may either follow vma or precede it).
 		 */
-		vma_iter_store(&vmi, insert);
+		vma_iter_store(vmi, insert);
 		mm->map_count++;
 	}
 
@@ -878,7 +877,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 	if (insert && file)
 		uprobe_mmap(insert);
 
-	vma_iter_free(&vmi);
+	vma_iter_free(vmi);
 	validate_mm(mm);
 
 	return 0;
@@ -1072,20 +1071,20 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
 	if (merge_prev && merge_next &&
 			is_mergeable_anon_vma(prev->anon_vma,
 				next->anon_vma, NULL)) {	 /* cases 1, 6 */
-		err = __vma_adjust(prev, prev->vm_start,
+		err = __vma_adjust(vmi, prev, prev->vm_start,
 					next->vm_end, prev->vm_pgoff, NULL,
 					prev);
 		res = prev;
 	} else if (merge_prev) {			/* cases 2, 5, 7 */
-		err = __vma_adjust(prev, prev->vm_start,
+		err = __vma_adjust(vmi, prev, prev->vm_start,
 					end, prev->vm_pgoff, NULL, prev);
 		res = prev;
 	} else if (merge_next) {
 		if (prev && addr < prev->vm_end)	/* case 4 */
-			err = __vma_adjust(prev, prev->vm_start,
+			err = __vma_adjust(vmi, prev, prev->vm_start,
 					addr, prev->vm_pgoff, NULL, next);
 		else					/* cases 3, 8 */
-			err = __vma_adjust(mid, addr, next->vm_end,
+			err = __vma_adjust(vmi, mid, addr, next->vm_end,
 					next->vm_pgoff - pglen, NULL, next);
 		res = next;
 	}
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 30/44] mm: Remove unnecessary write to vma iterator in __vma_adjust()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (28 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 29/44] madvise: Use split_vma() instead of __split_vma() Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 31/44] mm: Pass vma iterator through to __vma_adjust() Liam Howlett
                   ` (14 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

If the vma start address is going to change due to an insert, then it is
safe not to write the vma to the tree.  The write of the insert vma will
alter the tree as necessary.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index a4e564163334..174cbf25251f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -781,10 +781,12 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	}
 
 	if (start != vma->vm_start) {
-		if ((vma->vm_start < start) &&
-		    (!insert || (insert->vm_end != start))) {
-			vma_iter_clear(vmi, vma->vm_start, start);
-			VM_WARN_ON(insert && insert->vm_start > vma->vm_start);
+		if (vma->vm_start < start) {
+			if (!insert || (insert->vm_end != start)) {
+				vma_iter_clear(vmi, vma->vm_start, start);
+				vma_iter_set(vmi, start);
+				VM_WARN_ON(insert && insert->vm_start > vma->vm_start);
+			}
 		} else {
 			vma_changed = true;
 		}
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 29/44] madvise: Use split_vma() instead of __split_vma()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (27 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 28/44] mm: Pass through vma iterator to __vma_adjust() Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 30/44] mm: Remove unnecessary write to vma iterator in __vma_adjust() Liam Howlett
                   ` (15 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

The split_vma() wrapper is specifically for this use case, so use it.
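
(For reference, split_vma() is a thin wrapper that performs the
map_count check being dropped from madvise here before calling
__split_vma(); roughly:)

	int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
		      unsigned long addr, int new_below)
	{
		if (vma->vm_mm->map_count >= sysctl_max_map_count)
			return -ENOMEM;

		return __split_vma(vmi, vma, addr, new_below);
	}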

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/madvise.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 4115516f58dd..86f9ad95f0fa 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -161,17 +161,13 @@ static int madvise_update_vma(struct vm_area_struct *vma,
 	*prev = vma;
 
 	if (start != vma->vm_start) {
-		if (unlikely(mm->map_count >= sysctl_max_map_count))
-			return -ENOMEM;
-		error = __split_vma(&vmi, vma, start, 1);
+		error = split_vma(&vmi, vma, start, 1);
 		if (error)
 			return error;
 	}
 
 	if (end != vma->vm_end) {
-		if (unlikely(mm->map_count >= sysctl_max_map_count))
-			return -ENOMEM;
-		error = __split_vma(&vmi, vma, end, 0);
+		error = split_vma(&vmi, vma, end, 0);
 		if (error)
 			return error;
 	}
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 31/44] mm: Pass vma iterator through to __vma_adjust()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (29 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 30/44] mm: Remove unnecessary write to vma iterator in __vma_adjust() Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 32/44] mm: Add vma iterator to vma_adjust() arguments Liam Howlett
                   ` (13 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Pass the iterator through to be used in __vma_adjust().  The state of
the iterator needs to be correct for the operation that will occur, so
make the necessary adjustments.
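
(Sketch of the convention being established, per the comment added
below: after __split_vma() the iterator points to the end VMA, so
callers reposition with vma_prev() rather than resetting and re-walking
the tree.)

	error = __split_vma(vmi, next, end, 1);
	if (error)
		goto end_split_failed;

	split = vma_prev(vmi);	/* iterator is on the end VMA; step back */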

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 174cbf25251f..c10ab873b8e4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -587,6 +587,10 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		vma_interval_tree_remove(vma, root);
 	}
 
+	/* VMA iterator points to previous, so set to start if necessary */
+	if (vma_iter_addr(vmi) != start)
+		vma_iter_set(vmi, start);
+
 	vma->vm_start = start;
 	vma->vm_end = end;
 	vma->vm_pgoff = pgoff;
@@ -2222,13 +2226,13 @@ static void unmap_region(struct mm_struct *mm, struct maple_tree *mt,
 /*
  * __split_vma() bypasses sysctl_max_map_count checking.  We use this where it
  * has already been checked or doesn't make sense to fail.
+ * VMA Iterator will point to the end VMA.
  */
 int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		unsigned long addr, int new_below)
 {
 	struct vm_area_struct *new;
 	int err;
-	unsigned long end = vma->vm_end;
 
 	validate_mm_mt(vma->vm_mm);
 
@@ -2264,14 +2268,17 @@ int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		new->vm_ops->open(new);
 
 	if (new_below)
-		err = vma_adjust(vma, addr, vma->vm_end, vma->vm_pgoff +
-			((addr - new->vm_start) >> PAGE_SHIFT), new);
+		err = __vma_adjust(vmi, vma, addr, vma->vm_end,
+		   vma->vm_pgoff + ((addr - new->vm_start) >> PAGE_SHIFT),
+		   new, NULL);
 	else
-		err = vma_adjust(vma, vma->vm_start, addr, vma->vm_pgoff, new);
+		err = __vma_adjust(vmi, vma, vma->vm_start, addr, vma->vm_pgoff,
+				 new, NULL);
 
 	/* Success. */
 	if (!err) {
-		vma_iter_set(vmi, end);
+		if (new_below)
+			vma_next(vmi);
 		return 0;
 	}
 
@@ -2366,8 +2373,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		if (error)
 			goto start_split_failed;
 
-		vma_iter_set(vmi, start);
-		vma = vma_find(vmi, end);
+		vma = vma_iter_load(vmi);
 	}
 
 	prev = vma_prev(vmi);
@@ -2387,7 +2393,6 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 			if (error)
 				goto end_split_failed;
 
-			vma_iter_set(vmi, end);
 			split = vma_prev(vmi);
 			error = munmap_sidetree(split, &mas_detach);
 			if (error)
@@ -2631,6 +2636,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		goto unacct_error;
 	}
 
+	vma_iter_set(&vmi, addr);
 	vma->vm_start = addr;
 	vma->vm_end = end;
 	vma->vm_flags = vm_flags;
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 32/44] mm: Add vma iterator to vma_adjust() arguments
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (30 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 31/44] mm: Pass vma iterator through to __vma_adjust() Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 34/44] mm: Change munmap splitting order and move_vma() Liam Howlett
                   ` (12 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Change the vma_adjust() function definition to accept the vma iterator
and pass it through to __vma_adjust().

Update fs/exec to use the new vma_adjust() function parameters.

Revert the __split_vma() calls back from __vma_adjust() to vma_adjust()
and pass through the vma iterator.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 fs/exec.c          | 11 ++++-------
 include/linux/mm.h |  9 ++++-----
 mm/mmap.c          | 10 +++++-----
 3 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index b98647eeae9f..76ee62e1d3f1 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -699,7 +699,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 	/*
 	 * cover the whole range: [new_start, old_end)
 	 */
-	if (vma_adjust(vma, new_start, old_end, vma->vm_pgoff, NULL))
+	if (vma_adjust(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
 		return -ENOMEM;
 
 	/*
@@ -731,12 +731,9 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 	}
 	tlb_finish_mmu(&tlb);
 
-	/*
-	 * Shrink the vma to just the new range.  Always succeeds.
-	 */
-	vma_adjust(vma, new_start, new_end, vma->vm_pgoff, NULL);
-
-	return 0;
+	vma_prev(&vmi);
+	/* Shrink the vma to just the new range */
+	return vma_adjust(&vmi, vma, new_start, new_end, vma->vm_pgoff, NULL);
 }
 
 /*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 294894969cd9..aabfd4183091 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2825,12 +2825,11 @@ extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admi
 extern int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long start,
 	unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
 	struct vm_area_struct *expand);
-static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
-	unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert)
+static inline int vma_adjust(struct vma_iterator *vmi,
+	struct vm_area_struct *vma, unsigned long start, unsigned long end,
+	pgoff_t pgoff, struct vm_area_struct *insert)
 {
-	VMA_ITERATOR(vmi, vma->vm_mm, start);
-
-	return __vma_adjust(&vmi, vma, start, end, pgoff, insert, NULL);
+	return __vma_adjust(vmi, vma, start, end, pgoff, insert, NULL);
 }
 extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
 	struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
diff --git a/mm/mmap.c b/mm/mmap.c
index c10ab873b8e4..d7530abdd7c0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2268,12 +2268,12 @@ int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		new->vm_ops->open(new);
 
 	if (new_below)
-		err = __vma_adjust(vmi, vma, addr, vma->vm_end,
-		   vma->vm_pgoff + ((addr - new->vm_start) >> PAGE_SHIFT),
-		   new, NULL);
+		err = vma_adjust(vmi, vma, addr, vma->vm_end,
+			vma->vm_pgoff + ((addr - new->vm_start) >> PAGE_SHIFT),
+			new);
 	else
-		err = __vma_adjust(vmi, vma, vma->vm_start, addr, vma->vm_pgoff,
-				 new, NULL);
+		err = vma_adjust(vmi, vma, vma->vm_start, addr, vma->vm_pgoff,
+				 new);
 
 	/* Success. */
 	if (!err) {
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 34/44] mm: Change munmap splitting order and move_vma()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (31 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 32/44] mm: Add vma iterator to vma_adjust() arguments Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 33/44] mmap: Clean up mmap_region() unrolling Liam Howlett
                   ` (11 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Splitting can be more efficient when ordering is not a concern.  Change
do_vmi_align_munmap() to reduce walking of the tree during split
operations.

move_vma() must also be altered to remove its dependency on keeping the
original VMA as the active part of the split.  Transition to using the
vma iterator to look up the prev and/or next vma after munmap.
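
(Sketch of the accounting fix-up this enables, mirroring the mremap
hunks below: after do_vmi_munmap() the iterator locates whichever piece
survived on each side of the unmapped range.)

	if (account_start) {
		vma = vma_prev(&vmi);
		vma->vm_flags |= VM_ACCOUNT;
	}

	if (account_end) {
		vma = vma_next(&vmi);
		vma->vm_flags |= VM_ACCOUNT;
	}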

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c   | 18 ++----------------
 mm/mremap.c | 27 ++++++++++++++++-----------
 2 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 99c94d49640b..c1796f9261e4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2387,21 +2387,9 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	for_each_vma_range(*vmi, next, end) {
 		/* Does it split the end? */
 		if (next->vm_end > end) {
-			struct vm_area_struct *split;
-
-			error = __split_vma(vmi, next, end, 1);
+			error = __split_vma(vmi, next, end, 0);
 			if (error)
 				goto end_split_failed;
-
-			split = vma_prev(vmi);
-			error = munmap_sidetree(split, &mas_detach);
-			if (error)
-				goto munmap_sidetree_failed;
-
-			count++;
-			if (vma == next)
-				vma = split;
-			break;
 		}
 		error = munmap_sidetree(next, &mas_detach);
 		if (error)
@@ -2414,9 +2402,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 #endif
 	}
 
-	if (!next)
-		next = vma_next(vmi);
-
+	next = vma_next(vmi);
 	if (unlikely(uf)) {
 		/*
 		 * If userfaultfd_unmap_prep returns an error the vmas
diff --git a/mm/mremap.c b/mm/mremap.c
index 00845aec5441..98f27d466265 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -580,11 +580,12 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 	unsigned long vm_flags = vma->vm_flags;
 	unsigned long new_pgoff;
 	unsigned long moved_len;
-	unsigned long excess = 0;
+	unsigned long account_start = 0;
+	unsigned long account_end = 0;
 	unsigned long hiwater_vm;
-	int split = 0;
 	int err = 0;
 	bool need_rmap_locks;
+	VMA_ITERATOR(vmi, mm, old_addr);
 
 	/*
 	 * We'd prefer to avoid failure later on in do_munmap:
@@ -662,10 +663,10 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 	/* Conceal VM_ACCOUNT so old reservation is not undone */
 	if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP)) {
 		vma->vm_flags &= ~VM_ACCOUNT;
-		excess = vma->vm_end - vma->vm_start - old_len;
-		if (old_addr > vma->vm_start &&
-		    old_addr + old_len < vma->vm_end)
-			split = 1;
+		if (vma->vm_start < old_addr)
+			account_start = vma->vm_start;
+		if (vma->vm_end > old_addr + old_len)
+			account_end = vma->vm_end;
 	}
 
 	/*
@@ -700,11 +701,11 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 		return new_addr;
 	}
 
-	if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
+	if (do_vmi_munmap(&vmi, mm, old_addr, old_len, uf_unmap, false) < 0) {
 		/* OOM: unable to split vma, just get accounts right */
 		if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP))
 			vm_acct_memory(old_len >> PAGE_SHIFT);
-		excess = 0;
+		account_start = account_end = 0;
 	}
 
 	if (vm_flags & VM_LOCKED) {
@@ -715,10 +716,14 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 	mm->hiwater_vm = hiwater_vm;
 
 	/* Restore VM_ACCOUNT if one or two pieces of vma left */
-	if (excess) {
+	if (account_start) {
+		vma = vma_prev(&vmi);
+		vma->vm_flags |= VM_ACCOUNT;
+	}
+
+	if (account_end) {
+		vma = vma_next(&vmi);
 		vma->vm_flags |= VM_ACCOUNT;
-		if (split)
-			find_vma(mm, vma->vm_end)->vm_flags |= VM_ACCOUNT;
 	}
 
 	return new_addr;
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 33/44] mmap: Clean up mmap_region() unrolling
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (32 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 34/44] mm: Change munmap splitting order and move_vma() Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 35/44] mm/mmap: move anon_vma setting in __vma_adjust() Liam Howlett
                   ` (10 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Li Zetao, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Move the unrolling logic to the error path as opposed to duplicating it
within the function body.  This reduces the risk of missing an update
to one path when making changes.

Cc: Li Zetao <lizetao1@huawei.com>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c | 45 ++++++++++++++++++---------------------------
 1 file changed, 18 insertions(+), 27 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index d7530abdd7c0..99c94d49640b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2659,12 +2659,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		 * Expansion is handled above, merging is handled below.
 		 * Drivers should not alter the address of the VMA.
 		 */
-		if (WARN_ON((addr != vma->vm_start))) {
-			error = -EINVAL;
+		error = -EINVAL;
+		if (WARN_ON((addr != vma->vm_start)))
 			goto close_and_free_vma;
-		}
-		vma_iter_set(&vmi, addr);
 
+		vma_iter_set(&vmi, addr);
 		/*
 		 * If vm_flags changed after call_mmap(), we should try merge
 		 * vma again as we may succeed this time.
@@ -2701,25 +2700,13 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	}
 
 	/* Allow architectures to sanity-check the vm_flags */
-	if (!arch_validate_flags(vma->vm_flags)) {
-		error = -EINVAL;
-		if (file)
-			goto close_and_free_vma;
-		else if (vma->vm_file)
-			goto unmap_and_free_vma;
-		else
-			goto free_vma;
-	}
+	error = -EINVAL;
+	if (!arch_validate_flags(vma->vm_flags))
+		goto close_and_free_vma;
 
-	if (vma_iter_prealloc(&vmi, vma)) {
-		error = -ENOMEM;
-		if (file)
-			goto close_and_free_vma;
-		else if (vma->vm_file)
-			goto unmap_and_free_vma;
-		else
-			goto free_vma;
-	}
+	error = -ENOMEM;
+	if (vma_iter_prealloc(&vmi, vma))
+		goto close_and_free_vma;
 
 	if (vma->vm_file)
 		i_mmap_lock_write(vma->vm_file->f_mapping);
@@ -2778,14 +2765,18 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	return addr;
 
 close_and_free_vma:
-	if (vma->vm_ops && vma->vm_ops->close)
+	if (file && vma->vm_ops && vma->vm_ops->close)
 		vma->vm_ops->close(vma);
+
+	if (file || vma->vm_file) {
 unmap_and_free_vma:
-	fput(vma->vm_file);
-	vma->vm_file = NULL;
+		fput(vma->vm_file);
+		vma->vm_file = NULL;
 
-	/* Undo any partial mapping done by a device driver. */
-	unmap_region(mm, &mm->mm_mt, vma, prev, next, vma->vm_start, vma->vm_end);
+		/* Undo any partial mapping done by a device driver. */
+		unmap_region(mm, &mm->mm_mt, vma, prev, next, vma->vm_start,
+			     vma->vm_end);
+	}
 	if (file && (vm_flags & VM_SHARED))
 		mapping_unmap_writable(file->f_mapping);
 free_vma:
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 35/44] mm/mmap: move anon_vma setting in __vma_adjust()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (33 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 33/44] mmap: Clean up mmap_region() unrolling Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 36/44] mm/mmap: Refactor locking out of __vma_adjust() Liam Howlett
                   ` (9 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Move the anon_vma setting & VM_WARN_ON() up the function.  This is done
to clean up the locking later.

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index c1796f9261e4..c15a04bf3518 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -744,6 +744,14 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	if (vma_iter_prealloc(vmi, vma))
 		return -ENOMEM;
 
+	anon_vma = vma->anon_vma;
+	if (!anon_vma && adjust_next)
+		anon_vma = next->anon_vma;
+
+	if (anon_vma)
+		VM_WARN_ON(adjust_next && next->anon_vma &&
+			   anon_vma != next->anon_vma);
+
 	vma_adjust_trans_huge(orig_vma, start, end, adjust_next);
 	if (file) {
 		mapping = file->f_mapping;
@@ -765,12 +773,7 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		}
 	}
 
-	anon_vma = vma->anon_vma;
-	if (!anon_vma && adjust_next)
-		anon_vma = next->anon_vma;
 	if (anon_vma) {
-		VM_WARN_ON(adjust_next && next->anon_vma &&
-			   anon_vma != next->anon_vma);
 		anon_vma_lock_write(anon_vma);
 		anon_vma_interval_tree_pre_update_vma(vma);
 		if (adjust_next)
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 37/44] mm/mmap: Use vma_prepare() and vma_complete() in vma_expand()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (35 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 36/44] mm/mmap: Refactor locking out of __vma_adjust() Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 39/44] mm: Don't use __vma_adjust() in __split_vma() Liam Howlett
                   ` (7 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Use the new locking functions for vma_expand().  This reduces code
duplication.

At the same time, change VM_BUG_ON() to VM_WARN_ON().

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c | 189 +++++++++++++++++++++---------------------------------
 1 file changed, 73 insertions(+), 116 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 3cf08aaee17d..9546d5811ca9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -519,122 +519,6 @@ static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
 	return 0;
 }
 
-/*
- * vma_expand - Expand an existing VMA
- *
- * @mas: The maple state
- * @vma: The vma to expand
- * @start: The start of the vma
- * @end: The exclusive end of the vma
- * @pgoff: The page offset of vma
- * @next: The current of next vma.
- *
- * Expand @vma to @start and @end.  Can expand off the start and end.  Will
- * expand over @next if it's different from @vma and @end == @next->vm_end.
- * Checking if the @vma can expand and merge with @next needs to be handled by
- * the caller.
- *
- * Returns: 0 on success
- */
-inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
-		      unsigned long start, unsigned long end, pgoff_t pgoff,
-		      struct vm_area_struct *next)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	struct address_space *mapping = NULL;
-	struct rb_root_cached *root = NULL;
-	struct anon_vma *anon_vma = vma->anon_vma;
-	struct file *file = vma->vm_file;
-	bool remove_next = false;
-
-	if (next && (vma != next) && (end == next->vm_end)) {
-		remove_next = true;
-		if (next->anon_vma && !vma->anon_vma) {
-			int error;
-
-			anon_vma = next->anon_vma;
-			vma->anon_vma = anon_vma;
-			error = anon_vma_clone(vma, next);
-			if (error)
-				return error;
-		}
-	}
-
-	/* Not merging but overwriting any part of next is not handled. */
-	VM_BUG_ON(next && !remove_next && next != vma && end > next->vm_start);
-	/* Only handles expanding */
-	VM_BUG_ON(vma->vm_start < start || vma->vm_end > end);
-
-	if (vma_iter_prealloc(vmi, vma))
-		goto nomem;
-
-	vma_adjust_trans_huge(vma, start, end, 0);
-
-	if (file) {
-		mapping = file->f_mapping;
-		root = &mapping->i_mmap;
-		uprobe_munmap(vma, vma->vm_start, vma->vm_end);
-		i_mmap_lock_write(mapping);
-	}
-
-	if (anon_vma) {
-		anon_vma_lock_write(anon_vma);
-		anon_vma_interval_tree_pre_update_vma(vma);
-	}
-
-	if (file) {
-		flush_dcache_mmap_lock(mapping);
-		vma_interval_tree_remove(vma, root);
-	}
-
-	/* VMA iterator points to previous, so set to start if necessary */
-	if (vma_iter_addr(vmi) != start)
-		vma_iter_set(vmi, start);
-
-	vma->vm_start = start;
-	vma->vm_end = end;
-	vma->vm_pgoff = pgoff;
-	vma_iter_store(vmi, vma);
-
-	if (file) {
-		vma_interval_tree_insert(vma, root);
-		flush_dcache_mmap_unlock(mapping);
-	}
-
-	/* Expanding over the next vma */
-	if (remove_next && file) {
-		__remove_shared_vm_struct(next, file, mapping);
-	}
-
-	if (anon_vma) {
-		anon_vma_interval_tree_post_update_vma(vma);
-		anon_vma_unlock_write(anon_vma);
-	}
-
-	if (file) {
-		i_mmap_unlock_write(mapping);
-		uprobe_mmap(vma);
-	}
-
-	if (remove_next) {
-		if (file) {
-			uprobe_munmap(next, next->vm_start, next->vm_end);
-			fput(file);
-		}
-		if (next->anon_vma)
-			anon_vma_merge(vma, next);
-		mm->map_count--;
-		mpol_put(vma_policy(next));
-		vm_area_free(next);
-	}
-
-	validate_mm(mm);
-	return 0;
-
-nomem:
-	return -ENOMEM;
-}
-
 /*
  * vma_prepare() - Helper function for handling locking VMAs prior to altering
  * @vp: The initialized vma_prepare struct
@@ -756,6 +640,79 @@ static inline void vma_complete(struct vma_prepare *vp,
 		uprobe_mmap(vp->insert);
 }
 
+/*
+ * vma_expand - Expand an existing VMA
+ *
+ * @vmi: The vma iterator
+ * @vma: The vma to expand
+ * @start: The start of the vma
+ * @end: The exclusive end of the vma
+ * @pgoff: The page offset of vma
+ * @next: The current of next vma.
+ *
+ * Expand @vma to @start and @end.  Can expand off the start and end.  Will
+ * expand over @next if it's different from @vma and @end == @next->vm_end.
+ * Checking if the @vma can expand and merge with @next needs to be handled by
+ * the caller.
+ *
+ * Returns: 0 on success
+ */
+inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
+		      unsigned long start, unsigned long end, pgoff_t pgoff,
+		      struct vm_area_struct *next)
+
+{
+	struct vma_prepare vp;
+
+	memset(&vp, 0, sizeof(vp));
+	vp.vma = vma;
+	vp.anon_vma = vma->anon_vma;
+	if (next && (vma != next) && (end == next->vm_end)) {
+		vp.remove = next;
+		if (next->anon_vma && !vma->anon_vma) {
+			int error;
+
+			vp.anon_vma = next->anon_vma;
+			vma->anon_vma = next->anon_vma;
+			error = anon_vma_clone(vma, next);
+			if (error)
+				return error;
+		}
+	}
+
+	/* Not merging but overwriting any part of next is not handled. */
+	VM_WARN_ON(next && !vp.remove &&
+		  next != vma && end > next->vm_start);
+	/* Only handles expanding */
+	VM_WARN_ON(vma->vm_start < start || vma->vm_end > end);
+
+	if (vma_iter_prealloc(vmi, vma))
+		goto nomem;
+
+	vma_adjust_trans_huge(vma, start, end, 0);
+
+	vp.file = vma->vm_file;
+	if (vp.file)
+		vp.mapping = vp.file->f_mapping;
+
+	/* VMA iterator points to previous, so set to start if necessary */
+	if (vma_iter_addr(vmi) != start)
+		vma_iter_set(vmi, start);
+
+	vma_prepare(&vp);
+	vma->vm_start = start;
+	vma->vm_end = end;
+	vma->vm_pgoff = pgoff;
+	/* Note: mas must be pointing to the expanding VMA */
+	vma_iter_store(vmi, vma);
+
+	vma_complete(&vp, vmi, vma->vm_mm);
+	validate_mm(vma->vm_mm);
+	return 0;
+
+nomem:
+	return -ENOMEM;
+}
 /*
  * We cannot adjust vm_start, vm_end, vm_pgoff fields of a vma that
  * is already present in an i_mmap tree without adjusting the tree.
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 36/44] mm/mmap: Refactor locking out of __vma_adjust()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (34 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 35/44] mm/mmap: move anon_vma setting in __vma_adjust() Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 37/44] mm/mmap: Use vma_prepare() and vma_complete() in vma_expand() Liam Howlett
                   ` (8 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Move the locking into vma_prepare() and vma_complete() for use
elsewhere.
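
(The intended calling pattern, as a rough sketch; field names follow
the struct vma_prepare added below.)

	struct vma_prepare vp;

	memset(&vp, 0, sizeof(vp));
	vp.vma = vma;
	vp.anon_vma = vma->anon_vma;
	vp.file = vma->vm_file;
	if (vp.file)
		vp.mapping = vp.file->f_mapping;

	vma_prepare(&vp);		/* take i_mmap/anon_vma locks */
	/* ... adjust vm_start/vm_end/vm_pgoff and write the maple tree ... */
	vma_complete(&vp, vmi, mm);	/* reinsert, unlock, free removed VMAs */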

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/internal.h |  13 +++
 mm/mmap.c     | 231 +++++++++++++++++++++++++++++---------------------
 2 files changed, 149 insertions(+), 95 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index bcf75a8b032d..0951e6181284 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -848,4 +848,17 @@ static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma)
 	return !(vma->vm_flags & VM_SOFTDIRTY);
 }
 
+/*
+ * VMA lock generalization
+ */
+struct vma_prepare {
+	struct vm_area_struct *vma;
+	struct vm_area_struct *adj_next;
+	struct file *file;
+	struct address_space *mapping;
+	struct anon_vma *anon_vma;
+	struct vm_area_struct *insert;
+	struct vm_area_struct *remove;
+	struct vm_area_struct *remove2;
+};
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/mmap.c b/mm/mmap.c
index c15a04bf3518..3cf08aaee17d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -635,6 +635,127 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	return -ENOMEM;
 }
 
+/*
+ * vma_prepare() - Helper function for handling locking VMAs prior to altering
+ * @vp: The initialized vma_prepare struct
+ */
+static inline void vma_prepare(struct vma_prepare *vp)
+{
+	if (vp->file) {
+		uprobe_munmap(vp->vma, vp->vma->vm_start, vp->vma->vm_end);
+
+		if (vp->adj_next)
+			uprobe_munmap(vp->adj_next, vp->adj_next->vm_start,
+				      vp->adj_next->vm_end);
+
+		i_mmap_lock_write(vp->mapping);
+		if (vp->insert && vp->insert->vm_file) {
+			/*
+			 * Put into interval tree now, so instantiated pages
+			 * are visible to arm/parisc __flush_dcache_page
+			 * throughout; but we cannot insert into address
+			 * space until vma start or end is updated.
+			 */
+			__vma_link_file(vp->insert,
+					vp->insert->vm_file->f_mapping);
+		}
+	}
+
+	if (vp->anon_vma) {
+		anon_vma_lock_write(vp->anon_vma);
+		anon_vma_interval_tree_pre_update_vma(vp->vma);
+		if (vp->adj_next)
+			anon_vma_interval_tree_pre_update_vma(vp->adj_next);
+	}
+
+	if (vp->file) {
+		flush_dcache_mmap_lock(vp->mapping);
+		vma_interval_tree_remove(vp->vma, &vp->mapping->i_mmap);
+		if (vp->adj_next)
+			vma_interval_tree_remove(vp->adj_next,
+						 &vp->mapping->i_mmap);
+	}
+
+}
+
+/*
+ * vma_complete- Helper function for handling the unlocking after altering VMAs,
+ * or for inserting a VMA.
+ *
+ * @vp: The vma_prepare struct
+ * @vmi: The vma iterator
+ * @mm: The mm_struct
+ */
+static inline void vma_complete(struct vma_prepare *vp,
+				struct vma_iterator *vmi, struct mm_struct *mm)
+{
+	if (vp->file) {
+		if (vp->adj_next)
+			vma_interval_tree_insert(vp->adj_next,
+						 &vp->mapping->i_mmap);
+		vma_interval_tree_insert(vp->vma, &vp->mapping->i_mmap);
+		flush_dcache_mmap_unlock(vp->mapping);
+	}
+
+	if (vp->remove && vp->file) {
+		__remove_shared_vm_struct(vp->remove, vp->file, vp->mapping);
+		if (vp->remove2)
+			__remove_shared_vm_struct(vp->remove2, vp->file,
+						  vp->mapping);
+	} else if (vp->insert) {
+		/*
+		 * split_vma has split insert from vma, and needs
+		 * us to insert it before dropping the locks
+		 * (it may either follow vma or precede it).
+		 */
+		vma_iter_store(vmi, vp->insert);
+		mm->map_count++;
+	}
+
+	if (vp->anon_vma) {
+		anon_vma_interval_tree_post_update_vma(vp->vma);
+		if (vp->adj_next)
+			anon_vma_interval_tree_post_update_vma(vp->adj_next);
+		anon_vma_unlock_write(vp->anon_vma);
+	}
+
+	if (vp->file) {
+		i_mmap_unlock_write(vp->mapping);
+		uprobe_mmap(vp->vma);
+
+		if (vp->adj_next)
+			uprobe_mmap(vp->adj_next);
+	}
+
+	if (vp->remove) {
+again:
+		if (vp->file) {
+			uprobe_munmap(vp->remove, vp->remove->vm_start,
+				      vp->remove->vm_end);
+			fput(vp->file);
+		}
+		if (vp->remove->anon_vma)
+			anon_vma_merge(vp->vma, vp->remove);
+		mm->map_count--;
+		mpol_put(vma_policy(vp->remove));
+		if (!vp->remove2)
+			WARN_ON_ONCE(vp->vma->vm_end < vp->remove->vm_end);
+		vm_area_free(vp->remove);
+
+		/*
+		 * In mprotect's case 6 (see comments on vma_merge),
+		 * we must remove next_next too.
+		 */
+		if (vp->remove2) {
+			vp->remove = vp->remove2;
+			vp->remove2 = NULL;
+			goto again;
+		}
+	}
+	if (vp->insert && vp->file)
+		uprobe_mmap(vp->insert);
+}
+
 /*
  * We cannot adjust vm_start, vm_end, vm_pgoff fields of a vma that
  * is already present in an i_mmap tree without adjusting the tree.
@@ -650,14 +771,13 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	struct vm_area_struct *next_next = NULL;	/* uninit var warning */
 	struct vm_area_struct *next = find_vma(mm, vma->vm_end);
 	struct vm_area_struct *orig_vma = vma;
-	struct address_space *mapping = NULL;
-	struct rb_root_cached *root = NULL;
 	struct anon_vma *anon_vma = NULL;
 	struct file *file = vma->vm_file;
 	bool vma_changed = false;
 	long adjust_next = 0;
 	int remove_next = 0;
 	struct vm_area_struct *exporter = NULL, *importer = NULL;
+	struct vma_prepare vma_prep;
 
 	if (next && !insert) {
 		if (end >= next->vm_end) {
@@ -753,39 +873,22 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 			   anon_vma != next->anon_vma);
 
 	vma_adjust_trans_huge(orig_vma, start, end, adjust_next);
-	if (file) {
-		mapping = file->f_mapping;
-		root = &mapping->i_mmap;
-		uprobe_munmap(vma, vma->vm_start, vma->vm_end);
-
-		if (adjust_next)
-			uprobe_munmap(next, next->vm_start, next->vm_end);
-
-		i_mmap_lock_write(mapping);
-		if (insert && insert->vm_file) {
-			/*
-			 * Put into interval tree now, so instantiated pages
-			 * are visible to arm/parisc __flush_dcache_page
-			 * throughout; but we cannot insert into address
-			 * space until vma start or end is updated.
-			 */
-			__vma_link_file(insert, insert->vm_file->f_mapping);
-		}
-	}
 
-	if (anon_vma) {
-		anon_vma_lock_write(anon_vma);
-		anon_vma_interval_tree_pre_update_vma(vma);
-		if (adjust_next)
-			anon_vma_interval_tree_pre_update_vma(next);
+	memset(&vma_prep, 0, sizeof(vma_prep));
+	vma_prep.vma = vma;
+	vma_prep.anon_vma = anon_vma;
+	vma_prep.file = file;
+	if (adjust_next)
+		vma_prep.adj_next = next;
+	if (file)
+		vma_prep.mapping = file->f_mapping;
+	vma_prep.insert = insert;
+	if (remove_next) {
+		vma_prep.remove = next;
+		vma_prep.remove2 = next_next;
 	}
 
-	if (file) {
-		flush_dcache_mmap_lock(mapping);
-		vma_interval_tree_remove(vma, root);
-		if (adjust_next)
-			vma_interval_tree_remove(next, root);
-	}
+	vma_prepare(&vma_prep);
 
 	if (start != vma->vm_start) {
 		if (vma->vm_start < start) {
@@ -823,69 +926,7 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		vma_iter_store(vmi, next);
 	}
 
-	if (file) {
-		if (adjust_next)
-			vma_interval_tree_insert(next, root);
-		vma_interval_tree_insert(vma, root);
-		flush_dcache_mmap_unlock(mapping);
-	}
-
-	if (remove_next && file) {
-		__remove_shared_vm_struct(next, file, mapping);
-		if (remove_next == 2)
-			__remove_shared_vm_struct(next_next, file, mapping);
-	} else if (insert) {
-		/*
-		 * split_vma has split insert from vma, and needs
-		 * us to insert it before dropping the locks
-		 * (it may either follow vma or precede it).
-		 */
-		vma_iter_store(vmi, insert);
-		mm->map_count++;
-	}
-
-	if (anon_vma) {
-		anon_vma_interval_tree_post_update_vma(vma);
-		if (adjust_next)
-			anon_vma_interval_tree_post_update_vma(next);
-		anon_vma_unlock_write(anon_vma);
-	}
-
-	if (file) {
-		i_mmap_unlock_write(mapping);
-		uprobe_mmap(vma);
-
-		if (adjust_next)
-			uprobe_mmap(next);
-	}
-
-	if (remove_next) {
-again:
-		if (file) {
-			uprobe_munmap(next, next->vm_start, next->vm_end);
-			fput(file);
-		}
-		if (next->anon_vma)
-			anon_vma_merge(vma, next);
-		mm->map_count--;
-		mpol_put(vma_policy(next));
-		if (remove_next != 2)
-			BUG_ON(vma->vm_end < next->vm_end);
-		vm_area_free(next);
-
-		/*
-		 * In mprotect's case 6 (see comments on vma_merge),
-		 * we must remove next_next too.
-		 */
-		if (remove_next == 2) {
-			remove_next = 1;
-			next = next_next;
-			goto again;
-		}
-	}
-	if (insert && file)
-		uprobe_mmap(insert);
-
+	vma_complete(&vma_prep, vmi, mm);
 	vma_iter_free(vmi);
 	validate_mm(mm);
 
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 38/44] mm/mmap: Introduce init_vma_prep() and init_multi_vma_prep()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (37 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 39/44] mm: Don't use __vma_adjust() in __split_vma() Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 40/44] mm/mmap: Don't use __vma_adjust() in shift_arg_pages() Liam Howlett
                   ` (5 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Add init_vma_prep() and init_multi_vma_prep() to set up the struct
vma_prepare.  This is to abstract the locking when adjusting the VMAs.

Also change the __vma_adjust() int variable remove_next in favour of a
pointer to the VMA to remove.  Rename next_next to remove2 since this
better reflects its use.
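
(Sketch of the resulting setup, per the helpers below: the wrapper
covers the common single-VMA case, the full initializer covers the
__vma_adjust()-style cases.)

	struct vma_prepare vp;

	init_vma_prep(&vp, vma);	/* single VMA, nothing removed */
	/* or: init_multi_vma_prep(&vp, vma, adjust_next ? next : NULL,
	 *			   remove, remove2); */
	vma_prepare(&vp);
	/* ... modify the VMA(s) and the tree ... */
	vma_complete(&vp, vmi, mm);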

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c | 108 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 61 insertions(+), 47 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 9546d5811ca9..431c5ee9ce00 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -519,6 +519,45 @@ static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
 	return 0;
 }
 
+/*
+ * init_multi_vma_prep() - Initializer for struct vma_prepare
+ * @vp: The vma_prepare struct
+ * @vma: The vma that will be altered once locked
+ * @next: The next vma if it is to be adjusted
+ * @remove: The first vma to be removed
+ * @remove2: The second vma to be removed
+ */
+static inline void init_multi_vma_prep(struct vma_prepare *vp,
+		struct vm_area_struct *vma, struct vm_area_struct *next,
+		struct vm_area_struct *remove, struct vm_area_struct *remove2)
+{
+	memset(vp, 0, sizeof(struct vma_prepare));
+	vp->vma = vma;
+	vp->anon_vma = vma->anon_vma;
+	vp->remove = remove;
+	vp->remove2 = remove2;
+	vp->adj_next = next;
+	if (!vp->anon_vma && next)
+		vp->anon_vma = next->anon_vma;
+
+	vp->file = vma->vm_file;
+	if (vp->file)
+		vp->mapping = vma->vm_file->f_mapping;
+
+}
+
+/*
+ * init_vma_prep() - Initializer wrapper for vma_prepare struct
+ * @vp: The vma_prepare struct
+ * @vma: The vma that will be altered once locked
+ */
+static inline void init_vma_prep(struct vma_prepare *vp,
+				 struct vm_area_struct *vma)
+{
+	init_multi_vma_prep(vp, vma, NULL, NULL, NULL);
+}
+
+
 /*
  * vma_prepare() - Helper function for handling locking VMAs prior to altering
  * @vp: The initialized vma_prepare struct
@@ -628,7 +667,7 @@ static inline void vma_complete(struct vma_prepare *vp,
 
 		/*
 		 * In mprotect's case 6 (see comments on vma_merge),
-		 * we must remove next_next too.
+		 * we must remove the one after next as well.
 		 */
 		if (vp->remove2) {
 			vp->remove = vp->remove2;
@@ -662,17 +701,14 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		      struct vm_area_struct *next)
 
 {
+	bool remove_next = false;
 	struct vma_prepare vp;
 
-	memset(&vp, 0, sizeof(vp));
-	vp.vma = vma;
-	vp.anon_vma = vma->anon_vma;
 	if (next && (vma != next) && (end == next->vm_end)) {
-		vp.remove = next;
+		remove_next = true;
 		if (next->anon_vma && !vma->anon_vma) {
 			int error;
 
-			vp.anon_vma = next->anon_vma;
 			vma->anon_vma = next->anon_vma;
 			error = anon_vma_clone(vma, next);
 			if (error)
@@ -680,6 +716,7 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		}
 	}
 
+	init_multi_vma_prep(&vp, vma, NULL, remove_next ? next : NULL, NULL);
 	/* Not merging but overwriting any part of next is not handled. */
 	VM_WARN_ON(next && !vp.remove &&
 		  next != vma && end > next->vm_start);
@@ -690,11 +727,6 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		goto nomem;
 
 	vma_adjust_trans_huge(vma, start, end, 0);
-
-	vp.file = vma->vm_file;
-	if (vp.file)
-		vp.mapping = vp.file->f_mapping;
-
 	/* VMA iterator points to previous, so set to start if necessary */
 	if (vma_iter_addr(vmi) != start)
 		vma_iter_set(vmi, start);
@@ -725,14 +757,13 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	struct vm_area_struct *insert, struct vm_area_struct *expand)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	struct vm_area_struct *next_next = NULL;	/* uninit var warning */
+	struct vm_area_struct *remove2 = NULL;
+	struct vm_area_struct *remove = NULL;
 	struct vm_area_struct *next = find_vma(mm, vma->vm_end);
 	struct vm_area_struct *orig_vma = vma;
-	struct anon_vma *anon_vma = NULL;
 	struct file *file = vma->vm_file;
 	bool vma_changed = false;
 	long adjust_next = 0;
-	int remove_next = 0;
 	struct vm_area_struct *exporter = NULL, *importer = NULL;
 	struct vma_prepare vma_prep;
 
@@ -751,25 +782,24 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 				 */
 				VM_WARN_ON(end != next->vm_end);
 				/*
-				 * remove_next == 3 means we're
-				 * removing "vma" and that to do so we
+				 * we're removing "vma" and that to do so we
 				 * swapped "vma" and "next".
 				 */
-				remove_next = 3;
 				VM_WARN_ON(file != next->vm_file);
 				swap(vma, next);
+				remove = next;
 			} else {
 				VM_WARN_ON(expand != vma);
 				/*
-				 * case 1, 6, 7, remove_next == 2 is case 6,
-				 * remove_next == 1 is case 1 or 7.
+				 * case 1, 6, 7, remove next.
+				 * case 6 also removes the one beyond next
 				 */
-				remove_next = 1 + (end > next->vm_end);
-				if (remove_next == 2)
-					next_next = find_vma(mm, next->vm_end);
+				remove = next;
+				if (end > next->vm_end)
+					remove2 = find_vma(mm, next->vm_end);
 
-				VM_WARN_ON(remove_next == 2 &&
-					   end != next_next->vm_end);
+				VM_WARN_ON(remove2 != NULL &&
+					   end != remove2->vm_end);
 			}
 
 			exporter = next;
@@ -779,8 +809,8 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 			 * If next doesn't have anon_vma, import from vma after
 			 * next, if the vma overlaps with it.
 			 */
-			if (remove_next == 2 && !next->anon_vma)
-				exporter = next_next;
+			if (remove2 != NULL && !next->anon_vma)
+				exporter = remove2;
 
 		} else if (end > next->vm_start) {
 			/*
@@ -821,30 +851,14 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	if (vma_iter_prealloc(vmi, vma))
 		return -ENOMEM;
 
-	anon_vma = vma->anon_vma;
-	if (!anon_vma && adjust_next)
-		anon_vma = next->anon_vma;
-
-	if (anon_vma)
-		VM_WARN_ON(adjust_next && next->anon_vma &&
-			   anon_vma != next->anon_vma);
-
 	vma_adjust_trans_huge(orig_vma, start, end, adjust_next);
 
-	memset(&vma_prep, 0, sizeof(vma_prep));
-	vma_prep.vma = vma;
-	vma_prep.anon_vma = anon_vma;
-	vma_prep.file = file;
-	if (adjust_next)
-		vma_prep.adj_next = next;
-	if (file)
-		vma_prep.mapping = file->f_mapping;
-	vma_prep.insert = insert;
-	if (remove_next) {
-		vma_prep.remove = next;
-		vma_prep.remove2 = next_next;
-	}
+	init_multi_vma_prep(&vma_prep, vma, adjust_next ? next : NULL, remove,
+			    remove2);
+	VM_WARN_ON(vma_prep.anon_vma && adjust_next && next->anon_vma &&
+		   vma_prep.anon_vma != next->anon_vma);
 
+	vma_prep.insert = insert;
 	vma_prepare(&vma_prep);
 
 	if (start != vma->vm_start) {
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 40/44] mm/mmap: Don't use __vma_adjust() in shift_arg_pages()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (38 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 38/44] mm/mmap: Introduce init_vma_prep() and init_multi_vma_prep() Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 44/44] vma_merge: Set vma iterator to correct position Liam Howlett
                   ` (4 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Introduce vma_shrink(), which uses the vma_prepare() and vma_complete()
functions to reduce the vma coverage.

Convert shift_arg_pages() to use vma_expand() and the new vma_shrink()
function.  Remove support from __vma_adjust() for reducing a vma's size,
since shift_arg_pages() is the only user that shrinks a VMA in this way.
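
The resulting call pattern in shift_arg_pages() is a grow-then-trim
sequence; roughly (a condensed sketch of the diff below, with the
page-table move and error handling elided):

	/* Grow the VMA to cover the whole range [new_start, old_end). */
	if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
		return -ENOMEM;

	/* ... move the page tables and free the now-unused tail ... */

	vma_prev(&vmi);
	/* Trim the VMA down to just the new range [new_start, new_end). */
	return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);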

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 fs/exec.c          |  4 ++--
 include/linux/mm.h | 13 ++++------
 mm/mmap.c          | 59 ++++++++++++++++++++++++++++++++++++----------
 3 files changed, 53 insertions(+), 23 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index d52fca2dd30b..c0df813d2b45 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -699,7 +699,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 	/*
 	 * cover the whole range: [new_start, old_end)
 	 */
-	if (vma_adjust(&vmi, vma, new_start, old_end, vma->vm_pgoff))
+	if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
 		return -ENOMEM;
 
 	/*
@@ -733,7 +733,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 
 	vma_prev(&vmi);
 	/* Shrink the vma to just the new range */
-	return vma_adjust(&vmi, vma, new_start, new_end, vma->vm_pgoff);
+	return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
 }
 
 /*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a00871cc63cc..0b229ddf43a4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2822,14 +2822,11 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
 
 /* mmap.c */
 extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
-extern int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long start,
-	unsigned long end, pgoff_t pgoff, struct vm_area_struct *expand);
-static inline int vma_adjust(struct vma_iterator *vmi,
-	struct vm_area_struct *vma, unsigned long start, unsigned long end,
-	pgoff_t pgoff)
-{
-	return __vma_adjust(vmi, vma, start, end, pgoff, NULL);
-}
+extern int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
+		      unsigned long start, unsigned long end, pgoff_t pgoff,
+		      struct vm_area_struct *next);
+extern int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
+		       unsigned long start, unsigned long end, pgoff_t pgoff);
 extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
 	struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
 	unsigned long end, unsigned long vm_flags, struct anon_vma *,
diff --git a/mm/mmap.c b/mm/mmap.c
index 3bca62c11686..dad5c0113380 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -696,10 +696,9 @@ static inline void vma_complete(struct vma_prepare *vp,
  *
  * Returns: 0 on success
  */
-inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
-		      unsigned long start, unsigned long end, pgoff_t pgoff,
-		      struct vm_area_struct *next)
-
+int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
+	       unsigned long start, unsigned long end, pgoff_t pgoff,
+	       struct vm_area_struct *next)
 {
 	bool remove_next = false;
 	struct vma_prepare vp;
@@ -745,6 +744,44 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
 nomem:
 	return -ENOMEM;
 }
+
+/*
+ * vma_shrink() - Reduce an existing VMA's memory area
+ * @vmi: The vma iterator
+ * @vma: The VMA to modify
+ * @start: The new start
+ * @end: The new end
+ * @pgoff: The new page offset
+ *
+ * Returns: 0 on success, -ENOMEM otherwise
+ */
+int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
+	       unsigned long start, unsigned long end, pgoff_t pgoff)
+{
+	struct vma_prepare vp;
+
+	WARN_ON((vma->vm_start != start) && (vma->vm_end != end));
+
+	if (vma_iter_prealloc(vmi, vma))
+		return -ENOMEM;
+
+	init_vma_prep(&vp, vma);
+	vma_adjust_trans_huge(vma, start, end, 0);
+	vma_prepare(&vp);
+
+	if (vma->vm_start < start)
+		vma_iter_clear(vmi, vma->vm_start, start);
+
+	if (vma->vm_end > end)
+		vma_iter_clear(vmi, end, vma->vm_end);
+
+	vma->vm_start = start;
+	vma->vm_end = end;
+	vma->vm_pgoff = pgoff;
+	vma_complete(&vp, vmi, vma->vm_mm);
+	validate_mm(vma->vm_mm);
+	return 0;
+}
+
 /*
  * We cannot adjust vm_start, vm_end, vm_pgoff fields of a vma that
  * is already present in an i_mmap tree without adjusting the tree.
@@ -860,14 +897,7 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 
 	vma_prepare(&vma_prep);
 
-	if (vma->vm_start < start)
-		vma_iter_clear(vmi, vma->vm_start, start);
-	else if (start != vma->vm_start)
-		vma_changed = true;
-
-	if (vma->vm_end > end)
-		vma_iter_clear(vmi, end, vma->vm_end);
-	else if (end != vma->vm_end)
+	if (start < vma->vm_start || end > vma->vm_end)
 		vma_changed = true;
 
 	vma->vm_start = start;
@@ -880,7 +910,10 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	if (adjust_next) {
 		next->vm_start += adjust_next;
 		next->vm_pgoff += adjust_next >> PAGE_SHIFT;
-		vma_iter_store(vmi, next);
+		if (adjust_next < 0) {
+			WARN_ON_ONCE(vma_changed);
+			vma_iter_store(vmi, next);
+		}
 	}
 
 	vma_complete(&vma_prep, vmi, mm);
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 39/44] mm: Don't use __vma_adjust() in __split_vma()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (36 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 37/44] mm/mmap: Use vma_prepare() and vma_complete() in vma_expand() Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 38/44] mm/mmap: Introduce init_vma_prep() and init_multi_vma_prep() Liam Howlett
                   ` (6 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Use the abstracted locking and maple tree operations.  Since
__split_vma() is the only caller of __vma_adjust() that uses the insert
argument, drop that argument.  Remove the NULL passed through from
fs/exec's shift_arg_pages() at the same time.
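
The split now follows the usual prepare/complete pattern directly, with
the new VMA handed over via vp.insert; roughly (a condensed sketch of
the diff below):

	struct vma_prepare vp;

	vma_adjust_trans_huge(vma, vma->vm_start, addr, 0);
	init_vma_prep(&vp, vma);
	vp.insert = new;	/* insert "new" before dropping the locks */
	vma_prepare(&vp);

	if (new_below) {
		vma->vm_start = addr;
		vma->vm_pgoff += (addr - new->vm_start) >> PAGE_SHIFT;
	} else {
		vma->vm_end = addr;
	}

	/* vma_complete() stores the new vma and drops the locks */
	vma_complete(&vp, vmi, vma->vm_mm);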

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 fs/exec.c          |   4 +-
 include/linux/mm.h |   7 ++-
 mm/mmap.c          | 114 ++++++++++++++++++++-------------------------
 3 files changed, 56 insertions(+), 69 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 76ee62e1d3f1..d52fca2dd30b 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -699,7 +699,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 	/*
 	 * cover the whole range: [new_start, old_end)
 	 */
-	if (vma_adjust(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
+	if (vma_adjust(&vmi, vma, new_start, old_end, vma->vm_pgoff))
 		return -ENOMEM;
 
 	/*
@@ -733,7 +733,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 
 	vma_prev(&vmi);
 	/* Shrink the vma to just the new range */
-	return vma_adjust(&vmi, vma, new_start, new_end, vma->vm_pgoff, NULL);
+	return vma_adjust(&vmi, vma, new_start, new_end, vma->vm_pgoff);
 }
 
 /*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index aabfd4183091..a00871cc63cc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2823,13 +2823,12 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
 /* mmap.c */
 extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
 extern int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long start,
-	unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
-	struct vm_area_struct *expand);
+	unsigned long end, pgoff_t pgoff, struct vm_area_struct *expand);
 static inline int vma_adjust(struct vma_iterator *vmi,
 	struct vm_area_struct *vma, unsigned long start, unsigned long end,
-	pgoff_t pgoff, struct vm_area_struct *insert)
+	pgoff_t pgoff)
 {
-	return __vma_adjust(vmi, vma, start, end, pgoff, insert, NULL);
+	return __vma_adjust(vmi, vma, start, end, pgoff, NULL);
 }
 extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
 	struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
diff --git a/mm/mmap.c b/mm/mmap.c
index 431c5ee9ce00..3bca62c11686 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -754,7 +754,7 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
  */
 int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	unsigned long start, unsigned long end, pgoff_t pgoff,
-	struct vm_area_struct *insert, struct vm_area_struct *expand)
+	struct vm_area_struct *expand)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_area_struct *remove2 = NULL;
@@ -767,7 +767,7 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	struct vm_area_struct *exporter = NULL, *importer = NULL;
 	struct vma_prepare vma_prep;
 
-	if (next && !insert) {
+	if (next) {
 		if (end >= next->vm_end) {
 			/*
 			 * vma expands, overlapping all the next, and
@@ -858,39 +858,25 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	VM_WARN_ON(vma_prep.anon_vma && adjust_next && next->anon_vma &&
 		   vma_prep.anon_vma != next->anon_vma);
 
-	vma_prep.insert = insert;
 	vma_prepare(&vma_prep);
 
-	if (start != vma->vm_start) {
-		if (vma->vm_start < start) {
-			if (!insert || (insert->vm_end != start)) {
-				vma_iter_clear(vmi, vma->vm_start, start);
-				vma_iter_set(vmi, start);
-				VM_WARN_ON(insert && insert->vm_start > vma->vm_start);
-			}
-		} else {
-			vma_changed = true;
-		}
-		vma->vm_start = start;
-	}
-	if (end != vma->vm_end) {
-		if (vma->vm_end > end) {
-			if (!insert || (insert->vm_start != end)) {
-				vma_iter_clear(vmi, end, vma->vm_end);
-				vma_iter_set(vmi, vma->vm_end);
-				VM_WARN_ON(insert &&
-					   insert->vm_end < vma->vm_end);
-			}
-		} else {
-			vma_changed = true;
-		}
-		vma->vm_end = end;
-	}
+	if (vma->vm_start < start)
+		vma_iter_clear(vmi, vma->vm_start, start);
+	else if (start != vma->vm_start)
+		vma_changed = true;
+
+	if (vma->vm_end > end)
+		vma_iter_clear(vmi, end, vma->vm_end);
+	else if (end != vma->vm_end)
+		vma_changed = true;
+
+	vma->vm_start = start;
+	vma->vm_end = end;
+	vma->vm_pgoff = pgoff;
 
 	if (vma_changed)
 		vma_iter_store(vmi, vma);
 
-	vma->vm_pgoff = pgoff;
 	if (adjust_next) {
 		next->vm_start += adjust_next;
 		next->vm_pgoff += adjust_next >> PAGE_SHIFT;
@@ -909,9 +895,9 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
  * per-vma resources, so we don't attempt to merge those.
  */
 static inline int is_mergeable_vma(struct vm_area_struct *vma,
-				struct file *file, unsigned long vm_flags,
-				struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
-				struct anon_vma_name *anon_name)
+				   struct file *file, unsigned long vm_flags,
+				   struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+				   struct anon_vma_name *anon_name)
 {
 	/*
 	 * VM_SOFTDIRTY should not prevent from VMA merging, if we
@@ -1093,20 +1079,19 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
 			is_mergeable_anon_vma(prev->anon_vma,
 				next->anon_vma, NULL)) {	 /* cases 1, 6 */
 		err = __vma_adjust(vmi, prev, prev->vm_start,
-					next->vm_end, prev->vm_pgoff, NULL,
-					prev);
+					next->vm_end, prev->vm_pgoff, prev);
 		res = prev;
 	} else if (merge_prev) {			/* cases 2, 5, 7 */
 		err = __vma_adjust(vmi, prev, prev->vm_start,
-					end, prev->vm_pgoff, NULL, prev);
+					end, prev->vm_pgoff, prev);
 		res = prev;
 	} else if (merge_next) {
 		if (prev && addr < prev->vm_end)	/* case 4 */
 			err = __vma_adjust(vmi, prev, prev->vm_start,
-					addr, prev->vm_pgoff, NULL, next);
+					addr, prev->vm_pgoff, next);
 		else					/* cases 3, 8 */
 			err = __vma_adjust(vmi, mid, addr, next->vm_end,
-					next->vm_pgoff - pglen, NULL, next);
+					next->vm_pgoff - pglen, next);
 		res = next;
 	}
 
@@ -2246,6 +2231,7 @@ static void unmap_region(struct mm_struct *mm, struct maple_tree *mt,
 int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		unsigned long addr, int new_below)
 {
+	struct vma_prepare vp;
 	struct vm_area_struct *new;
 	int err;
 
@@ -2261,16 +2247,20 @@ int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	if (!new)
 		return -ENOMEM;
 
-	if (new_below)
+	err = -ENOMEM;
+	if (vma_iter_prealloc(vmi, vma))
+		goto out_free_vma;
+
+	if (new_below) {
 		new->vm_end = addr;
-	else {
+	} else {
 		new->vm_start = addr;
 		new->vm_pgoff += ((addr - vma->vm_start) >> PAGE_SHIFT);
 	}
 
 	err = vma_dup_policy(vma, new);
 	if (err)
-		goto out_free_vma;
+		goto out_free_vmi;
 
 	err = anon_vma_clone(new, vma);
 	if (err)
@@ -2282,33 +2272,31 @@ int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	if (new->vm_ops && new->vm_ops->open)
 		new->vm_ops->open(new);
 
-	if (new_below)
-		err = vma_adjust(vmi, vma, addr, vma->vm_end,
-			vma->vm_pgoff + ((addr - new->vm_start) >> PAGE_SHIFT),
-			new);
-	else
-		err = vma_adjust(vmi, vma, vma->vm_start, addr, vma->vm_pgoff,
-				 new);
+	vma_adjust_trans_huge(vma, vma->vm_start, addr, 0);
+	init_vma_prep(&vp, vma);
+	vp.insert = new;
+	vma_prepare(&vp);
 
-	/* Success. */
-	if (!err) {
-		if (new_below)
-			vma_next(vmi);
-		return 0;
+	if (new_below) {
+		vma->vm_start = addr;
+		vma->vm_pgoff += (addr - new->vm_start) >> PAGE_SHIFT;
+	} else {
+		vma->vm_end = addr;
 	}
 
-	/* Avoid vm accounting in close() operation */
-	new->vm_start = new->vm_end;
-	new->vm_pgoff = 0;
-	/* Clean everything up if vma_adjust failed. */
-	if (new->vm_ops && new->vm_ops->close)
-		new->vm_ops->close(new);
-	if (new->vm_file)
-		fput(new->vm_file);
-	unlink_anon_vmas(new);
- out_free_mpol:
+	/* vma_complete stores the new vma */
+	vma_complete(&vp, vmi, vma->vm_mm);
+
+	/* Success. */
+	if (new_below)
+		vma_next(vmi);
+	return 0;
+
+out_free_mpol:
 	mpol_put(vma_policy(new));
- out_free_vma:
+out_free_vmi:
+	vma_iter_free(vmi);
+out_free_vma:
 	vm_area_free(new);
 	validate_mm_mt(vma->vm_mm);
 	return err;
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 42/44] mm/mmap: Convert do_brk_flags() to use vma_prepare() and vma_complete()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (40 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 44/44] vma_merge: Set vma iterator to correct position Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 43/44] mm/mmap: Remove __vma_adjust() Liam Howlett
                   ` (2 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Use the abstracted vma locking for do_brk_flags().
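
In essence, the open-coded anon_vma locking around the brk expansion is
replaced with the prepare/complete pair (before/after sketch):

	/* before */
	if (vma->anon_vma) {
		anon_vma_lock_write(vma->anon_vma);
		anon_vma_interval_tree_pre_update_vma(vma);
	}
	/* ... extend the vma and store it ... */
	if (vma->anon_vma) {
		anon_vma_interval_tree_post_update_vma(vma);
		anon_vma_unlock_write(vma->anon_vma);
	}

	/* after */
	init_vma_prep(&vp, vma);
	vma_prepare(&vp);
	/* ... extend the vma and store it ... */
	vma_complete(&vp, vmi, mm);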

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 1e9b8eb00d45..6dd34e5ff1f7 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2979,6 +2979,7 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		unsigned long addr, unsigned long len, unsigned long flags)
 {
 	struct mm_struct *mm = current->mm;
+	struct vma_prepare vp;
 
 	validate_mm_mt(mm);
 	/*
@@ -3006,18 +3007,13 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
 			goto unacct_fail;
 
 		vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
-		if (vma->anon_vma) {
-			anon_vma_lock_write(vma->anon_vma);
-			anon_vma_interval_tree_pre_update_vma(vma);
-		}
+		init_vma_prep(&vp, vma);
+		vma_prepare(&vp);
 		vma->vm_end = addr + len;
 		vma->vm_flags |= VM_SOFTDIRTY;
 		vma_iter_store(vmi, vma);
 
-		if (vma->anon_vma) {
-			anon_vma_interval_tree_post_update_vma(vma);
-			anon_vma_unlock_write(vma->anon_vma);
-		}
+		vma_complete(&vp, vmi, mm);
 		khugepaged_enter_vma(vma, flags);
 		goto out;
 	}
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 43/44] mm/mmap: Remove __vma_adjust()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (41 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 42/44] mm/mmap: Convert do_brk_flags() to use vma_prepare() and vma_complete() Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 41/44] mm/mmap: Introduce dup_anon_vma() helper Liam Howlett
  2023-01-10 22:51 ` [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Mark Brown
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Inline the work of __vma_adjust() into vma_merge().  This reduces code
size and has the added benefit of keeping the comments for the merge
cases located with the code that handles them.

Change the comments referencing vma_adjust() accordingly.
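
With __vma_adjust() gone, every merge case sets up vma, adjust, remove,
and remove2, and then funnels into one common tail; roughly (a condensed
sketch of the code below):

	if (vma_iter_prealloc(vmi, vma))
		return NULL;

	vma_adjust_trans_huge(vma, vma_start, vma_end, adj_next);
	init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
	vma_prepare(&vp);

	vma->vm_start = vma_start;
	vma->vm_end = vma_end;
	vma->vm_pgoff = vma_pgoff;
	if (vma_expanded)
		vma_iter_store(vmi, vma);

	vma_complete(&vp, vmi, mm);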

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 kernel/events/uprobes.c |   2 +-
 mm/filemap.c            |   2 +-
 mm/mmap.c               | 250 ++++++++++++++++------------------------
 mm/rmap.c               |  15 +--
 4 files changed, 107 insertions(+), 162 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index d9e357b7e17c..c5d5848e2c3e 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1352,7 +1352,7 @@ static int delayed_ref_ctr_inc(struct vm_area_struct *vma)
 }
 
 /*
- * Called from mmap_region/vma_adjust with mm->mmap_lock acquired.
+ * Called from mmap_region/vma_merge with mm->mmap_lock acquired.
  *
  * Currently we ignore all errors and always return 0, the callers
  * can't handle the failure anyway.
diff --git a/mm/filemap.c b/mm/filemap.c
index c4d4ace9cc70..fe5a4973718f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -97,7 +97,7 @@
  *    ->i_pages lock		(__sync_single_inode)
  *
  *  ->i_mmap_rwsem
- *    ->anon_vma.lock		(vma_adjust)
+ *    ->anon_vma.lock		(vma_merge)
  *
  *  ->anon_vma.lock
  *    ->page_table_lock or pte_lock	(anon_vma_prepare and various)
diff --git a/mm/mmap.c b/mm/mmap.c
index 6dd34e5ff1f7..a8dba6b6c34d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -802,133 +802,6 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	return 0;
 }
 
-/*
- * We cannot adjust vm_start, vm_end, vm_pgoff fields of a vma that
- * is already present in an i_mmap tree without adjusting the tree.
- * The following helper function should be used when such adjustments
- * are necessary.  The "insert" vma (if any) is to be inserted
- * before we drop the necessary locks.
- */
-int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
-	unsigned long start, unsigned long end, pgoff_t pgoff,
-	struct vm_area_struct *expand)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	struct vm_area_struct *remove2 = NULL;
-	struct vm_area_struct *remove = NULL;
-	struct vm_area_struct *next = find_vma(mm, vma->vm_end);
-	struct vm_area_struct *orig_vma = vma;
-	struct file *file = vma->vm_file;
-	bool vma_changed = false;
-	long adjust_next = 0;
-	struct vma_prepare vma_prep;
-
-	if (next) {
-		int error = 0;
-
-		if (end >= next->vm_end) {
-			/*
-			 * vma expands, overlapping all the next, and
-			 * perhaps the one after too (mprotect case 6).
-			 * The only other cases that gets here are
-			 * case 1, case 7 and case 8.
-			 */
-			if (next == expand) {
-				/*
-				 * The only case where we don't expand "vma"
-				 * and we expand "next" instead is case 8.
-				 */
-				VM_WARN_ON(end != next->vm_end);
-				/*
-			 * we're removing "vma"; to do so, we
-				 * swapped "vma" and "next".
-				 */
-				VM_WARN_ON(file != next->vm_file);
-				swap(vma, next);
-				remove = next;
-			} else {
-				VM_WARN_ON(expand != vma);
-				/*
-				 * case 1, 6, 7, remove next.
-				 * case 6 also removes the one beyond next
-				 */
-				remove = next;
-				if (end > next->vm_end)
-					remove2 = find_vma(mm, next->vm_end);
-
-				VM_WARN_ON(remove2 != NULL &&
-					   end != remove2->vm_end);
-			}
-
-			/*
-			 * If next doesn't have anon_vma, import from vma after
-			 * next, if the vma overlaps with it.
-			 */
-			if (remove2 != NULL && !next->anon_vma)
-				error = dup_anon_vma(vma, remove2);
-			else
-				error = dup_anon_vma(vma, remove);
-
-		} else if (end > next->vm_start) {
-			/*
-			 * vma expands, overlapping part of the next:
-			 * mprotect case 5 shifting the boundary up.
-			 */
-			adjust_next = (end - next->vm_start);
-			VM_WARN_ON(expand != vma);
-			error = dup_anon_vma(vma, next);
-		} else if (end < vma->vm_end) {
-			/*
-			 * vma shrinks, and !insert tells it's not
-			 * split_vma inserting another: so it must be
-			 * mprotect case 4 shifting the boundary down.
-			 */
-			adjust_next = -(vma->vm_end - end);
-			VM_WARN_ON(expand != next);
-			error = dup_anon_vma(next, vma);
-		}
-		if (error)
-			return error;
-	}
-
-	if (vma_iter_prealloc(vmi, vma))
-		return -ENOMEM;
-
-	vma_adjust_trans_huge(orig_vma, start, end, adjust_next);
-
-	init_multi_vma_prep(&vma_prep, vma, adjust_next ? next : NULL, remove,
-			    remove2);
-	VM_WARN_ON(vma_prep.anon_vma && adjust_next && next->anon_vma &&
-		   vma_prep.anon_vma != next->anon_vma);
-
-	vma_prepare(&vma_prep);
-
-	if (start < vma->vm_start || end > vma->vm_end)
-		vma_changed = true;
-
-	vma->vm_start = start;
-	vma->vm_end = end;
-	vma->vm_pgoff = pgoff;
-
-	if (vma_changed)
-		vma_iter_store(vmi, vma);
-
-	if (adjust_next) {
-		next->vm_start += adjust_next;
-		next->vm_pgoff += adjust_next >> PAGE_SHIFT;
-		if (adjust_next < 0) {
-			WARN_ON_ONCE(vma_changed);
-			vma_iter_store(vmi, next);
-		}
-	}
-
-	vma_complete(&vma_prep, vmi, mm);
-	vma_iter_free(vmi);
-	validate_mm(mm);
-
-	return 0;
-}
-
 /*
  * If the vma has a ->close operation then the driver probably needs to release
  * per-vma resources, so we don't attempt to merge those.
@@ -1055,7 +928,7 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
  * It is important for case 8 that the vma NNNN overlapping the
  * region AAAA is never going to extended over XXXX. Instead XXXX must
  * be extended in region AAAA and NNNN must be removed. This way in
- * all cases where vma_merge succeeds, the moment vma_adjust drops the
+ * all cases where vma_merge succeeds, the moment vma_merge drops the
  * rmap_locks, the properties of the merged vma will be already
  * correct for the whole merged range. Some of those properties like
  * vm_page_prot/vm_flags may be accessed by rmap_walks and they must
@@ -1065,6 +938,12 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
  * or other rmap walkers (if working on addresses beyond the "end"
  * parameter) may establish ptes with the wrong permissions of NNNN
  * instead of the right permissions of XXXX.
+ *
+ * In the code below:
+ * PPPP is represented by *prev
+ * NNNN is represented by *mid (and possibly equal to *next)
+ * XXXX is represented by *next or not represented at all.
+ * AAAA is not represented - it will be merged or the function will return NULL
  */
 struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
 			struct vm_area_struct *prev, unsigned long addr,
@@ -1075,11 +954,19 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
 			struct anon_vma_name *anon_name)
 {
 	pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
+	pgoff_t vma_pgoff;
 	struct vm_area_struct *mid, *next, *res = NULL;
+	struct vm_area_struct *vma, *adjust, *remove, *remove2;
 	int err = -1;
 	bool merge_prev = false;
 	bool merge_next = false;
+	bool vma_expanded = false;
+	struct vma_prepare vp;
+	unsigned long vma_end = end;
+	long adj_next = 0;
+	unsigned long vma_start = addr;
 
+	validate_mm(mm);
 	/*
 	 * We later require that vma->vm_flags == vm_flags,
 	 * so this tests vma->vm_flags & VM_SPECIAL, too.
@@ -1097,13 +984,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
 	VM_WARN_ON(mid && end > mid->vm_end);
 	VM_WARN_ON(addr >= end);
 
-	/* Can we merge the predecessor? */
-	if (prev && prev->vm_end == addr &&
-			mpol_equal(vma_policy(prev), policy) &&
-			can_vma_merge_after(prev, vm_flags,
-					    anon_vma, file, pgoff,
-					    vm_userfaultfd_ctx, anon_name)) {
-		merge_prev = true;
+	if (prev) {
+		res = prev;
+		vma = prev;
+		vma_start = prev->vm_start;
+		vma_pgoff = prev->vm_pgoff;
+		/* Can we merge the predecessor? */
+		if (prev->vm_end == addr && mpol_equal(vma_policy(prev), policy)
+		    && can_vma_merge_after(prev, vm_flags, anon_vma, file,
+				   pgoff, vm_userfaultfd_ctx, anon_name)) {
+			merge_prev = true;
+		}
 	}
 	/* Can we merge the successor? */
 	if (next && end == next->vm_start &&
@@ -1113,32 +1004,85 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
 					     vm_userfaultfd_ctx, anon_name)) {
 		merge_next = true;
 	}
+
+	remove = remove2 = adjust = NULL;
 	/* Can we merge both the predecessor and the successor? */
 	if (merge_prev && merge_next &&
-			is_mergeable_anon_vma(prev->anon_vma,
-				next->anon_vma, NULL)) {	 /* cases 1, 6 */
-		err = __vma_adjust(vmi, prev, prev->vm_start,
-					next->vm_end, prev->vm_pgoff, prev);
-		res = prev;
-	} else if (merge_prev) {			/* cases 2, 5, 7 */
-		err = __vma_adjust(vmi, prev, prev->vm_start,
-					end, prev->vm_pgoff, prev);
-		res = prev;
+	    is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL)) {
+		remove = mid;				/* case 1 */
+		vma_end = next->vm_end;
+		err = dup_anon_vma(res, remove);
+		if (mid != next) {			/* case 6 */
+			remove2 = next;
+			if (!remove->anon_vma)
+				err = dup_anon_vma(res, remove2);
+		}
+	} else if (merge_prev) {
+		err = 0;				/* case 2 */
+		if (mid && end > mid->vm_start) {
+			err = dup_anon_vma(res, mid);
+			if (end == mid->vm_end) {	/* case 7 */
+				remove = mid;
+			} else {			/* case 5 */
+				adjust = mid;
+				adj_next = (end - mid->vm_start);
+			}
+		}
 	} else if (merge_next) {
-		if (prev && addr < prev->vm_end)	/* case 4 */
-			err = __vma_adjust(vmi, prev, prev->vm_start,
-					addr, prev->vm_pgoff, next);
-		else					/* cases 3, 8 */
-			err = __vma_adjust(vmi, mid, addr, next->vm_end,
-					next->vm_pgoff - pglen, next);
 		res = next;
+		if (prev && addr < prev->vm_end) {	/* case 4 */
+			vma_end = addr;
+			adjust = mid;
+			adj_next = -(vma->vm_end - addr);
+			err = dup_anon_vma(res, adjust);
+		} else {
+			vma = next;			/* case 3 */
+			vma_start = addr;
+			vma_end = next->vm_end;
+			vma_pgoff = next->vm_pgoff;
+			err = 0;
+			if (mid != next) {		/* case 8 */
+				remove = mid;
+				err = dup_anon_vma(res, remove);
+			}
+		}
 	}
 
-	/*
-	 * Cannot merge with predecessor or successor or error in __vma_adjust?
-	 */
+	/* Cannot merge or error in anon_vma clone */
 	if (err)
 		return NULL;
+
+	if (vma_iter_prealloc(vmi, vma))
+		return NULL;
+
+	vma_adjust_trans_huge(vma, vma_start, vma_end, adj_next);
+	init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
+	VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
+		   vp.anon_vma != adjust->anon_vma);
+
+	vma_prepare(&vp);
+	if (vma_start < vma->vm_start || vma_end > vma->vm_end)
+		vma_expanded = true;
+
+	vma->vm_start = vma_start;
+	vma->vm_end = vma_end;
+	vma->vm_pgoff = vma_pgoff;
+
+	if (vma_expanded)
+		vma_iter_store(vmi, vma);
+
+	if (adj_next) {
+		adjust->vm_start += adj_next;
+		adjust->vm_pgoff += adj_next >> PAGE_SHIFT;
+		if (adj_next < 0) {
+			WARN_ON(vma_expanded);
+			vma_iter_store(vmi, next);
+		}
+	}
+
+	vma_complete(&vp, vmi, mm);
+	vma_iter_free(vmi);
+	validate_mm(mm);
 	khugepaged_enter_vma(res, vm_flags);
 
 	if (res)
diff --git a/mm/rmap.c b/mm/rmap.c
index b616870a09be..4ee90f06b05b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -262,11 +262,12 @@ static inline void unlock_anon_vma_root(struct anon_vma *root)
  * Attach the anon_vmas from src to dst.
  * Returns 0 on success, -ENOMEM on failure.
  *
- * anon_vma_clone() is called by __vma_adjust(), __split_vma(), copy_vma() and
- * anon_vma_fork(). The first three want an exact copy of src, while the last
- * one, anon_vma_fork(), may try to reuse an existing anon_vma to prevent
- * endless growth of anon_vma. Since dst->anon_vma is set to NULL before call,
- * we can identify this case by checking (!dst->anon_vma && src->anon_vma).
+ * anon_vma_clone() is called by vma_expand(), vma_merge(), __split_vma(),
+ * copy_vma() and anon_vma_fork(). The first four want an exact copy of src,
+ * while the last one, anon_vma_fork(), may try to reuse an existing anon_vma to
+ * prevent endless growth of anon_vma. Since dst->anon_vma is set to NULL before
+ * call, we can identify this case by checking (!dst->anon_vma &&
+ * src->anon_vma).
  *
  * If (!dst->anon_vma && src->anon_vma) is true, this function tries to find
  * and reuse existing anon_vma which has no vmas and only one child anon_vma.
@@ -1265,7 +1266,7 @@ void page_add_anon_rmap(struct page *page,
 	if (unlikely(PageKsm(page)))
 		unlock_page_memcg(page);
 
-	/* address might be in next vma when migration races vma_adjust */
+	/* address might be in next vma when migration races vma_merge */
 	else if (first)
 		__page_set_anon_rmap(page, vma, address,
 				     !!(flags & RMAP_EXCLUSIVE));
@@ -2548,7 +2549,7 @@ void hugepage_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
 
 	BUG_ON(!PageLocked(page));
 	BUG_ON(!anon_vma);
-	/* address might be in next vma when migration races vma_adjust */
+	/* address might be in next vma when migration races vma_merge */
 	first = atomic_inc_and_test(compound_mapcount_ptr(page));
 	VM_BUG_ON_PAGE(!first && (flags & RMAP_EXCLUSIVE), page);
 	VM_BUG_ON_PAGE(!first && PageAnonExclusive(page), page);
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 41/44] mm/mmap: Introduce dup_anon_vma() helper
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (42 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 43/44] mm/mmap: Remove __vma_adjust() Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-10 22:51 ` [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Mark Brown
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Create a helper for duplicating the anon vma when adjusting the vma.
This simplifies the logic of __vma_adjust().
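
In essence, the open-coded exporter/importer handling collapses into a
single call (before/after sketch):

	/* before: open coded in __vma_adjust() */
	if (exporter && exporter->anon_vma && !importer->anon_vma) {
		importer->anon_vma = exporter->anon_vma;
		error = anon_vma_clone(importer, exporter);
		if (error)
			return error;
	}

	/* after */
	error = dup_anon_vma(importer, exporter);
	if (error)
		return error;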

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c | 74 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 40 insertions(+), 34 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index dad5c0113380..1e9b8eb00d45 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -679,6 +679,29 @@ static inline void vma_complete(struct vma_prepare *vp,
 		uprobe_mmap(vp->insert);
 }
 
+/*
+ * dup_anon_vma() - Helper function to duplicate anon_vma
+ * @dst: The destination VMA
+ * @src: The source VMA
+ *
+ * Returns: 0 on success.
+ */
+static inline int dup_anon_vma(struct vm_area_struct *dst,
+			       struct vm_area_struct *src)
+{
+	/*
+	 * Easily overlooked: when mprotect shifts the boundary, make sure the
+	 * expanding vma has anon_vma set if the shrinking vma had, to cover any
+	 * anon pages imported.
+	 */
+	if (src->anon_vma && !dst->anon_vma) {
+		dst->anon_vma = src->anon_vma;
+		return anon_vma_clone(dst, src);
+	}
+
+	return 0;
+}
+
 /*
  * vma_expand - Expand an existing VMA
  *
@@ -704,15 +727,12 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	struct vma_prepare vp;
 
 	if (next && (vma != next) && (end == next->vm_end)) {
-		remove_next = true;
-		if (next->anon_vma && !vma->anon_vma) {
-			int error;
+		int ret;
 
-			vma->anon_vma = next->anon_vma;
-			error = anon_vma_clone(vma, next);
-			if (error)
-				return error;
-		}
+		remove_next = true;
+		ret = dup_anon_vma(vma, next);
+		if (ret)
+			return ret;
 	}
 
 	init_multi_vma_prep(&vp, vma, NULL, remove_next ? next : NULL, NULL);
@@ -801,10 +821,11 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	struct file *file = vma->vm_file;
 	bool vma_changed = false;
 	long adjust_next = 0;
-	struct vm_area_struct *exporter = NULL, *importer = NULL;
 	struct vma_prepare vma_prep;
 
 	if (next) {
+		int error = 0;
+
 		if (end >= next->vm_end) {
 			/*
 			 * vma expands, overlapping all the next, and
@@ -839,15 +860,14 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 					   end != remove2->vm_end);
 			}
 
-			exporter = next;
-			importer = vma;
-
 			/*
 			 * If next doesn't have anon_vma, import from vma after
 			 * next, if the vma overlaps with it.
 			 */
-			if (remove2 != NULL && !next->anon_vma)
-				exporter = remove2;
+			if (remove2 != NULL && !next->anon_vma)
+				error = dup_anon_vma(vma, remove2);
+			else
+				error = dup_anon_vma(vma, remove);
 
 		} else if (end > next->vm_start) {
 			/*
@@ -855,9 +875,8 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 			 * mprotect case 5 shifting the boundary up.
 			 */
 			adjust_next = (end - next->vm_start);
-			exporter = next;
-			importer = vma;
-			VM_WARN_ON(expand != importer);
+			VM_WARN_ON(expand != vma);
+			error = dup_anon_vma(vma, next);
 		} else if (end < vma->vm_end) {
 			/*
 			 * vma shrinks, and !insert tells it's not
@@ -865,24 +884,11 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
 			 * mprotect case 4 shifting the boundary down.
 			 */
 			adjust_next = -(vma->vm_end - end);
-			exporter = vma;
-			importer = next;
-			VM_WARN_ON(expand != importer);
-		}
-
-		/*
-		 * Easily overlooked: when mprotect shifts the boundary,
-		 * make sure the expanding vma has anon_vma set if the
-		 * shrinking vma had, to cover any anon pages imported.
-		 */
-		if (exporter && exporter->anon_vma && !importer->anon_vma) {
-			int error;
-
-			importer->anon_vma = exporter->anon_vma;
-			error = anon_vma_clone(importer, exporter);
-			if (error)
-				return error;
+			VM_WARN_ON(expand != next);
+			error = dup_anon_vma(next, vma);
 		}
+		if (error)
+			return error;
 	}
 
 	if (vma_iter_prealloc(vmi, vma))
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 44/44] vma_merge: Set vma iterator to correct position.
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (39 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 40/44] mm/mmap: Don't use __vma_adjust() in shift_arg_pages() Liam Howlett
@ 2023-01-05 19:16 ` Liam Howlett
  2023-01-05 19:16 ` [PATCH v2 42/44] mm/mmap: Convert do_brk_flags() to use vma_prepare() and vma_complete() Liam Howlett
                   ` (3 subsequent siblings)
  44 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:16 UTC (permalink / raw)
  To: maple-tree, linux-mm, linux-kernel, Andrew Morton
  Cc: Liam Howlett, Liam Howlett

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

When merging with the previous VMA, set the vma iterator to the previous
slot.  Don't use the vma iterator to get the next/prev VMA, so that the
iterator remains in the correct position for a write.
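
In other words, the iterator is positioned on prev as soon as a
prev-merge is chosen, so the final store writes to the correct slot
(sketch of the relevant hunk):

	/* Can we merge the predecessor? */
	if (prev->vm_end == addr && /* ... other merge checks ... */) {
		merge_prev = true;
		vma_prev(vmi);	/* leave the iterator on prev for the write */
	}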

Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index a8dba6b6c34d..66e2f1b88c87 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -994,6 +994,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
 		    && can_vma_merge_after(prev, vm_flags, anon_vma, file,
 				   pgoff, vm_userfaultfd_ctx, anon_name)) {
 			merge_prev = true;
+			vma_prev(vmi);
 		}
 	}
 	/* Can we merge the successor? */
@@ -1085,9 +1086,6 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
 	validate_mm(mm);
 	khugepaged_enter_vma(res, vm_flags);
 
-	if (res)
-		vma_iter_set(vmi, end);
-
 	return res;
 }
 
-- 
2.35.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 26/44] mm/damon: Stop using vma_mas_store() for maple tree store
  2023-01-05 19:16 ` [PATCH v2 26/44] mm/damon: Stop using vma_mas_store() for maple tree store Liam Howlett
@ 2023-01-05 19:32   ` SeongJae Park
  2023-01-05 19:52     ` Liam Howlett
  0 siblings, 1 reply; 63+ messages in thread
From: SeongJae Park @ 2023-01-05 19:32 UTC (permalink / raw)
  To: Liam Howlett
  Cc: maple-tree, linux-mm, linux-kernel, Andrew Morton, SeongJae Park,
	damon, kernel test robot

Hi Liam,

On Thu, 5 Jan 2023 19:16:00 +0000 Liam Howlett <liam.howlett@oracle.com> wrote:

> From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> 
> Prepare for the removal of the vma_mas_store() function by open coding
> the maple tree store in this test code.

But it seems this series is not really removing 'vma_mas_store()'.  Wouldn't it
be better to do the preparation and removal together in the same patch series?

> Set the range of the maple
> state and call the store function directly.
> 
> Cc: SeongJae Park <sj@kernel.org>
> Cc: damon@lists.linux.dev
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
>  mm/damon/vaddr-test.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/damon/vaddr-test.h b/mm/damon/vaddr-test.h
> index bce37c487540..41532f7355d0 100644
> --- a/mm/damon/vaddr-test.h
> +++ b/mm/damon/vaddr-test.h
> @@ -24,8 +24,10 @@ static void __link_vmas(struct maple_tree *mt, struct vm_area_struct *vmas,
>  		return;
>  
>  	mas_lock(&mas);
> -	for (i = 0; i < nr_vmas; i++)
> -		vma_mas_store(&vmas[i], &mas);
> +	for (i = 0; i < nr_vmas; i++) {
> +		mas_set_range(&mas, vmas[i].vm_start, vmas[i].vm_end - 1);
> +		mas_store_gfp(&mas, &vmas[i], GFP_KERNEL);
> +	}

On the latest mm-unstable, vma_mas_store() uses mas_store_prealloc() instead of
mas_store_gfp().  It seems the difference causes no problem for this test code
in most cases, but could I ask the reason for this change?

Also, should we check the return value of mas_store_gfp()?

>  	mas_unlock(&mas);
>  }
>  
> -- 
> 2.35.1


Thanks,
SJ

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 26/44] mm/damon: Stop using vma_mas_store() for maple tree store
  2023-01-05 19:32   ` SeongJae Park
@ 2023-01-05 19:52     ` Liam Howlett
  2023-01-05 20:16       ` SeongJae Park
  0 siblings, 1 reply; 63+ messages in thread
From: Liam Howlett @ 2023-01-05 19:52 UTC (permalink / raw)
  To: SeongJae Park
  Cc: maple-tree, linux-mm, linux-kernel, Andrew Morton, damon,
	kernel test robot

* SeongJae Park <sj@kernel.org> [230105 14:33]:
> Hi Liam,
> 
> On Thu, 5 Jan 2023 19:16:00 +0000 Liam Howlett <liam.howlett@oracle.com> wrote:
> 
> > From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> > 
> > Prepare for the removal of the vma_mas_store() function by open coding
> > the maple tree store in this test code.
> 
> But it seems this series is not really removing 'vma_mas_store()'.  Wouldn't it
> be better to do the preparation and removal together in the same patch series?

It does remove it from all code but the nommu side.  The definition is
dropped from the header and C file in "mmap: Convert __vma_adjust() to
use vma iterator" [1].

> 
> > Set the range of the maple
> > state and call the store function directly.
> > 
> > Cc: SeongJae Park <sj@kernel.org>
> > Cc: damon@lists.linux.dev
> > Reported-by: kernel test robot <lkp@intel.com>
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> > ---
> >  mm/damon/vaddr-test.h | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/damon/vaddr-test.h b/mm/damon/vaddr-test.h
> > index bce37c487540..41532f7355d0 100644
> > --- a/mm/damon/vaddr-test.h
> > +++ b/mm/damon/vaddr-test.h
> > @@ -24,8 +24,10 @@ static void __link_vmas(struct maple_tree *mt, struct vm_area_struct *vmas,
> >  		return;
> >  
> >  	mas_lock(&mas);
> > -	for (i = 0; i < nr_vmas; i++)
> > -		vma_mas_store(&vmas[i], &mas);
> > +	for (i = 0; i < nr_vmas; i++) {
> > +		mas_set_range(&mas, vmas[i].vm_start, vmas[i].vm_end - 1);
> > +		mas_store_gfp(&mas, &vmas[i], GFP_KERNEL);
> > +	}
> 
> On the latest mm-unstable, vma_mas_store() uses mas_store_prealloc() instead of
> mas_store_gfp().  It seems the difference causes no problem for this test code
> in most cases, but could I ask the reason for this change?

mas_store_prealloc() expects the maple state to already have the
necessary memory to store the value, so mas_store_gfp() is the right way
of storing the range here.  In fact, we would only need a single node
since these values will be append operations anyway.

> 
> Also, should we check the return value of mas_store_gfp()?

I can add this.  The only error we would return is -ENOMEM, which seems
unlikely here.  Again, it is a single node that will be used.  The size
is 256B, but it's safer to add the check.
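
For illustration, a version of the test helper with the return value
checked might look like this (sketch only):

	mas_lock(&mas);
	for (i = 0; i < nr_vmas; i++) {
		mas_set_range(&mas, vmas[i].vm_start, vmas[i].vm_end - 1);
		if (mas_store_gfp(&mas, &vmas[i], GFP_KERNEL))
			break;	/* -ENOMEM; stop linking the vmas */
	}
	mas_unlock(&mas);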

[1] https://lore.kernel.org/linux-mm/20230105191517.3099082-28-Liam.Howlett@oracle.com/


Thanks,
Liam

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 26/44] mm/damon: Stop using vma_mas_store() for maple tree store
  2023-01-05 19:52     ` Liam Howlett
@ 2023-01-05 20:16       ` SeongJae Park
  0 siblings, 0 replies; 63+ messages in thread
From: SeongJae Park @ 2023-01-05 20:16 UTC (permalink / raw)
  To: Liam Howlett
  Cc: SeongJae Park, maple-tree, linux-mm, linux-kernel, Andrew Morton,
	damon, kernel test robot

Hi Liam,

On Thu, 5 Jan 2023 19:52:21 +0000 Liam Howlett <liam.howlett@oracle.com> wrote:

> * SeongJae Park <sj@kernel.org> [230105 14:33]:
> > Hi Liam,
> > 
> > On Thu, 5 Jan 2023 19:16:00 +0000 Liam Howlett <liam.howlett@oracle.com> wrote:
> > 
> > > From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> > > 
> > > Prepare for the removal of the vma_mas_store() function by open coding
> > > the maple tree store in this test code.
> > 
> > But it seems this series is not really removing 'vma_mas_store()'.  Wouldn't it
> > be better to do the preparation and removal together in the same patch series?
> 
> It does remove it from all code but the nommu side.  The definition is
> dropped from the header and C file in "mmap: Convert __vma_adjust() to
> use vma iterator" [1].

Thank you for the nice explanation.

> 
> > 
> > > Set the range of the maple
> > > state and call the store function directly.
> > > 
> > > Cc: SeongJae Park <sj@kernel.org>
> > > Cc: damon@lists.linux.dev
> > > Reported-by: kernel test robot <lkp@intel.com>
> > > Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> > > ---
> > >  mm/damon/vaddr-test.h | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/mm/damon/vaddr-test.h b/mm/damon/vaddr-test.h
> > > index bce37c487540..41532f7355d0 100644
> > > --- a/mm/damon/vaddr-test.h
> > > +++ b/mm/damon/vaddr-test.h
> > > @@ -24,8 +24,10 @@ static void __link_vmas(struct maple_tree *mt, struct vm_area_struct *vmas,
> > >  		return;
> > >  
> > >  	mas_lock(&mas);
> > > -	for (i = 0; i < nr_vmas; i++)
> > > -		vma_mas_store(&vmas[i], &mas);
> > > +	for (i = 0; i < nr_vmas; i++) {
> > > +		mas_set_range(&mas, vmas[i].vm_start, vmas[i].vm_end - 1);
> > > +		mas_store_gfp(&mas, &vmas[i], GFP_KERNEL);
> > > +	}
> > 
> > On the latest mm-unstable, vma_mas_store() uses mas_store_prealloc() instead of
> > mas_store_gfp().  It seems the difference causes no problem for this test code
> > in most cases, but could I ask the reason for this change?
> 
> mas_store_prealloc() expects the maple state to already have the
> necessary memory to store the value, so mas_store_gfp() is the right way
> of storing the range here.  In fact, we would only need a single node
> since these values will be append operations anyway.

Again, thank you for the nice explanation.

> 
> > 
> > Also, should we check the return value of mas_store_gfp()?
> 
> I can add this.  The only error we would return is -ENOMEM, which seems
> unlikely here.  Again, it is a single node that will be used.  The size
> is 256B, but it's safer to add the check.

You're right.  I'd prefer having the check, but I'd not block this for the
trivial nit.

Reviewed-by: SeongJae Park <sj@kernel.org>


Thanks,
SJ

> 
> [1] https://lore.kernel.org/linux-mm/20230105191517.3099082-28-Liam.Howlett@oracle.com/
> 
> 
> Thanks,
> Liam
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 25/44] mm: Switch vma_merge(), split_vma(), and __split_vma to vma iterator
  2023-01-05 19:15 ` [PATCH v2 25/44] mm: Switch vma_merge(), split_vma(), and __split_vma to vma iterator Liam Howlett
@ 2023-01-06 17:23   ` SeongJae Park
  2023-01-06 19:20     ` Liam Howlett
  0 siblings, 1 reply; 63+ messages in thread
From: SeongJae Park @ 2023-01-06 17:23 UTC (permalink / raw)
  To: Liam Howlett; +Cc: maple-tree, linux-mm, linux-kernel, Andrew Morton

Hello Liam,

On Thu, 5 Jan 2023 19:15:59 +0000 Liam Howlett <liam.howlett@oracle.com> wrote:

> From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> 
> Drop the vmi_* functions and transition all users to use the vma
> iterator directly.
> 
> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
>  fs/userfaultfd.c   | 14 ++++----
>  include/linux/mm.h | 16 +++-------
>  mm/madvise.c       |  6 ++--
>  mm/mempolicy.c     |  6 ++--
>  mm/mlock.c         |  6 ++--
>  mm/mmap.c          | 79 +++++++++++++---------------------------------
>  mm/mprotect.c      |  6 ++--
>  mm/mremap.c        |  2 +-
>  8 files changed, 47 insertions(+), 88 deletions(-)
[...]
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2830,22 +2830,16 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
[...]
> -extern int vmi__split_vma(struct vma_iterator *vmi, struct mm_struct *,
> -	struct vm_area_struct *, unsigned long addr, int new_below);
> -extern int split_vma(struct mm_struct *, struct vm_area_struct *,
> -	unsigned long addr, int new_below);
> -extern int vmi_split_vma(struct vma_iterator *vmi, struct mm_struct *,
> -		struct vm_area_struct *, unsigned long addr, int new_below);
> +extern int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
> +		       unsigned long addr, int new_below);
> +extern int split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
> +			 unsigned long addr, int new_below);

I just found that this change to the split_vma() declaration also applies to
!CONFIG_MMU builds, where the definition of split_vma() is not changed, so it
causes a build error.  I posted a simple fix for that:
https://lore.kernel.org/linux-mm/20230106171857.149918-1-sj@kernel.org/
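
(For reference, the !CONFIG_MMU definition in mm/nommu.c still has the
old signature, roughly:

	int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
		      unsigned long addr, int new_below);

so it no longer matches the updated declaration in include/linux/mm.h.)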


Thanks,
SJ

[...]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 25/44] mm: Switch vma_merge(), split_vma(), and __split_vma to vma iterator
  2023-01-06 17:23   ` SeongJae Park
@ 2023-01-06 19:20     ` Liam Howlett
  0 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-06 19:20 UTC (permalink / raw)
  To: SeongJae Park; +Cc: maple-tree, linux-mm, linux-kernel, Andrew Morton

* SeongJae Park <sj@kernel.org> [230106 12:23]:
> Hello Liam,
> 
> On Thu, 5 Jan 2023 19:15:59 +0000 Liam Howlett <liam.howlett@oracle.com> wrote:
> 
> > From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> > 
> > Drop the vmi_* functions and transition all users to use the vma
> > iterator directly.
> > 
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> > ---
> >  fs/userfaultfd.c   | 14 ++++----
> >  include/linux/mm.h | 16 +++-------
> >  mm/madvise.c       |  6 ++--
> >  mm/mempolicy.c     |  6 ++--
> >  mm/mlock.c         |  6 ++--
> >  mm/mmap.c          | 79 +++++++++++++---------------------------------
> >  mm/mprotect.c      |  6 ++--
> >  mm/mremap.c        |  2 +-
> >  8 files changed, 47 insertions(+), 88 deletions(-)
> [...]
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2830,22 +2830,16 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
> [...]
> > -extern int vmi__split_vma(struct vma_iterator *vmi, struct mm_struct *,
> > -	struct vm_area_struct *, unsigned long addr, int new_below);
> > -extern int split_vma(struct mm_struct *, struct vm_area_struct *,
> > -	unsigned long addr, int new_below);
> > -extern int vmi_split_vma(struct vma_iterator *vmi, struct mm_struct *,
> > -		struct vm_area_struct *, unsigned long addr, int new_below);
> > +extern int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
> > +		       unsigned long addr, int new_below);
> > +extern int split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
> > +			 unsigned long addr, int new_below);
> 
> I just found that this change to the split_vma() declaration also applies to
> !CONFIG_MMU builds, where the definition of split_vma() is not changed, so it
> causes a build error.  I posted a simple fix for that:
> https://lore.kernel.org/linux-mm/20230106171857.149918-1-sj@kernel.org/
> 

Thanks.  I think I need to revisit the nommu side of things with this
change as well.  I was hoping to avoid that, but it seems to be more
necessary than I had thought.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 22/44] mmap: Pass through vmi iterator to __split_vma()
  2023-01-05 19:15 ` [PATCH v2 22/44] mmap: Pass through vmi iterator to __split_vma() Liam Howlett
@ 2023-01-07  2:01   ` SeongJae Park
  2023-01-07  2:39     ` SeongJae Park
  0 siblings, 1 reply; 63+ messages in thread
From: SeongJae Park @ 2023-01-07  2:01 UTC (permalink / raw)
  To: Liam Howlett; +Cc: maple-tree, linux-mm, linux-kernel, Andrew Morton

Hello Liam,


I found that installing the mm-unstable kernel ('make install') fails at the
initramfs stage with a 'not a dynamic executable' message.  I confirmed the
issue is not reproducible before your patchset[1] but is after the series[2].

I tried to bisect, but on a commit[3] in the middle of the mm-unstable tree
to which this patch is applied, I get the below error while booting.  Do you
have an idea?

[    2.118502] BUG: kernel NULL pointer dereference, address: 0000000000000078
[    2.121516] #PF: supervisor read access in kernel mode
[    2.121576] #PF: error_code(0x0000) - not-present page
[    2.121576] PGD 0 P4D 0
[    2.121576] Oops: 0000 [#1] PREEMPT SMP PTI
[    2.121576] CPU: 2 PID: 237 Comm: modprobe Not tainted 6.2.0-rc1+ #18
[    2.121576] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-pr4
[    2.121576] RIP: 0010:mas_wr_walk (lib/maple_tree.c:1401 lib/maple_tree.c:2259 lib/maple_tree.c:3732 lib/maple_tree.c:3757)
[ 2.121576] Code: 00 48 8b 51 18 30 d2 48 89 53 08 83 f8 02 0f 87 64 01 00 00 4c 8d 42 08 0f b6 80 e68

Code starting with the faulting instruction
===========================================
   0:   00 48 8b                add    %cl,-0x75(%rax)
   3:   51                      push   %rcx
   4:   18 30                   sbb    %dh,(%rax)
   6:   d2 48 89                rorb   %cl,-0x77(%rax)
   9:   53                      push   %rbx
   a:   08 83 f8 02 0f 87       or     %al,-0x78f0fd08(%rbx)
  10:   64 01 00                add    %eax,%fs:(%rax)
  13:   00 4c 8d 42             add    %cl,0x42(%rbp,%rcx,4)
  17:   08 0f                   or     %cl,(%rdi)
  19:   b6 80                   mov    $0x80,%dh
  1b:   68                      .byte 0x68
[    2.121576] RSP: 0018:ffffa5190119fc28 EFLAGS: 00010246
[    2.121576] RAX: 000000000000000f RBX: ffffa5190119fc78 RCX: ffffa5190119fd60
[    2.121576] RDX: 0000000000000000 RSI: 000000000000000e RDI: 000000000000000e
[    2.121576] RBP: ffffa5190119fc38 R08: 0000000000000008 R09: 0000000000000001
[    2.121576] R10: ffff95f5c3435300 R11: ffff95f5c3434c48 R12: ffffa5190119fd60
[    2.121576] R13: ffff95f5c9a26880 R14: ffff95f5c3433690 R15: 0000000000100073
[    2.121576] FS:  0000000000000000(0000) GS:ffff9613fd480000(0000) knlGS:0000000000000000
[    2.121576] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.121576] CR2: 0000000000000078 CR3: 0000000103430000 CR4: 00000000000006e0
[    2.121576] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    2.121576] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    2.121576] Call Trace:
[    2.121576]  <TASK>
[    2.121576] mas_wr_store_entry (lib/maple_tree.c:4382)
[    2.121576] mas_store_prealloc (lib/maple_tree.c:249 lib/maple_tree.c:5706)
[    2.121576] mmap_region (mm/mmap.c:2808)
[    2.121576] do_mmap (mm/mmap.c:1506)
[    2.121576] ? security_mmap_file (security/security.c:1670)
[    2.121576] vm_mmap_pgoff (mm/util.c:542)
[    2.121576] ksys_mmap_pgoff (mm/mmap.c:1552)
[    2.121576] __x64_sys_mmap (arch/x86/kernel/sys_x86_64.c:86)
[    2.121576] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[    2.121576] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[    2.121576] RIP: 0033:0x7ff228f7a186
[ 2.121576] Code: 1f 44 00 00 f3 0f 1e fa 41 f7 c1 ff 0f 00 00 75 2b 55 48 89 fd 53 89 cb 48 85 ff 74f

Code starting with the faulting instruction
===========================================
   0:   1f                      (bad)
   1:   44 00 00                add    %r8b,(%rax)
   4:   f3 0f 1e fa             endbr64
   8:   41 f7 c1 ff 0f 00 00    test   $0xfff,%r9d
   f:   75 2b                   jne    0x3c
  11:   55                      push   %rbp
  12:   48 89 fd                mov    %rdi,%rbp
  15:   53                      push   %rbx
  16:   89 cb                   mov    %ecx,%ebx
  18:   48 85 ff                test   %rdi,%rdi
  1b:   4f                      rex.WRXB
[    2.121576] RSP: 002b:00007ffcbc695148 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
[    2.121576] RAX: ffffffffffffffda RBX: 0000000000000022 RCX: 00007ff228f7a186
[    2.121576] RDX: 0000000000000003 RSI: 0000000000002000 RDI: 0000000000000000
[    2.121576] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000
[    2.121576] R10: 0000000000000022 R11: 0000000000000246 R12: 00007ff228f8a190
[    2.121576] R13: 000000000000000c R14: 00007ff228f89060 R15: 0000000000000000
[    2.121576]  </TASK>
[    2.174098] ata2: found unknown device (class 0)
[    2.121576] Modules linked in:
[    2.121576] Dumping ftrace buffer:
[    2.121576]    (ftrace buffer empty)
[    2.121576] CR2: 0000000000000078
[    2.179450] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[    2.179774] ---[ end trace 0000000000000000 ]---
[    2.183410] RIP: 0010:mas_wr_walk (lib/maple_tree.c:1401 lib/maple_tree.c:2259 lib/maple_tree.c:3732 lib/maple_tree.c:3757)
[ 2.184545] Code: 00 48 8b 51 18 30 d2 48 89 53 08 83 f8 02 0f 87 64 01 00 00 4c 8d 42 08 0f b6 80 e68

Code starting with the faulting instruction
===========================================
   0:   00 48 8b                add    %cl,-0x75(%rax)
   3:   51                      push   %rcx
   4:   18 30                   sbb    %dh,(%rax)
   6:   d2 48 89                rorb   %cl,-0x77(%rax)
   9:   53                      push   %rbx
   a:   08 83 f8 02 0f 87       or     %al,-0x78f0fd08(%rbx)
  10:   64 01 00                add    %eax,%fs:(%rax)
  13:   00 4c 8d 42             add    %cl,0x42(%rbp,%rcx,4)
  17:   08 0f                   or     %cl,(%rdi)
  19:   b6 80                   mov    $0x80,%dh
  1b:   68                      .byte 0x68
[    2.185835] scsi 1:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM     2.5+ PQ: 0 ANSI: 5
[    2.188543] RSP: 0018:ffffa5190119fc28 EFLAGS: 00010246
[    2.188546] RAX: 000000000000000f RBX: ffffa5190119fc78 RCX: ffffa5190119fd60
[    2.188547] RDX: 0000000000000000 RSI: 000000000000000e RDI: 000000000000000e
[    2.188548] RBP: ffffa5190119fc38 R08: 0000000000000008 R09: 0000000000000001
[    2.188550] R10: ffff95f5c3435300 R11: ffff95f5c3434c48 R12: ffffa5190119fd60
[    2.188551] R13: ffff95f5c9a26880 R14: ffff95f5c3433690 R15: 0000000000100073
[    2.188552] FS:  0000000000000000(0000) GS:ffff9613fd480000(0000) knlGS:0000000000000000
[    2.188554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.188556] CR2: 0000000000000078 CR3: 0000000103430000 CR4: 00000000000006e0
[    2.188559] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    2.206738] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

[1] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=a13d6f0ec9b80674195d74ddfb6dfd94d352d2bb
[2] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=1329c351b42e20fcd195829357f0eda607f3de09
[3] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=f569105c34815dee1751a00bc9ca5154cc96dd6a


Thanks,
SJ


On Thu, 5 Jan 2023 19:15:58 +0000 Liam Howlett <liam.howlett@oracle.com> wrote:

> From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> 
> Use the vma iterator so that the iterator can be invalidated or updated
> to avoid each caller doing so.
> 
> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
>  mm/mmap.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 4dd7e48a312f..80f12fcf158c 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2446,7 +2446,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  		if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
>  			goto map_count_exceeded;
>  
> -		error = __split_vma(mm, vma, start, 0);
> +		error = vmi__split_vma(vmi, mm, vma, start, 0);
>  		if (error)
>  			goto start_split_failed;
>  
> @@ -2467,7 +2467,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  		if (next->vm_end > end) {
>  			struct vm_area_struct *split;
>  
> -			error = __split_vma(mm, next, end, 1);
> +			error = vmi__split_vma(vmi, mm, next, end, 1);
>  			if (error)
>  				goto end_split_failed;
>  
> -- 
> 2.35.1
> 
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 22/44] mmap: Pass through vmi iterator to __split_vma()
  2023-01-07  2:01   ` SeongJae Park
@ 2023-01-07  2:39     ` SeongJae Park
  2023-01-09 16:45       ` Liam Howlett
  0 siblings, 1 reply; 63+ messages in thread
From: SeongJae Park @ 2023-01-07  2:39 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Liam Howlett, maple-tree, linux-mm, linux-kernel, Andrew Morton

Hello Liam,

On Sat, 7 Jan 2023 02:01:26 +0000 SeongJae Park <sj@kernel.org> wrote:

> Hello Liam,
> 
> 
> I found that 'make install' of an mm-unstable kernel fails at the initramfs
> stage with a 'not a dynamic executable' message.  I confirmed the issue is
> not reproducible before your patchset[1] but is reproducible after the
> series[2].
> 
> I tried to bisect, but on a commit[3] in the middle of the mm-unstable tree
> where this patch is applied, I get the below error while booting.  Do you
> have an idea?

I bisected the boot failure further.  The first bad commit was a8e0f2e12936
("mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma
iterator")[1].  The stacktrace on that commit is as below.


[    2.125001] BUG: kernel NULL pointer dereference, address: 0000000000000078
[    2.128035] #PF: supervisor read access in kernel mode
[    2.128035] #PF: error_code(0x0000) - not-present page
[    2.128035] PGD 0 P4D 0
[    2.128035] Oops: 0000 [#1] PREEMPT SMP PTI
[    2.128035] CPU: 27 PID: 238 Comm: modprobe Not tainted 6.2.0-rc1+ #24
[    2.128035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-pr4
[    2.128035] RIP: 0010:mas_wr_walk (lib/maple_tree.c:1401 lib/maple_tree.c:2259 lib/maple_tree.c:3732 lib/maple_tree.c:3757)
[ 2.128035] Code: 00 48 8b 51 18 30 d2 48 89 53 08 83 f8 02 0f 87 64 01 00 00 4c 8d 42 08 0f b6 80 e68

Code starting with the faulting instruction
===========================================
   0:   00 48 8b                add    %cl,-0x75(%rax)
   3:   51                      push   %rcx
   4:   18 30                   sbb    %dh,(%rax)
   6:   d2 48 89                rorb   %cl,-0x77(%rax)
   9:   53                      push   %rbx
   a:   08 83 f8 02 0f 87       or     %al,-0x78f0fd08(%rbx)
  10:   64 01 00                add    %eax,%fs:(%rax)
  13:   00 4c 8d 42             add    %cl,0x42(%rbp,%rcx,4)
  17:   08 0f                   or     %cl,(%rdi)
  19:   b6 80                   mov    $0x80,%dh
  1b:   68                      .byte 0x68
[    2.128035] RSP: 0018:ffffba49c11b3c28 EFLAGS: 00010246
[    2.128035] RAX: 000000000000000f RBX: ffffba49c11b3c78 RCX: ffffba49c11b3d60
[    2.128035] RDX: 0000000000000000 RSI: 000000000000000e RDI: 000000000000000e
[    2.128035] RBP: ffffba49c11b3c38 R08: 0000000000000008 R09: 0000000000000001
[    2.128035] R10: ffff8fe4ca713500 R11: ffff8fe4ca713f48 R12: ffffba49c11b3d60
[    2.128035] R13: ffff8fe4ca6f2140 R14: ffff8fe4ca711988 R15: 0000000000100073
[    2.128035] FS:  0000000000000000(0000) GS:ffff9002fdac0000(0000) knlGS:0000000000000000
[    2.128035] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.128035] CR2: 0000000000000078 CR3: 000000010a6d6000 CR4: 00000000000006e0
[    2.128035] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    2.128035] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    2.128035] Call Trace:
[    2.128035]  <TASK>
[    2.128035] mas_wr_store_entry (lib/maple_tree.c:4382)
[    2.128035] mas_store_prealloc (lib/maple_tree.c:249 lib/maple_tree.c:5706)
[    2.128035] mmap_region (mm/mmap.c:2765)
[    2.128035] do_mmap (mm/mmap.c:1488)
[    2.128035] ? security_mmap_file (security/security.c:1670)
[    2.128035] vm_mmap_pgoff (mm/util.c:542)
[    2.128035] ksys_mmap_pgoff (mm/mmap.c:1534)
[    2.128035] __x64_sys_mmap (arch/x86/kernel/sys_x86_64.c:86)
[    2.128035] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[    2.128035] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[    2.128035] RIP: 0033:0x7fea50d24186
[ 2.128035] Code: 1f 44 00 00 f3 0f 1e fa 41 f7 c1 ff 0f 00 00 75 2b 55 48 89 fd 53 89 cb 48 85 ff 74f

Code starting with the faulting instruction
===========================================
   0:   1f                      (bad)
   1:   44 00 00                add    %r8b,(%rax)
   4:   f3 0f 1e fa             endbr64
   8:   41 f7 c1 ff 0f 00 00    test   $0xfff,%r9d
   f:   75 2b                   jne    0x3c
  11:   55                      push   %rbp
  12:   48 89 fd                mov    %rdi,%rbp
  15:   53                      push   %rbx
  16:   89 cb                   mov    %ecx,%ebx
  18:   48 85 ff                test   %rdi,%rdi
  1b:   4f                      rex.WRXB
[    2.128035] RSP: 002b:00007ffee1f7b1f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
[    2.128035] RAX: ffffffffffffffda RBX: 0000000000000022 RCX: 00007fea50d24186
[    2.176096] ata2: found unknown device (class 0)
[    2.128035] RDX: 0000000000000003 RSI: 0000000000002000 RDI: 0000000000000000
[    2.128035] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000
[    2.181946] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[    2.128035] R10: 0000000000000022 R11: 0000000000000246 R12: 00007fea50d34190
[    2.128035] R13: 000000000000000c R14: 00007fea50d33060 R15: 0000000000000000
[    2.188623] scsi 1:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM     2.5+ PQ: 0 ANSI: 5
[    2.128035]  </TASK>
[    2.128035] Modules linked in:
[    2.128035] Dumping ftrace buffer:
[    2.128035]    (ftrace buffer empty)
[    2.128035] CR2: 0000000000000078
[    2.196913] ---[ end trace 0000000000000000 ]---
[    2.197932] RIP: 0010:mas_wr_walk (lib/maple_tree.c:1401 lib/maple_tree.c:2259 lib/maple_tree.c:3732 lib/maple_tree.c:3757)
[ 2.198869] Code: 00 48 8b 51 18 30 d2 48 89 53 08 83 f8 02 0f 87 64 01 00 00 4c 8d 42 08 0f b6 80 e68

Code starting with the faulting instruction
===========================================
   0:   00 48 8b                add    %cl,-0x75(%rax)
   3:   51                      push   %rcx
   4:   18 30                   sbb    %dh,(%rax)
   6:   d2 48 89                rorb   %cl,-0x77(%rax)
   9:   53                      push   %rbx
   a:   08 83 f8 02 0f 87       or     %al,-0x78f0fd08(%rbx)
  10:   64 01 00                add    %eax,%fs:(%rax)
  13:   00 4c 8d 42             add    %cl,0x42(%rbp,%rcx,4)
  17:   08 0f                   or     %cl,(%rdi)
  19:   b6 80                   mov    $0x80,%dh
  1b:   68                      .byte 0x68
[    2.202922] RSP: 0018:ffffba49c11b3c28 EFLAGS: 00010246
[    2.204060] RAX: 000000000000000f RBX: ffffba49c11b3c78 RCX: ffffba49c11b3d60
[    2.205608] RDX: 0000000000000000 RSI: 000000000000000e RDI: 000000000000000e
[    2.207143] RBP: ffffba49c11b3c38 R08: 0000000000000008 R09: 0000000000000001
[    2.208703] R10: ffff8fe4ca713500 R11: ffff8fe4ca713f48 R12: ffffba49c11b3d60
[    2.210239] R13: ffff8fe4ca6f2140 R14: ffff8fe4ca711988 R15: 0000000000100073
[    2.211781] FS:  0000000000000000(0000) GS:ffff9002fdac0000(0000) knlGS:0000000000000000
[    2.213520] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.214756] CR2: 0000000000000078 CR3: 000000010a6d6000 CR4: 00000000000006e0
[    2.216316] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

[1] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=a8e0f2e12936b08e4abde7c867503177def79d12


Thanks,
SJ

> 
> [    2.118502] BUG: kernel NULL pointer dereference, address: 0000000000000078
> [    2.121516] #PF: supervisor read access in kernel mode
> [    2.121576] #PF: error_code(0x0000) - not-present page
> [    2.121576] PGD 0 P4D 0
> [    2.121576] Oops: 0000 [#1] PREEMPT SMP PTI
> [    2.121576] CPU: 2 PID: 237 Comm: modprobe Not tainted 6.2.0-rc1+ #18
> [    2.121576] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-pr4
> [    2.121576] RIP: 0010:mas_wr_walk (lib/maple_tree.c:1401 lib/maple_tree.c:2259 lib/maple_tree.c:3732 lib/maple_tree.c:3757)
> [ 2.121576] Code: 00 48 8b 51 18 30 d2 48 89 53 08 83 f8 02 0f 87 64 01 00 00 4c 8d 42 08 0f b6 80 e68
> 
> Code starting with the faulting instruction
> ===========================================
>    0:   00 48 8b                add    %cl,-0x75(%rax)
>    3:   51                      push   %rcx
>    4:   18 30                   sbb    %dh,(%rax)
>    6:   d2 48 89                rorb   %cl,-0x77(%rax)
>    9:   53                      push   %rbx
>    a:   08 83 f8 02 0f 87       or     %al,-0x78f0fd08(%rbx)
>   10:   64 01 00                add    %eax,%fs:(%rax)
>   13:   00 4c 8d 42             add    %cl,0x42(%rbp,%rcx,4)
>   17:   08 0f                   or     %cl,(%rdi)
>   19:   b6 80                   mov    $0x80,%dh
>   1b:   68                      .byte 0x68
> [    2.121576] RSP: 0018:ffffa5190119fc28 EFLAGS: 00010246
> [    2.121576] RAX: 000000000000000f RBX: ffffa5190119fc78 RCX: ffffa5190119fd60
> [    2.121576] RDX: 0000000000000000 RSI: 000000000000000e RDI: 000000000000000e
> [    2.121576] RBP: ffffa5190119fc38 R08: 0000000000000008 R09: 0000000000000001
> [    2.121576] R10: ffff95f5c3435300 R11: ffff95f5c3434c48 R12: ffffa5190119fd60
> [    2.121576] R13: ffff95f5c9a26880 R14: ffff95f5c3433690 R15: 0000000000100073
> [    2.121576] FS:  0000000000000000(0000) GS:ffff9613fd480000(0000) knlGS:0000000000000000
> [    2.121576] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.121576] CR2: 0000000000000078 CR3: 0000000103430000 CR4: 00000000000006e0
> [    2.121576] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    2.121576] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    2.121576] Call Trace:
> [    2.121576]  <TASK>
> [    2.121576] mas_wr_store_entry (lib/maple_tree.c:4382)
> [    2.121576] mas_store_prealloc (lib/maple_tree.c:249 lib/maple_tree.c:5706)
> [    2.121576] mmap_region (mm/mmap.c:2808)
> [    2.121576] do_mmap (mm/mmap.c:1506)
> [    2.121576] ? security_mmap_file (security/security.c:1670)
> [    2.121576] vm_mmap_pgoff (mm/util.c:542)
> [    2.121576] ksys_mmap_pgoff (mm/mmap.c:1552)
> [    2.121576] __x64_sys_mmap (arch/x86/kernel/sys_x86_64.c:86)
> [    2.121576] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
> [    2.121576] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
> [    2.121576] RIP: 0033:0x7ff228f7a186
> [ 2.121576] Code: 1f 44 00 00 f3 0f 1e fa 41 f7 c1 ff 0f 00 00 75 2b 55 48 89 fd 53 89 cb 48 85 ff 74f
> 
> Code starting with the faulting instruction
> ===========================================
>    0:   1f                      (bad)
>    1:   44 00 00                add    %r8b,(%rax)
>    4:   f3 0f 1e fa             endbr64
>    8:   41 f7 c1 ff 0f 00 00    test   $0xfff,%r9d
>    f:   75 2b                   jne    0x3c
>   11:   55                      push   %rbp
>   12:   48 89 fd                mov    %rdi,%rbp
>   15:   53                      push   %rbx
>   16:   89 cb                   mov    %ecx,%ebx
>   18:   48 85 ff                test   %rdi,%rdi
>   1b:   4f                      rex.WRXB
> [    2.121576] RSP: 002b:00007ffcbc695148 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
> [    2.121576] RAX: ffffffffffffffda RBX: 0000000000000022 RCX: 00007ff228f7a186
> [    2.121576] RDX: 0000000000000003 RSI: 0000000000002000 RDI: 0000000000000000
> [    2.121576] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000
> [    2.121576] R10: 0000000000000022 R11: 0000000000000246 R12: 00007ff228f8a190
> [    2.121576] R13: 000000000000000c R14: 00007ff228f89060 R15: 0000000000000000
> [    2.121576]  </TASK>
> [    2.174098] ata2: found unknown device (class 0)
> [    2.121576] Modules linked in:
> [    2.121576] Dumping ftrace buffer:
> [    2.121576]    (ftrace buffer empty)
> [    2.121576] CR2: 0000000000000078
> [    2.179450] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
> [    2.179774] ---[ end trace 0000000000000000 ]---
> [    2.183410] RIP: 0010:mas_wr_walk (lib/maple_tree.c:1401 lib/maple_tree.c:2259 lib/maple_tree.c:3732 lib/maple_tree.c:3757)
> [ 2.184545] Code: 00 48 8b 51 18 30 d2 48 89 53 08 83 f8 02 0f 87 64 01 00 00 4c 8d 42 08 0f b6 80 e68
> 
> Code starting with the faulting instruction
> ===========================================
>    0:   00 48 8b                add    %cl,-0x75(%rax)
>    3:   51                      push   %rcx
>    4:   18 30                   sbb    %dh,(%rax)
>    6:   d2 48 89                rorb   %cl,-0x77(%rax)
>    9:   53                      push   %rbx
>    a:   08 83 f8 02 0f 87       or     %al,-0x78f0fd08(%rbx)
>   10:   64 01 00                add    %eax,%fs:(%rax)
>   13:   00 4c 8d 42             add    %cl,0x42(%rbp,%rcx,4)
>   17:   08 0f                   or     %cl,(%rdi)
>   19:   b6 80                   mov    $0x80,%dh
>   1b:   68                      .byte 0x68
> [    2.185835] scsi 1:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM     2.5+ PQ: 0 ANSI: 5
> [    2.188543] RSP: 0018:ffffa5190119fc28 EFLAGS: 00010246
> [    2.188546] RAX: 000000000000000f RBX: ffffa5190119fc78 RCX: ffffa5190119fd60
> [    2.188547] RDX: 0000000000000000 RSI: 000000000000000e RDI: 000000000000000e
> [    2.188548] RBP: ffffa5190119fc38 R08: 0000000000000008 R09: 0000000000000001
> [    2.188550] R10: ffff95f5c3435300 R11: ffff95f5c3434c48 R12: ffffa5190119fd60
> [    2.188551] R13: ffff95f5c9a26880 R14: ffff95f5c3433690 R15: 0000000000100073
> [    2.188552] FS:  0000000000000000(0000) GS:ffff9613fd480000(0000) knlGS:0000000000000000
> [    2.188554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.188556] CR2: 0000000000000078 CR3: 0000000103430000 CR4: 00000000000006e0
> [    2.188559] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    2.206738] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=a13d6f0ec9b80674195d74ddfb6dfd94d352d2bb
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=1329c351b42e20fcd195829357f0eda607f3de09
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=f569105c34815dee1751a00bc9ca5154cc96dd6a
> 
> 
> Thanks,
> SJ
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 06/44] mm/mmap: convert brk to use vma iterator
  2023-01-05 19:15 ` [PATCH v2 06/44] mm/mmap: convert brk to use vma iterator Liam Howlett
@ 2023-01-09 15:10   ` Vernon Yang
  2023-01-09 16:38     ` Liam Howlett
  0 siblings, 1 reply; 63+ messages in thread
From: Vernon Yang @ 2023-01-09 15:10 UTC (permalink / raw)
  To: Liam Howlett; +Cc: maple-tree, linux-mm, linux-kernel, Andrew Morton

Hello Liam,

On Thu, Jan 05, 2023 at 07:15:54PM +0000, Liam Howlett wrote:
> From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
>
> Use the vma iterator API for the brk() system call.  This will provide
> type safety at compile time.
>
> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
>  mm/mmap.c | 47 +++++++++++++++++++++++------------------------
>  1 file changed, 23 insertions(+), 24 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 9318f2ac8a6e..4a6f42ab3560 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -239,10 +239,10 @@ static int check_brk_limits(unsigned long addr, unsigned long len)
>
>  	return mlock_future_check(current->mm, current->mm->def_flags, len);
>  }
> -static int do_brk_munmap(struct ma_state *mas, struct vm_area_struct *vma,
> +static int do_brk_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  			 unsigned long newbrk, unsigned long oldbrk,
>  			 struct list_head *uf);
> -static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *brkvma,
> +static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *brkvma,
>  		unsigned long addr, unsigned long request, unsigned long flags);
>  SYSCALL_DEFINE1(brk, unsigned long, brk)
>  {
> @@ -253,7 +253,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
>  	bool populate;
>  	bool downgraded = false;
>  	LIST_HEAD(uf);
> -	MA_STATE(mas, &mm->mm_mt, 0, 0);
> +	struct vma_iterator vmi;
>
>  	if (mmap_write_lock_killable(mm))
>  		return -EINTR;
> @@ -301,8 +301,8 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
>  		int ret;
>
>  		/* Search one past newbrk */
> -		mas_set(&mas, newbrk);
> -		brkvma = mas_find(&mas, oldbrk);
> +		vma_iter_init(&vmi, mm, newbrk);
> +		brkvma = vma_find(&vmi, oldbrk);
>  		if (!brkvma || brkvma->vm_start >= oldbrk)
>  			goto out; /* mapping intersects with an existing non-brk vma. */
>  		/*
> @@ -311,7 +311,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
>  		 * before calling do_brk_munmap().
>  		 */
>  		mm->brk = brk;
> -		ret = do_brk_munmap(&mas, brkvma, newbrk, oldbrk, &uf);
> +		ret = do_brk_munmap(&vmi, brkvma, newbrk, oldbrk, &uf);
>  		if (ret == 1)  {
>  			downgraded = true;
>  			goto success;
> @@ -329,14 +329,14 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
>  	 * Only check if the next VMA is within the stack_guard_gap of the
>  	 * expansion area
>  	 */
> -	mas_set(&mas, oldbrk);
> -	next = mas_find(&mas, newbrk - 1 + PAGE_SIZE + stack_guard_gap);
> +	vma_iter_init(&vmi, mm, oldbrk);
> +	next = vma_find(&vmi, newbrk + PAGE_SIZE + stack_guard_gap);
>  	if (next && newbrk + PAGE_SIZE > vm_start_gap(next))
>  		goto out;
>
> -	brkvma = mas_prev(&mas, mm->start_brk);
> +	brkvma = vma_prev_limit(&vmi, mm->start_brk);
>  	/* Ok, looks good - let it rip. */
> -	if (do_brk_flags(&mas, brkvma, oldbrk, newbrk - oldbrk, 0) < 0)
> +	if (do_brk_flags(&vmi, brkvma, oldbrk, newbrk - oldbrk, 0) < 0)
>  		goto out;
>
>  	mm->brk = brk;
> @@ -2963,7 +2963,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
>
>  /*
>   * brk_munmap() - Unmap a partial vma.
> - * @mas: The maple tree state.
> + * @vmi: The vma iterator
>   * @vma: The vma to be modified
>   * @newbrk: the start of the address to unmap
>   * @oldbrk: The end of the address to unmap
> @@ -2973,7 +2973,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
>   * unmaps a partial VMA mapping.  Does not handle alignment, downgrades lock if
>   * possible.
>   */
> -static int do_brk_munmap(struct ma_state *mas, struct vm_area_struct *vma,
> +static int do_brk_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  			 unsigned long newbrk, unsigned long oldbrk,
>  			 struct list_head *uf)
>  {
> @@ -2981,14 +2981,14 @@ static int do_brk_munmap(struct ma_state *mas, struct vm_area_struct *vma,
>  	int ret;
>
>  	arch_unmap(mm, newbrk, oldbrk);
> -	ret = do_mas_align_munmap(mas, vma, mm, newbrk, oldbrk, uf, true);
> +	ret = do_mas_align_munmap(&vmi->mas, vma, mm, newbrk, oldbrk, uf, true);
>  	validate_mm_mt(mm);
>  	return ret;
>  }
>
>  /*
>   * do_brk_flags() - Increase the brk vma if the flags match.
> - * @mas: The maple tree state.
> + * @vmi: The vma iterator
>   * @addr: The start address
>   * @len: The length of the increase
>   * @vma: The vma,
> @@ -2998,7 +2998,7 @@ static int do_brk_munmap(struct ma_state *mas, struct vm_area_struct *vma,
>   * do not match then create a new anonymous VMA.  Eventually we may be able to
>   * do some brk-specific accounting here.
>   */
> -static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
> +static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  		unsigned long addr, unsigned long len, unsigned long flags)
>  {
>  	struct mm_struct *mm = current->mm;
> @@ -3025,8 +3025,7 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
>  	if (vma && vma->vm_end == addr && !vma_policy(vma) &&
>  	    can_vma_merge_after(vma, flags, NULL, NULL,
>  				addr >> PAGE_SHIFT, NULL_VM_UFFD_CTX, NULL)) {
> -		mas_set_range(mas, vma->vm_start, addr + len - 1);

Why was mas_set_range() removed here, but left below [1]?

> -		if (mas_preallocate(mas, vma, GFP_KERNEL))
> +		if (vma_iter_prealloc(vmi, vma))
>  			goto unacct_fail;
>
>  		vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
> @@ -3036,7 +3035,7 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
>  		}
>  		vma->vm_end = addr + len;
>  		vma->vm_flags |= VM_SOFTDIRTY;
> -		mas_store_prealloc(mas, vma);
> +		vma_iter_store(vmi, vma);
>
>  		if (vma->anon_vma) {
>  			anon_vma_interval_tree_post_update_vma(vma);
> @@ -3057,8 +3056,8 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
>  	vma->vm_pgoff = addr >> PAGE_SHIFT;
>  	vma->vm_flags = flags;
>  	vma->vm_page_prot = vm_get_page_prot(flags);
> -	mas_set_range(mas, vma->vm_start, addr + len - 1);
> -	if (mas_store_gfp(mas, vma, GFP_KERNEL))
> +	mas_set_range(&vmi->mas, vma->vm_start, vma->vm_end - 1);

[1]. the mas_set_range() here has been kept.

> +	if (vma_iter_store_gfp(vmi, vma, GFP_KERNEL))
>  		goto mas_store_fail;
>
>  	mm->map_count++;
> @@ -3087,7 +3086,7 @@ int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags)
>  	int ret;
>  	bool populate;
>  	LIST_HEAD(uf);
> -	MA_STATE(mas, &mm->mm_mt, addr, addr);
> +	VMA_ITERATOR(vmi, mm, addr);
>
>  	len = PAGE_ALIGN(request);
>  	if (len < request)
> @@ -3106,12 +3105,12 @@ int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags)
>  	if (ret)
>  		goto limits_failed;
>
> -	ret = do_mas_munmap(&mas, mm, addr, len, &uf, 0);
> +	ret = do_mas_munmap(&vmi.mas, mm, addr, len, &uf, 0);
>  	if (ret)
>  		goto munmap_failed;
>
> -	vma = mas_prev(&mas, 0);
> -	ret = do_brk_flags(&mas, vma, addr, len, flags);
> +	vma = vma_prev(&vmi);
> +	ret = do_brk_flags(&vmi, vma, addr, len, flags);
>  	populate = ((mm->def_flags & VM_LOCKED) != 0);
>  	mmap_write_unlock(mm);
>  	userfaultfd_unmap_complete(mm, &uf);
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 06/44] mm/mmap: convert brk to use vma iterator
  2023-01-09 15:10   ` Vernon Yang
@ 2023-01-09 16:38     ` Liam Howlett
  0 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-09 16:38 UTC (permalink / raw)
  To: Vernon Yang; +Cc: maple-tree, linux-mm, linux-kernel, Andrew Morton

* Vernon Yang <vernon2gm@gmail.com> [230109 10:19]:
> Hello Liam,
> 
> On Thu, Jan 05, 2023 at 07:15:54PM +0000, Liam Howlett wrote:
> > From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> >
> > Use the vma iterator API for the brk() system call.  This will provide
> > type safety at compile time.
> >
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> > ---
> >  mm/mmap.c | 47 +++++++++++++++++++++++------------------------
> >  1 file changed, 23 insertions(+), 24 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 9318f2ac8a6e..4a6f42ab3560 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c

...

> > @@ -2998,7 +2998,7 @@ static int do_brk_munmap(struct ma_state *mas, struct vm_area_struct *vma,
> >   * do not match then create a new anonymous VMA.  Eventually we may be able to
> >   * do some brk-specific accounting here.
> >   */
> > -static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
> > +static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >  		unsigned long addr, unsigned long len, unsigned long flags)
> >  {
> >  	struct mm_struct *mm = current->mm;
> > @@ -3025,8 +3025,7 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
> >  	if (vma && vma->vm_end == addr && !vma_policy(vma) &&
> >  	    can_vma_merge_after(vma, flags, NULL, NULL,
> >  				addr >> PAGE_SHIFT, NULL_VM_UFFD_CTX, NULL)) {
> > -		mas_set_range(mas, vma->vm_start, addr + len - 1);
> 
> Why was mas_set_range() removed here, but left below [1]?

Set range does two things: 1. it sets the range, and 2. it resets the
maple state so that the store will occur in the correct location.  This
is so that you do not use an invalid maple state for a store.

We don't need to move the maple state in this case since we are already
pointing at the vma we are merging with.

Furthermore, the API for the vma iterator below "vma_iter_prealloc()"
also ensures that the state is okay.

> 
> > -		if (mas_preallocate(mas, vma, GFP_KERNEL))
> > +		if (vma_iter_prealloc(vmi, vma))
> >  			goto unacct_fail;
> >
> >  		vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
> > @@ -3036,7 +3035,7 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
> >  		}
> >  		vma->vm_end = addr + len;
> >  		vma->vm_flags |= VM_SOFTDIRTY;
> > -		mas_store_prealloc(mas, vma);
> > +		vma_iter_store(vmi, vma);
> >
> >  		if (vma->anon_vma) {
> >  			anon_vma_interval_tree_post_update_vma(vma);
> > @@ -3057,8 +3056,8 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
> >  	vma->vm_pgoff = addr >> PAGE_SHIFT;
> >  	vma->vm_flags = flags;
> >  	vma->vm_page_prot = vm_get_page_prot(flags);
> > -	mas_set_range(mas, vma->vm_start, addr + len - 1);
> > -	if (mas_store_gfp(mas, vma, GFP_KERNEL))
> > +	mas_set_range(&vmi->mas, vma->vm_start, vma->vm_end - 1);
> 
> [1]. the mas_set_range() here has been kept.

The "vma_iter_store_gfp()" API call does not check the state and the
state could be invalid.

> 
> > +	if (vma_iter_store_gfp(vmi, vma, GFP_KERNEL))
> >  		goto mas_store_fail;
> >
> >  	mm->map_count++;
...

This situation happened because I added the sanity check to the state
later in the development cycle.

I will fix this inconsistency and remove the mas_set_range().  Thanks
for catching this.
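
Something along these lines, probably: fold the range setup into the wrapper
so the caller-side mas_set_range() can go away (an untested sketch; the exact
form of the real fix may differ):

	static inline int vma_iter_store_gfp(struct vma_iterator *vmi,
			struct vm_area_struct *vma, gfp_t gfp)
	{
		/* Position the state at the vma's range before storing, so
		 * callers no longer need an explicit mas_set_range(). */
		mas_set_range(&vmi->mas, vma->vm_start, vma->vm_end - 1);
		return mas_store_gfp(&vmi->mas, vma, gfp);
	}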


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 22/44] mmap: Pass through vmi iterator to __split_vma()
  2023-01-07  2:39     ` SeongJae Park
@ 2023-01-09 16:45       ` Liam Howlett
  2023-01-09 19:28         ` SeongJae Park
  0 siblings, 1 reply; 63+ messages in thread
From: Liam Howlett @ 2023-01-09 16:45 UTC (permalink / raw)
  To: SeongJae Park; +Cc: maple-tree, linux-mm, linux-kernel, Andrew Morton

* SeongJae Park <sj@kernel.org> [230106 21:40]:
> Hello Liam,
> 
> On Sat, 7 Jan 2023 02:01:26 +0000 SeongJae Park <sj@kernel.org> wrote:
> 
> > Hello Liam,
> > 
> > 
> > I found that 'make install' of an mm-unstable kernel fails at the initramfs
> > stage with a 'not a dynamic executable' message.  I confirmed the issue is
> > not reproducible before your patchset[1] but is reproducible after the
> > series[2].
> > 
> > I tried to bisect, but on a commit[3] in the middle of the mm-unstable tree
> > where this patch is applied, I get the below error while booting.  Do you
> > have an idea?
> 
> I bisected the boot failure further.  The first bad commit was a8e0f2e12936
> ("mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma
> iterator")[1].  The stacktrace on that commit is as below.
> 
...

Thanks for your work on this.

I have found the issue and will send out a fix shortly.  I am not
handling the invalidated state correctly in the write path.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 22/44] mmap: Pass through vmi iterator to __split_vma()
  2023-01-09 16:45       ` Liam Howlett
@ 2023-01-09 19:28         ` SeongJae Park
  2023-01-09 20:30           ` Liam Howlett
  0 siblings, 1 reply; 63+ messages in thread
From: SeongJae Park @ 2023-01-09 19:28 UTC (permalink / raw)
  To: Liam Howlett
  Cc: SeongJae Park, maple-tree, linux-mm, linux-kernel, Andrew Morton

On Mon, 9 Jan 2023 16:45:46 +0000 Liam Howlett <liam.howlett@oracle.com> wrote:

> * SeongJae Park <sj@kernel.org> [230106 21:40]:
> > Hello Liam,
> > 
> > On Sat, 7 Jan 2023 02:01:26 +0000 SeongJae Park <sj@kernel.org> wrote:
> > 
> > > Hello Liam,
> > > 
> > > 
> > > I found that 'make install' of an mm-unstable kernel fails at the initramfs
> > > stage with a 'not a dynamic executable' message.  I confirmed the issue is
> > > not reproducible before your patchset[1] but is reproducible after the
> > > series[2].
> > > 
> > > I tried to bisect, but on a commit[3] in the middle of the mm-unstable tree
> > > where this patch is applied, I get the below error while booting.  Do you
> > > have an idea?
> > 
> > I bisected the boot failure further.  The first bad commit was a8e0f2e12936
> > ("mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma
> > iterator")[1].  The stacktrace on that commit is as below.
> > 
> ...
> 
> Thanks for your work on this.
> 
> I have found the issue and will send out a fix shortly.  I am not
> handling the invalidated state correctly in the write path.

Thank you, I tested the patch and confirmed it fixes the boot failure.  The
'make install' issue on my system is not fixed yet, though.  While bisecting
the issue again with your boot failure fix applied, I found the below build
failure on a commit applying a patch of this series, namely "userfaultfd:
use vma iterator".

    mm/madvise.c: In function ‘madvise_update_vma’:
    mm/madvise.c:165:11: error: implicit declaration of function ‘__split_vma’; did you mean ‘split_vma’? [-Werror=implicit-function-declaration]
      165 |   error = __split_vma(mm, vma, start, 1);
          |           ^~~~~~~~~~~
          |           split_vma
    cc1: some warnings being treated as errors

Maybe "mm: add temporary vma iterator versions of vma_merge(), split_vma(), and
__split_vma()" caused the build failure?


Thanks,
SJ

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 22/44] mmap: Pass through vmi iterator to __split_vma()
  2023-01-09 19:28         ` SeongJae Park
@ 2023-01-09 20:30           ` Liam Howlett
  2023-01-09 23:07             ` SeongJae Park
  0 siblings, 1 reply; 63+ messages in thread
From: Liam Howlett @ 2023-01-09 20:30 UTC (permalink / raw)
  To: SeongJae Park; +Cc: maple-tree, linux-mm, linux-kernel, Andrew Morton

* SeongJae Park <sj@kernel.org> [230109 14:28]:
> On Mon, 9 Jan 2023 16:45:46 +0000 Liam Howlett <liam.howlett@oracle.com> wrote:
> 
> > * SeongJae Park <sj@kernel.org> [230106 21:40]:
> > > Hello Liam,
> > > 
> > > On Sat, 7 Jan 2023 02:01:26 +0000 SeongJae Park <sj@kernel.org> wrote:
> > > 
> > > > Hello Liam,
> > > > 
> > > > 
> > > > I found that 'make install' of an mm-unstable kernel fails at the initramfs
> > > > stage with a 'not a dynamic executable' message.  I confirmed the issue is
> > > > not reproducible before your patchset[1] but is reproducible after the
> > > > series[2].
> > > > 
> > > > I tried to bisect, but on a commit[3] in the middle of the mm-unstable tree
> > > > where this patch is applied, I get the below error while booting.  Do you
> > > > have an idea?
> > > 
> > > I bisected the boot failure further.  The first bad commit was a8e0f2e12936
> > > ("mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma
> > > iterator")[1].  The stacktrace on that commit is as below.
> > > 
> > ...
> > 
> > Thanks for your work on this.
> > 
> > I have found the issue and will send out a fix shortly.  I am not
> > handling the invalidated state correctly in the write path.
> 
> Thank you, I tested the patch and confirmed it fixes the boot failure.  The
> 'make install' issue on my system is not fixed yet, though.  While bisecting
> the issue again with your boot failure fix applied, I found the below build
> failure on a commit applying a patch of this series, namely "userfaultfd:
> use vma iterator".
> 
>     mm/madvise.c: In function ‘madvise_update_vma’:
>     mm/madvise.c:165:11: error: implicit declaration of function ‘__split_vma’; did you mean ‘split_vma’? [-Werror=implicit-function-declaration]
>       165 |   error = __split_vma(mm, vma, start, 1);
>           |           ^~~~~~~~~~~
>           |           split_vma
>     cc1: some warnings being treated as errors
> 
> Maybe "mm: add temporary vma iterator versions of vma_merge(), split_vma(), and
> __split_vma()" caused the build failure?

Yes, it seems I removed the external declaration before the function.
Thanks.
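
For reference, the declaration that needs restoring is presumably along these
lines, matching the call site in the error above (a sketch; the exact header
it lives in may differ):

	int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
			unsigned long addr, int new_below);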

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 22/44] mmap: Pass through vmi iterator to __split_vma()
  2023-01-09 20:30           ` Liam Howlett
@ 2023-01-09 23:07             ` SeongJae Park
  0 siblings, 0 replies; 63+ messages in thread
From: SeongJae Park @ 2023-01-09 23:07 UTC (permalink / raw)
  To: Liam Howlett
  Cc: SeongJae Park, maple-tree, linux-mm, linux-kernel, Andrew Morton

Hello Liam,

On Mon, 9 Jan 2023 20:30:50 +0000 Liam Howlett <liam.howlett@oracle.com> wrote:

> * SeongJae Park <sj@kernel.org> [230109 14:28]:
> > On Mon, 9 Jan 2023 16:45:46 +0000 Liam Howlett <liam.howlett@oracle.com> wrote:
> > 
> > > * SeongJae Park <sj@kernel.org> [230106 21:40]:
> > > > Hello Liam,
> > > > 
> > > > On Sat, 7 Jan 2023 02:01:26 +0000 SeongJae Park <sj@kernel.org> wrote:
> > > > 
> > > > > Hello Liam,
> > > > > 
> > > > > 
> > > > > I found that 'make install' of an mm-unstable kernel fails at the initramfs
> > > > > stage with a 'not a dynamic executable' message.  I confirmed the issue is
> > > > > not reproducible before your patchset[1] but is reproducible after the
> > > > > series[2].
> > > > > 
> > > > > I tried to bisect, but on a commit[3] in the middle of the mm-unstable tree
> > > > > where this patch is applied, I get the below error while booting.  Do you
> > > > > have an idea?
> > > > 
> > > > I bisected the boot failure further.  The first bad commit was a8e0f2e12936
> > > > ("mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma
> > > > iterator")[1].  The stacktrace on that commit is as below.
> > > > 
> > > ...
> > > 
> > > Thanks for your work on this.
> > > 
> > > I have found the issue and will send out a fix shortly.  I am not
> > > handling the invalidated state correctly in the write path.
> > 
> > Thank you, I tested the patch and confirmed it fixes the boot failure.  The
> > 'make install' issue on my system is not fixed yet, though.  While bisecting
> > the issue again with your boot failure fix applied, I found the below build
> > failure on a commit applying a patch of this series, namely "userfaultfd:
> > use vma iterator".
> > 
> >     mm/madvise.c: In function ‘madvise_update_vma’:
> >     mm/madvise.c:165:11: error: implicit declaration of function ‘__split_vma’; did you mean ‘split_vma’? [-Werror=implicit-function-declaration]
> >       165 |   error = __split_vma(mm, vma, start, 1);
> >           |           ^~~~~~~~~~~
> >           |           split_vma
> >     cc1: some warnings being treated as errors
> > 
> > Maybe "mm: add temporary vma iterator versions of vma_merge(), split_vma(), and
> > __split_vma()" caused the build failure?
> 
> Yes, it seems I removed the external declaration before the function.
> Thanks.

I continued the bisect with your fix for this[1] applied, and found that my
'make install' issue comes from 'mm: change mprotect_fixup to vma iterator'.

[1] https://lore.kernel.org/linux-mm/20230109205300.955019-1-Liam.Howlett@oracle.com/


Thanks,
SJ

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 10/44] mmap: Change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator
  2023-01-05 19:15 ` [PATCH v2 10/44] mmap: Change do_mas_munmap and do_mas_aligned_munmap() " Liam Howlett
@ 2023-01-10 14:53   ` Sven Schnelle
  2023-01-10 17:26     ` Liam Howlett
  0 siblings, 1 reply; 63+ messages in thread
From: Sven Schnelle @ 2023-01-10 14:53 UTC (permalink / raw)
  To: Liam Howlett
  Cc: maple-tree, linux-mm, linux-kernel, Andrew Morton, linux-s390

Liam Howlett <liam.howlett@oracle.com> writes:

> From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
>
> Start passing the vma iterator through the mm code.  This will allow for
> reuse of the state and cleaner invalidation if necessary.
>
> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
>  include/linux/mm.h |  2 +-
>  mm/mmap.c          | 77 +++++++++++++++++++++-------------------------
>  mm/mremap.c        |  6 ++--
>  3 files changed, 39 insertions(+), 46 deletions(-)
>

Starting with this patch I see the following oops on s390:

[    4.512863] Run /sbin/init as init process
[    4.519447] Unable to handle kernel pointer dereference in virtual kernel address space
[    4.519450] Failing address: fbebfffb00000000 TEID: fbebfffb00000803
[    4.519452] Fault in home space mode while using kernel ASCE.
[    4.519455] AS:0000000001a60007 R3:0000000000000024
[    4.519482] Oops: 0038 ilc:2 [#1] SMP
[    4.519486] Modules linked in:
[    4.519488] CPU: 7 PID: 1 Comm: init Not tainted 6.2.0-rc1-00179-ga7f83eb601ef #1582
[    4.519491] Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
[    4.519493] Krnl PSW : 0704c00180000000 0000000000929464 (__memcpy+0x24/0x50)
[    4.519503]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[    4.519506] Krnl GPRS: 0000000000000000 0000037fffb1f990 0000037fffb1f990 fbebfffb00000008
[    4.519509]            0000000000000007 0000000000929480 0000000000000008 0000000000000000
[    4.519517]            0000000000000009 0000037fffb1fb40 0000037fffb1f880 0000037fffb1fc58
[    4.519519]            0000000080288000 0000000000000001 0000000000cf65da 0000037fffb1f5d8
[    4.519527] Krnl Code: 0000000000929456: b9040012            lgr     %r1,%r2
[    4.519527]            000000000092945a: a7740008            brc     7,000000000092946a
[    4.519527]           #000000000092945e: c05000000011        larl    %r5,0000000000929480
[    4.519527]           >0000000000929464: 44405000            ex      %r4,0(%r5)
[    4.519527]            0000000000929468: 07fe                bcr     15,%r14
[    4.519527]            000000000092946a: d2ff10003000        mvc     0(256,%r1),0(%r3)
[    4.519527]            0000000000929470: 41101100            la      %r1,256(%r1)
[    4.519527]            0000000000929474: 41303100            la      %r3,256(%r3)
[    4.519547] Call Trace:
[    4.519548]  [<0000000000929464>] __memcpy+0x24/0x50
[    4.519557]  [<0000000000cfd474>] mas_wr_bnode+0x5c/0x14e8
[    4.519562]  [<0000000000cffaf6>] mas_store_prealloc+0x4e/0xf8
[    4.519569]  [<000000000039d262>] mmap_region+0x482/0x8b0
[    4.519572]  [<000000000039da6e>] do_mmap+0x3de/0x4c0
[    4.519575]  [<000000000036aeae>] vm_mmap_pgoff+0xd6/0x188
[    4.519580]  [<000000000039a18a>] ksys_mmap_pgoff+0x62/0x230
[    4.519584]  [<000000000039a522>] __s390x_sys_old_mmap+0x7a/0x98
[    4.519588]  [<0000000000d22650>] __do_syscall+0x1d0/0x1f8
[    4.519592]  [<0000000000d32712>] system_call+0x82/0xb0
[    4.519596] Last Breaking-Event-Address:
[    4.519596]  [<0000000000cf65d4>] mas_store_b_node+0x3cc/0x6b0
[    4.519603] Kernel panic - not syncing: Fatal exception: panic_on_oops

This happens on every boot, always killing the init process.  The oops
doesn't happen with next-20230110, but there I see shmat testcase
failures in ltp instead (shmat returning -EINVAL because
find_vma_intersection() tells shmat that there's already a mapping
present).

While trying to bisect that, I stumbled upon the oops above.  Any ideas
before I start trying to understand the patch?

Thanks,
Sven

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 10/44] mmap: Change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator
  2023-01-10 14:53   ` Sven Schnelle
@ 2023-01-10 17:26     ` Liam Howlett
  2023-01-11  6:55       ` Sven Schnelle
  0 siblings, 1 reply; 63+ messages in thread
From: Liam Howlett @ 2023-01-10 17:26 UTC (permalink / raw)
  To: Sven Schnelle
  Cc: maple-tree, linux-mm, linux-kernel, Andrew Morton, linux-s390

* Sven Schnelle <svens@linux.ibm.com> [230110 09:54]:
> Liam Howlett <liam.howlett@oracle.com> writes:
> 
> > From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> >
> > Start passing the vma iterator through the mm code.  This will allow for
> > reuse of the state and cleaner invalidation if necessary.
> >
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> > ---
> >  include/linux/mm.h |  2 +-
> >  mm/mmap.c          | 77 +++++++++++++++++++++-------------------------
> >  mm/mremap.c        |  6 ++--
> >  3 files changed, 39 insertions(+), 46 deletions(-)
> >
> 
> Starting with this patch I see the following oops on s390:
> 
> [    4.512863] Run /sbin/init as init process
> [    4.519447] Unable to handle kernel pointer dereference in virtual kernel address space
> [    4.519450] Failing address: fbebfffb00000000 TEID: fbebfffb00000803
> [    4.519452] Fault in home space mode while using kernel ASCE.
> [    4.519455] AS:0000000001a60007 R3:0000000000000024
> [    4.519482] Oops: 0038 ilc:2 [#1] SMP
> [    4.519486] Modules linked in:
> [    4.519488] CPU: 7 PID: 1 Comm: init Not tainted 6.2.0-rc1-00179-ga7f83eb601ef #1582
> [    4.519491] Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
> [    4.519493] Krnl PSW : 0704c00180000000 0000000000929464 (__memcpy+0x24/0x50)
> [    4.519503]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [    4.519506] Krnl GPRS: 0000000000000000 0000037fffb1f990 0000037fffb1f990 fbebfffb00000008
> [    4.519509]            0000000000000007 0000000000929480 0000000000000008 0000000000000000
> [    4.519517]            0000000000000009 0000037fffb1fb40 0000037fffb1f880 0000037fffb1fc58
> [    4.519519]            0000000080288000 0000000000000001 0000000000cf65da 0000037fffb1f5d8
> [    4.519527] Krnl Code: 0000000000929456: b9040012            lgr     %r1,%r2
> [    4.519527]            000000000092945a: a7740008            brc     7,000000000092946a
> [    4.519527]           #000000000092945e: c05000000011        larl    %r5,0000000000929480
> [    4.519527]           >0000000000929464: 44405000            ex      %r4,0(%r5)
> [    4.519527]            0000000000929468: 07fe                bcr     15,%r14
> [    4.519527]            000000000092946a: d2ff10003000        mvc     0(256,%r1),0(%r3)
> [    4.519527]            0000000000929470: 41101100            la      %r1,256(%r1)
> [    4.519527]            0000000000929474: 41303100            la      %r3,256(%r3)
> [    4.519547] Call Trace:
> [    4.519548]  [<0000000000929464>] __memcpy+0x24/0x50
> [    4.519557]  [<0000000000cfd474>] mas_wr_bnode+0x5c/0x14e8
> [    4.519562]  [<0000000000cffaf6>] mas_store_prealloc+0x4e/0xf8
> [    4.519569]  [<000000000039d262>] mmap_region+0x482/0x8b0
> [    4.519572]  [<000000000039da6e>] do_mmap+0x3de/0x4c0
> [    4.519575]  [<000000000036aeae>] vm_mmap_pgoff+0xd6/0x188
> [    4.519580]  [<000000000039a18a>] ksys_mmap_pgoff+0x62/0x230
> [    4.519584]  [<000000000039a522>] __s390x_sys_old_mmap+0x7a/0x98
> [    4.519588]  [<0000000000d22650>] __do_syscall+0x1d0/0x1f8
> [    4.519592]  [<0000000000d32712>] system_call+0x82/0xb0
> [    4.519596] Last Breaking-Event-Address:
> [    4.519596]  [<0000000000cf65d4>] mas_store_b_node+0x3cc/0x6b0
> [    4.519603] Kernel panic - not syncing: Fatal exception: panic_on_oops
> 
> This happens on every boot, always killing the init process.  The oops
> doesn't happen with next-20230110, but there I see shmat testcase
> failures in ltp instead (shmat returning -EINVAL because
> find_vma_intersection() tells shmat that there's already a mapping
> present).
> 
> While trying to bisect that, I stumbled upon the oops above.  Any ideas
> before I start trying to understand the patch?

Yes, try the patch fixing the invalidated state that I sent out yesterday
[1].  It should be applied before ("mm: expand vma iterator interface").

1. https://lore.kernel.org/linux-mm/20230109165455.647400-1-Liam.Howlett@oracle.com/
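
If it helps, one way to slot it in at that point for a bisect (a sketch,
assuming the series is applied on a local branch and the fix is saved as
fix.mbox):

	$ git rebase -i '<expand-vma-iterator-commit>^'
	# In the todo list, add this line immediately above the pick of
	# "mm: expand vma iterator interface":
	#   exec git am /path/to/fix.mbox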

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust()
  2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
                   ` (43 preceding siblings ...)
  2023-01-05 19:16 ` [PATCH v2 41/44] mm/mmap: Introduce dup_vma_anon() helper Liam Howlett
@ 2023-01-10 22:51 ` Mark Brown
  2023-01-11  2:22   ` Liam Howlett
  44 siblings, 1 reply; 63+ messages in thread
From: Mark Brown @ 2023-01-10 22:51 UTC (permalink / raw)
  To: Liam Howlett; +Cc: maple-tree, linux-mm, linux-kernel, Andrew Morton


On Thu, Jan 05, 2023 at 07:15:44PM +0000, Liam Howlett wrote:

> This patch set does two things: 1. Clean up, including removal of
> __vma_adjust() and 2. Extends the VMA iterator API to provide type
> safety to the VMA operations using the maple tree, as requested by Linus
> [1].

This series *appears* to be causing some fun issues in -next for the
past couple of days or so.  The initial failures were seen by KernelCI
on several platforms (I've mostly been trying various arm64 things; at
least 32-bit ARM is also affected).  The initial symptom is that a go
binary called skipgen, invoked as part of the testing, silently faults;
tweaking things so that we get as far as running the arm64 selftests
produces much more useful output, with various things failing with
actual error messages such as:

  ./fake_sigreturn_bad_magic: error while loading shared libraries: cannot make segment writable for relocation: Cannot allocate memory
  ./sve-test: error while loading shared libraries: cannot make segment writable for relocation: Cannot allocate memory

I'm fairly sure we're not actually running out of memory: there's no OOM
killer activity, the amount of memory the system has appears to make no
difference, and just replacing the kernel with a mainline build runs as
expected.

You can see the full run that produced the above errors at:

   https://lava.sirena.org.uk/scheduler/job/88257

which also embeds links to all the binaries used, the exact commands run,
and so on.  The failing binaries all appear to be execed from within a
testsuite, though not *all* binaries execed from within tests fail (eg,
vec-syscfg execs things and seems happy).

This has taken out a bunch of testsuites in KernelCI (and probably other
CI systems using test-definitions, though I didn't check).

I tried to bisect this but otherwise haven't made any effort to look at
the failure.  The bisect sadly got lost in this series since many of its
commits either fail to build with:

/home/broonie/git/bisect/mm/madvise.c: In function 'madvise_update_vma':
/home/broonie/git/bisect/mm/madvise.c:165:25: error: implicit declaration of function '__split_vma'; did you mean 'split_vma'? [-Werror=implicit-function-declaration]
  165 |                 error = __split_vma(mm, vma, start, 1);
      |                         ^~~~~~~~~~~
      |                         split_vma

or fail to boot with something along the lines of:

<6>[    6.054380] Freeing initrd memory: 86880K
<1>[    6.087945] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000078
<1>[    6.088231] Mem abort info:
<1>[    6.088340]   ESR = 0x0000000096000004
<1>[    6.088504]   EC = 0x25: DABT (current EL), IL = 32 bits
<1>[    6.088671]   SET = 0, FnV = 0
<1>[    6.088802]   EA = 0, S1PTW = 0
<1>[    6.088929]   FSC = 0x04: level 0 translation fault
<1>[    6.089099] Data abort info:
<1>[    6.089210]   ISV = 0, ISS = 0x00000004
<1>[    6.089347]   CM = 0, WnR = 0
<1>[    6.089486] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043e33000
<1>[    6.089692] [0000000000000078] pgd=0000000000000000, p4d=0000000000000000
<0>[    6.090566] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
<4>[    6.090866] Modules linked in:
<4>[    6.091167] CPU: 0 PID: 42 Comm: modprobe Not tainted 6.2.0-rc1-00190-g505c59767243 #13
<4>[    6.091478] Hardware name: linux,dummy-virt (DT)
<4>[    6.091784] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
<4>[    6.092048] pc : mas_wr_walk+0x60/0x2d0
<4>[    6.092622] lr : mas_wr_store_entry.isra.0+0x80/0x4a0
<4>[    6.092798] sp : ffff80000821bb10
<4>[    6.092926] x29: ffff80000821bb10 x28: ffff000003fa4480 x27: 0000000200100073
<4>[    6.093206] x26: ffff000003fa41b0 x25: ffff000003fa43f0 x24: 0000000000000002
<4>[    6.093445] x23: 0000000ffffae021 x22: 0000000000000000 x21: ffff000002a74440
<4>[    6.093685] x20: ffff000003fa4480 x19: ffff80000821bc48 x18: 0000000000000000
<4>[    6.093933] x17: 0000000000000000 x16: ffff000002b8da00 x15: ffff80000821bc48
<4>[    6.094169] x14: 0000ffffae022fff x13: ffffffffffffffff x12: ffff000002b8da0c
<4>[    6.094427] x11: ffff80000821bb68 x10: ffffd75265462458 x9 : ffff80000821bc48
<4>[    6.094685] x8 : ffff80000821bbb8 x7 : ffff80000821bc48 x6 : ffffffffffffffff
<4>[    6.094922] x5 : 000000000000000e x4 : 000000000000000e x3 : 0000000000000000
<4>[    6.095167] x2 : 0000000000000008 x1 : 000000000000000f x0 : ffff80000821bb68
<4>[    6.095499] Call trace:
<4>[    6.095685]  mas_wr_walk+0x60/0x2d0
<4>[    6.095936]  mas_store_prealloc+0x50/0xa0
<4>[    6.096097]  mmap_region+0x520/0x784
<4>[    6.096232]  do_mmap+0x3b0/0x52c
<4>[    6.096347]  vm_mmap_pgoff+0xe4/0x10c
<4>[    6.096480]  ksys_mmap_pgoff+0x4c/0x204
<4>[    6.096621]  __arm64_sys_mmap+0x30/0x44
<4>[    6.096754]  invoke_syscall+0x48/0x114
<4>[    6.096900]  el0_svc_common.constprop.0+0x44/0xec
<4>[    6.097052]  do_el0_svc+0x38/0xb0
<4>[    6.097183]  el0_svc+0x2c/0x84
<4>[    6.097287]  el0t_64_sync_handler+0xf4/0x120
<4>[    6.097457]  el0t_64_sync+0x190/0x194
<0>[    6.097835] Code: 39402021 51000425 92401ca4 12001ca5 (f8647844) 
<4>[    6.098294] ---[ end trace 0000000000000000 ]---

(not always exactly the same backtrace, but the mas_wr_walk() was always
there.)

The specific set of commits in next-20230110 where the bisect got lost was:

505c59767243 madvise: use vmi iterator for __split_vma() and vma_merge()
1cfdd2a44d6b mmap: pass through vmi iterator to __split_vma()
7d718fd9873c sched: convert to vma iterator
2f94851ec717 mmap: use vmi version of vma_merge()
7e2dd18353a3 task_mmu: convert to vma iterator
756841b468f5 mm/mremap: use vmi version of vma_merge()
aaba4ba837fa mempolicy: convert to vma iterator
8193673ee5d8 coredump: convert to vma iterator
d4f7ebf41a44 mm: switch vma_merge(), split_vma(), and __split_vma to vma iterator
4b02758dc3c5 mlock: convert mlock to vma iterator
fd367dac089e include/linux/mm: declare different type of split_vma() for !CONFIG_MMU
3a72a0174748 mm/damon: stop using vma_mas_store() for maple tree store
dd51a3ca1096 mm: change mprotect_fixup to vma iterator
b9e4eabb8f40 mmap: convert __vma_adjust() to use vma iterator
c6fc05242a09 userfaultfd: use vma iterator
b9000fd4c5a6 mmap-convert-__vma_adjust-to-use-vma-iterator-fix
bdfb333b0b2a ipc/shm: use the vma iterator for munmap calls
3128296746a1 mm: pass through vma iterator to __vma_adjust()
80c8eed1721e mm: add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()
311129a7971c mmap: convert vma_expand() to use vma iterator
69e9b6c8a525 madvise: use split_vma() instead of __split_vma()
751f0a6713a9 mm: remove unnecessary write to vma iterator in __vma_adjust()
a7f83eb601ef mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator
39fd6622223e mm: pass vma iterator through to __vma_adjust()

(that last one actually failed, the rest were skipped.)  Full bisect
log:

git bisect start
# bad: [435bf71af3a0aa8067f3b87ff9febf68b564dbb6] Add linux-next specific files for 20230110
git bisect bad 435bf71af3a0aa8067f3b87ff9febf68b564dbb6
# good: [1fe4fd6f5cad346e598593af36caeadc4f5d4fa9] Merge tag 'xfs-6.2-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
git bisect good 1fe4fd6f5cad346e598593af36caeadc4f5d4fa9
# good: [57aac56e8af1628ef96055820f88ca547233b310] Merge branch 'drm-next' of git://git.freedesktop.org/git/drm/drm.git
git bisect good 57aac56e8af1628ef96055820f88ca547233b310
# good: [c9167d1c0ec75118a2859099255f68dc4d0779fd] Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
git bisect good c9167d1c0ec75118a2859099255f68dc4d0779fd
# good: [74f6598c9d8197774cfa9038c0cf0925cc5f178f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git
git bisect good 74f6598c9d8197774cfa9038c0cf0925cc5f178f
# bad: [f434860645df3dc10aae20654f17eb30955196c6] drivers/misc/open-dice: don't touch VM_MAYSHARE
git bisect bad f434860645df3dc10aae20654f17eb30955196c6
# good: [f73d9ff6ef5a79d212319950dab7d6b1fdea9ee9] mm/page_reporting: replace rcu_access_pointer() with rcu_dereference_protected()
git bisect good f73d9ff6ef5a79d212319950dab7d6b1fdea9ee9
# skip: [311129a7971cb4b80038fca4b4ac0c6214dbc46f] mmap: convert vma_expand() to use vma iterator
git bisect skip 311129a7971cb4b80038fca4b4ac0c6214dbc46f
# bad: [85a9b62c63adb67becc48887b6e211a3760e1758] zram: correctly handle all next_arg() cases
git bisect bad 85a9b62c63adb67becc48887b6e211a3760e1758
# good: [f355b8d96876e06a6879e8936297474fdf8b5e82] mm/mmap: remove preallocation from do_mas_align_munmap()
git bisect good f355b8d96876e06a6879e8936297474fdf8b5e82
# skip: [061dc47414898c882c8ffb55c60434f41e844cb7] mm: add vma iterator to vma_adjust() arguments
git bisect skip 061dc47414898c882c8ffb55c60434f41e844cb7
# skip: [751f0a6713a94e739a924d8729fd58628e119ef6] mm: remove unnecessary write to vma iterator in __vma_adjust()
git bisect skip 751f0a6713a94e739a924d8729fd58628e119ef6
# skip: [505c597672439d99cb42b11b5ea56fbf00746e0a] madvise: use vmi iterator for __split_vma() and vma_merge()
git bisect skip 505c597672439d99cb42b11b5ea56fbf00746e0a
# skip: [b01b3b8a73656aa475df807c17e4a34254d3a4c1] mm: change munmap splitting order and move_vma()
git bisect skip b01b3b8a73656aa475df807c17e4a34254d3a4c1
# bad: [3eade064bd22a24bcde84bdf371fb746087f6c9b] mm: fix two spelling mistakes in highmem.h
git bisect bad 3eade064bd22a24bcde84bdf371fb746087f6c9b
# skip: [b9000fd4c5a64464e62e61da21f2101543b2e042] mmap-convert-__vma_adjust-to-use-vma-iterator-fix
git bisect skip b9000fd4c5a64464e62e61da21f2101543b2e042
# skip: [a7f83eb601efc719889279bf9981b4b3f23f0084] mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator
git bisect skip a7f83eb601efc719889279bf9981b4b3f23f0084
# skip: [1b55bb7e3b16724e91020c168eb50c40a1f5df88] mmap: clean up mmap_region() unrolling
git bisect skip 1b55bb7e3b16724e91020c168eb50c40a1f5df88
# bad: [4b9c180dfc284fbbecad8feaa4b5f86a12d04e49] mm/mmap: remove __vma_adjust()
git bisect bad 4b9c180dfc284fbbecad8feaa4b5f86a12d04e49
# skip: [3a72a017474833fca226699e3cc7a95cdf55d421] mm/damon: stop using vma_mas_store() for maple tree store
git bisect skip 3a72a017474833fca226699e3cc7a95cdf55d421
# skip: [bdfb333b0b2a025de350a01748be1406801f1f24] ipc/shm: use the vma iterator for munmap calls
git bisect skip bdfb333b0b2a025de350a01748be1406801f1f24
# skip: [2f94851ec717a9b318ac57c011af349a5ef20f5e] mmap: use vmi version of vma_merge()
git bisect skip 2f94851ec717a9b318ac57c011af349a5ef20f5e
# skip: [07364e5b9a1db3a939395c387e0222964b962561] mm: don't use __vma_adjust() in __split_vma()
git bisect skip 07364e5b9a1db3a939395c387e0222964b962561
# skip: [756841b468f59fd31c3dcd1ff574a2c582124a7e] mm/mremap: use vmi version of vma_merge()
git bisect skip 756841b468f59fd31c3dcd1ff574a2c582124a7e
# skip: [c6fc05242a095b7652e501ae73313730359a4bbb] userfaultfd: use vma iterator
git bisect skip c6fc05242a095b7652e501ae73313730359a4bbb
# skip: [3128296746a14cb620247ffd3f8ff38dd4c58102] mm: pass through vma iterator to __vma_adjust()
git bisect skip 3128296746a14cb620247ffd3f8ff38dd4c58102
# skip: [dd51a3ca1096d568a796b5b21851d9d07e5955eb] mm: change mprotect_fixup to vma iterator
git bisect skip dd51a3ca1096d568a796b5b21851d9d07e5955eb
# skip: [d4f7ebf41a4428a3ea6f202e297b7584f1109a78] mm: switch vma_merge(), split_vma(), and __split_vma to vma iterator
git bisect skip d4f7ebf41a4428a3ea6f202e297b7584f1109a78
# skip: [d2297db1d48afba5b74eb002c1cbf7beb8a5c241] mm/mmap: use vma_prepare() and vma_complete() in vma_expand()
git bisect skip d2297db1d48afba5b74eb002c1cbf7beb8a5c241
# skip: [fd367dac089e27a60bc0700dc272428cb9da8446] include/linux/mm: declare different type of split_vma() for !CONFIG_MMU
git bisect skip fd367dac089e27a60bc0700dc272428cb9da8446
# skip: [4b02758dc3c5f80582e4c822d28ef271828b8d68] mlock: convert mlock to vma iterator
git bisect skip 4b02758dc3c5f80582e4c822d28ef271828b8d68
# skip: [69e9b6c8a5256fdc6a5854375e6d231527f33247] madvise: use split_vma() instead of __split_vma()
git bisect skip 69e9b6c8a5256fdc6a5854375e6d231527f33247
# skip: [0471d6b0df5e8afe03cb7ff3cd507dd8d45dd0ac] mm/mmap: refactor locking out of __vma_adjust()
git bisect skip 0471d6b0df5e8afe03cb7ff3cd507dd8d45dd0ac
# skip: [b9e4eabb8f40e7dae4b0d5f33826b6d27c33a6e7] mmap: convert __vma_adjust() to use vma iterator
git bisect skip b9e4eabb8f40e7dae4b0d5f33826b6d27c33a6e7
# skip: [edd9f4829c57c856109764d6c1140428b9f275b5] mm/mmap: move anon_vma setting in __vma_adjust()
git bisect skip edd9f4829c57c856109764d6c1140428b9f275b5
# skip: [1cfdd2a44d6b142dc6c16108e1efc8404c21f3b6] mmap: pass through vmi iterator to __split_vma()
git bisect skip 1cfdd2a44d6b142dc6c16108e1efc8404c21f3b6
# skip: [fc63eb0e3016002ee0683829f0673463ee0d855e] mm/mmap: introduce init_vma_prep() and init_multi_vma_prep()
git bisect skip fc63eb0e3016002ee0683829f0673463ee0d855e
# bad: [39fd6622223e2f26f585c2c19cf69443ba5b3549] mm: pass vma iterator through to __vma_adjust()
git bisect bad 39fd6622223e2f26f585c2c19cf69443ba5b3549
# skip: [7d718fd9873c157fc791816829ece1a96e7ac4d3] sched: convert to vma iterator
git bisect skip 7d718fd9873c157fc791816829ece1a96e7ac4d3
# skip: [aaba4ba837fa08bb6e822d0726a6718f861661d7] mempolicy: convert to vma iterator
git bisect skip aaba4ba837fa08bb6e822d0726a6718f861661d7
# skip: [7e2dd18353a3f09d2ad16cd4977dd9d716104863] task_mmu: convert to vma iterator
git bisect skip 7e2dd18353a3f09d2ad16cd4977dd9d716104863
# skip: [8193673ee5d8a88563cfd5f5befe299c41d49e54] coredump: convert to vma iterator
git bisect skip 8193673ee5d8a88563cfd5f5befe299c41d49e54
# skip: [80c8eed1721ee630b2494f14f239d7b3389dac7e] mm: add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()
git bisect skip 80c8eed1721ee630b2494f14f239d7b3389dac7e
# only skipped commits left to test
# possible first bad commit: [39fd6622223e2f26f585c2c19cf69443ba5b3549] mm: pass vma iterator through to __vma_adjust()
# possible first bad commit: [751f0a6713a94e739a924d8729fd58628e119ef6] mm: remove unnecessary write to vma iterator in __vma_adjust()
# possible first bad commit: [69e9b6c8a5256fdc6a5854375e6d231527f33247] madvise: use split_vma() instead of __split_vma()
# possible first bad commit: [3128296746a14cb620247ffd3f8ff38dd4c58102] mm: pass through vma iterator to __vma_adjust()
# possible first bad commit: [b9000fd4c5a64464e62e61da21f2101543b2e042] mmap-convert-__vma_adjust-to-use-vma-iterator-fix
# possible first bad commit: [b9e4eabb8f40e7dae4b0d5f33826b6d27c33a6e7] mmap: convert __vma_adjust() to use vma iterator
# possible first bad commit: [3a72a017474833fca226699e3cc7a95cdf55d421] mm/damon: stop using vma_mas_store() for maple tree store
# possible first bad commit: [fd367dac089e27a60bc0700dc272428cb9da8446] include/linux/mm: declare different type of split_vma() for !CONFIG_MMU
# possible first bad commit: [d4f7ebf41a4428a3ea6f202e297b7584f1109a78] mm: switch vma_merge(), split_vma(), and __split_vma to vma iterator
# possible first bad commit: [756841b468f59fd31c3dcd1ff574a2c582124a7e] mm/mremap: use vmi version of vma_merge()
# possible first bad commit: [2f94851ec717a9b318ac57c011af349a5ef20f5e] mmap: use vmi version of vma_merge()
# possible first bad commit: [1cfdd2a44d6b142dc6c16108e1efc8404c21f3b6] mmap: pass through vmi iterator to __split_vma()
# possible first bad commit: [505c597672439d99cb42b11b5ea56fbf00746e0a] madvise: use vmi iterator for __split_vma() and vma_merge()
# possible first bad commit: [7d718fd9873c157fc791816829ece1a96e7ac4d3] sched: convert to vma iterator
# possible first bad commit: [7e2dd18353a3f09d2ad16cd4977dd9d716104863] task_mmu: convert to vma iterator
# possible first bad commit: [aaba4ba837fa08bb6e822d0726a6718f861661d7] mempolicy: convert to vma iterator
# possible first bad commit: [8193673ee5d8a88563cfd5f5befe299c41d49e54] coredump: convert to vma iterator
# possible first bad commit: [4b02758dc3c5f80582e4c822d28ef271828b8d68] mlock: convert mlock to vma iterator
# possible first bad commit: [dd51a3ca1096d568a796b5b21851d9d07e5955eb] mm: change mprotect_fixup to vma iterator
# possible first bad commit: [c6fc05242a095b7652e501ae73313730359a4bbb] userfaultfd: use vma iterator
# possible first bad commit: [bdfb333b0b2a025de350a01748be1406801f1f24] ipc/shm: use the vma iterator for munmap calls
# possible first bad commit: [80c8eed1721ee630b2494f14f239d7b3389dac7e] mm: add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()
# possible first bad commit: [311129a7971cb4b80038fca4b4ac0c6214dbc46f] mmap: convert vma_expand() to use vma iterator
# possible first bad commit: [a7f83eb601efc719889279bf9981b4b3f23f0084] mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator

* Re: [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust()
  2023-01-10 22:51 ` [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Mark Brown
@ 2023-01-11  2:22   ` Liam Howlett
  0 siblings, 0 replies; 63+ messages in thread
From: Liam Howlett @ 2023-01-11  2:22 UTC (permalink / raw)
  To: Mark Brown; +Cc: maple-tree, linux-mm, linux-kernel, Andrew Morton

* Mark Brown <broonie@kernel.org> [230110 17:52]:
> On Thu, Jan 05, 2023 at 07:15:44PM +0000, Liam Howlett wrote:
> 
> > This patch set does two things: 1. Clean up, including removal of
> > __vma_adjust() and 2. Extends the VMA iterator API to provide type
> > safety to the VMA operations using the maple tree, as requested by Linus
> > [1].
> 
> This series *appears* to be causing some fun issues in -next for the
> past couple of days or so.  The initial failures were seen by KernelCI
> on several platforms (I've mostly been trying various arm64 things, at
> least 32-bit ARM is also affected).  The initial symptom seen is that a
> Go binary called skipgen that gets invoked as part of the testing
> silently faults; tweaking things so that we get as far as running the
> arm64 selftests results in much more useful output with various things
> failing with actual error messages such as:
> 
>   ./fake_sigreturn_bad_magic: error while loading shared libraries: cannot make segment writable for relocation: Cannot allocate memory
>   ./sve-test: error while loading shared libraries: cannot make segment writable for relocation: Cannot allocate memory
> 
> I'm fairly sure we're not actually running out of memory, there's no OOM
> killer activity, the amount of memory the system has appears to make no
> difference and just replacing the kernel with a mainline build runs as
> expected.
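
For context, the "cannot make segment writable for relocation" message
above is glibc's ld.so reporting a failed mprotect() while making a text
segment writable for relocations, with the errno text ("Cannot allocate
memory", i.e. ENOMEM) appended.  A minimal userspace sketch of the same
class of operation, which forces a VMA split and so exercises the
split/merge paths this series touches (an illustrative reproducer, not
taken from the report):

	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		long page = sysconf(_SC_PAGESIZE);
		/* Three read-only pages backed by a single VMA. */
		char *p = mmap(NULL, 3 * page, PROT_READ,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return EXIT_FAILURE;
		}
		/*
		 * Changing the protection of the middle page splits the
		 * VMA in three; a broken split path can surface here as
		 * ENOMEM, which ld.so reports as "cannot make segment
		 * writable for relocation".
		 */
		if (mprotect(p + page, page, PROT_READ | PROT_WRITE)) {
			perror("mprotect");
			return EXIT_FAILURE;
		}
		puts("mprotect ok");
		return EXIT_SUCCESS;
	}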


Thanks for the detailed analysis.  This series has been dropped from
mm-unstable and, I guess, will be out of linux-next by tomorrow.

I will retest my series against a larger number of platforms before
sending out the next revision.

> 
> You can see the full run that produced the above errors at:
> 
>    https://lava.sirena.org.uk/scheduler/job/88257
> 
> which also embeds links to all the binaries used, exact commands run and
> so on.  The failing binaries all appear to be execed from within a
> testsuite, though it's not *all* binaries execed from within tests (eg,
> vec-syscfg execs things and seems happy).
> 
> This has taken out a bunch of testsuites in KernelCI (and probably other
> CI systems using test-definitions, though I didn't check).
> 
> I tried to bisect this but otherwise haven't made any effort to look at
> the failure.  The bisect sadly got lost in this series since a lot of
> the series either fails to build with:
> 
> /home/broonie/git/bisect/mm/madvise.c: In function 'madvise_update_vma':
> /home/broonie/git/bisect/mm/madvise.c:165:25: error: implicit declaration of function '__split_vma'; did you mean 'split_vma'? [-Werror=implicit-function-declaration]
>   165 |                 error = __split_vma(mm, vma, start, 1);
>       |                         ^~~~~~~~~~~
>       |                         split_vma

Thanks.  This was reported to me before and I had a fix in mm-unstable.
I'll squash this into the series for v3.
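
The build break quoted above is an ordering problem within the series:
mm/madvise.c calls __split_vma() before the patch that makes its
declaration visible is applied.  The shape of the fix is either to call
split_vma() at that point (as the later patch "madvise: use split_vma()
instead of __split_vma()" does) or to expose the declaration earlier; a
sketch of the latter, with the placement hypothetical (the actual fix
was squashed in mm-unstable):

	/* include/linux/mm.h or mm/internal.h -- placement hypothetical */
	int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
			unsigned long addr, int new_below);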

> 
> or fails to boot with something along the lines of:
> 
> <6>[    6.054380] Freeing initrd memory: 86880K
> <1>[    6.087945] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000078
> <1>[    6.088231] Mem abort info:
> <1>[    6.088340]   ESR = 0x0000000096000004
> <1>[    6.088504]   EC = 0x25: DABT (current EL), IL = 32 bits
> <1>[    6.088671]   SET = 0, FnV = 0
> <1>[    6.088802]   EA = 0, S1PTW = 0
> <1>[    6.088929]   FSC = 0x04: level 0 translation fault
> <1>[    6.089099] Data abort info:
> <1>[    6.089210]   ISV = 0, ISS = 0x00000004
> <1>[    6.089347]   CM = 0, WnR = 0
> <1>[    6.089486] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043e33000
> <1>[    6.089692] [0000000000000078] pgd=0000000000000000, p4d=0000000000000000
> <0>[    6.090566] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> <4>[    6.090866] Modules linked in:
> <4>[    6.091167] CPU: 0 PID: 42 Comm: modprobe Not tainted 6.2.0-rc1-00190-g505c59767243 #13
> <4>[    6.091478] Hardware name: linux,dummy-virt (DT)
> <4>[    6.091784] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
> <4>[    6.092048] pc : mas_wr_walk+0x60/0x2d0
> <4>[    6.092622] lr : mas_wr_store_entry.isra.0+0x80/0x4a0
> <4>[    6.092798] sp : ffff80000821bb10
> <4>[    6.092926] x29: ffff80000821bb10 x28: ffff000003fa4480 x27: 0000000200100073
> <4>[    6.093206] x26: ffff000003fa41b0 x25: ffff000003fa43f0 x24: 0000000000000002
> <4>[    6.093445] x23: 0000000ffffae021 x22: 0000000000000000 x21: ffff000002a74440
> <4>[    6.093685] x20: ffff000003fa4480 x19: ffff80000821bc48 x18: 0000000000000000
> <4>[    6.093933] x17: 0000000000000000 x16: ffff000002b8da00 x15: ffff80000821bc48
> <4>[    6.094169] x14: 0000ffffae022fff x13: ffffffffffffffff x12: ffff000002b8da0c
> <4>[    6.094427] x11: ffff80000821bb68 x10: ffffd75265462458 x9 : ffff80000821bc48
> <4>[    6.094685] x8 : ffff80000821bbb8 x7 : ffff80000821bc48 x6 : ffffffffffffffff
> <4>[    6.094922] x5 : 000000000000000e x4 : 000000000000000e x3 : 0000000000000000
> <4>[    6.095167] x2 : 0000000000000008 x1 : 000000000000000f x0 : ffff80000821bb68
> <4>[    6.095499] Call trace:
> <4>[    6.095685]  mas_wr_walk+0x60/0x2d0
> <4>[    6.095936]  mas_store_prealloc+0x50/0xa0
> <4>[    6.096097]  mmap_region+0x520/0x784
> <4>[    6.096232]  do_mmap+0x3b0/0x52c
> <4>[    6.096347]  vm_mmap_pgoff+0xe4/0x10c
> <4>[    6.096480]  ksys_mmap_pgoff+0x4c/0x204
> <4>[    6.096621]  __arm64_sys_mmap+0x30/0x44
> <4>[    6.096754]  invoke_syscall+0x48/0x114
> <4>[    6.096900]  el0_svc_common.constprop.0+0x44/0xec
> <4>[    6.097052]  do_el0_svc+0x38/0xb0
> <4>[    6.097183]  el0_svc+0x2c/0x84
> <4>[    6.097287]  el0t_64_sync_handler+0xf4/0x120
> <4>[    6.097457]  el0t_64_sync+0x190/0x194
> <0>[    6.097835] Code: 39402021 51000425 92401ca4 12001ca5 (f8647844) 
> <4>[    6.098294] ---[ end trace 0000000000000000 ]---
> 
> (not always exactly the same backtrace, but mas_wr_walk() was always
> there.)

Thanks.  This was also reported, and a fix has already landed in
mm-unstable.
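
The backtrace points at the preallocated-store path: mmap_region()
reserves maple-tree nodes up front and then commits with
mas_store_prealloc(), which is not allowed to fail, so a maple state
left stale by an earlier tree modification walks into a bad node.  A
sketch of the pattern involved, with the helper name hypothetical and
the three-argument mas_preallocate() matching this era of the tree
(illustrative fragment, not the actual fix):

	/* Hypothetical helper showing the reserve-then-commit pattern. */
	static int store_vma(struct mm_struct *mm, struct vm_area_struct *vma)
	{
		MA_STATE(mas, &mm->mm_mt, vma->vm_start, vma->vm_end - 1);

		/* Reserve nodes while sleeping is still allowed... */
		if (mas_preallocate(&mas, vma, GFP_KERNEL))
			return -ENOMEM;
		/*
		 * ...then commit without allocating.  Anything that
		 * modifies the tree between the two steps must
		 * re-validate the state; a stale state sends the write
		 * walk (mas_wr_walk() in the oops above) into a freed
		 * or NULL node.
		 */
		mas_store_prealloc(&mas, vma);
		return 0;
	}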

> 
> The specific set of commits in next-20230110 where bisect got lost was:
> 
> 505c59767243 madvise: use vmi iterator for __split_vma() and vma_merge()
> 1cfdd2a44d6b mmap: pass through vmi iterator to __split_vma()
> 7d718fd9873c sched: convert to vma iterator
> 2f94851ec717 mmap: use vmi version of vma_merge()
> 7e2dd18353a3 task_mmu: convert to vma iterator
> 756841b468f5 mm/mremap: use vmi version of vma_merge()
> aaba4ba837fa mempolicy: convert to vma iterator
> 8193673ee5d8 coredump: convert to vma iterator
> d4f7ebf41a44 mm: switch vma_merge(), split_vma(), and __split_vma to vma iterator
> 4b02758dc3c5 mlock: convert mlock to vma iterator
> fd367dac089e include/linux/mm: declare different type of split_vma() for !CONFIG_MMU
> 3a72a0174748 mm/damon: stop using vma_mas_store() for maple tree store
> dd51a3ca1096 mm: change mprotect_fixup to vma iterator
> b9e4eabb8f40 mmap: convert __vma_adjust() to use vma iterator
> c6fc05242a09 userfaultfd: use vma iterator
> b9000fd4c5a6 mmap-convert-__vma_adjust-to-use-vma-iterator-fix
> bdfb333b0b2a ipc/shm: use the vma iterator for munmap calls
> 3128296746a1 mm: pass through vma iterator to __vma_adjust()
> 80c8eed1721e mm: add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()
> 311129a7971c mmap: convert vma_expand() to use vma iterator
> 69e9b6c8a525 madvise: use split_vma() instead of __split_vma()
> 751f0a6713a9 mm: remove unnecessary write to vma iterator in __vma_adjust()
> a7f83eb601ef mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator
> 39fd6622223e mm: pass vma iterator through to __vma_adjust()
> 

...

I appreciate you running through the bisect and bringing this to my
attention.

I will do a better job of emailing linux-next the fixes, which I
obviously overlooked.

* Re: [PATCH v2 10/44] mmap: Change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator
  2023-01-10 17:26     ` Liam Howlett
@ 2023-01-11  6:55       ` Sven Schnelle
  0 siblings, 0 replies; 63+ messages in thread
From: Sven Schnelle @ 2023-01-11  6:55 UTC (permalink / raw)
  To: Liam Howlett
  Cc: maple-tree, linux-mm, linux-kernel, Andrew Morton, linux-s390

Liam Howlett <liam.howlett@oracle.com> writes:

> * Sven Schnelle <svens@linux.ibm.com> [230110 09:54]:
>> Liam Howlett <liam.howlett@oracle.com> writes:
>> 
>> > From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
>> >
>> > Start passing the vma iterator through the mm code.  This will allow for
>> > reuse of the state and cleaner invalidation if necessary.
>> >
>> > Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
>> > ---
>> >  include/linux/mm.h |  2 +-
>> >  mm/mmap.c          | 77 +++++++++++++++++++++-------------------------
>> >  mm/mremap.c        |  6 ++--
>> >  3 files changed, 39 insertions(+), 46 deletions(-)
>> >
>> 
>> Starting with this patch i see the following oops on s390:
>> [..]
>> This happens on every boot, always killing the init process. The oops
>> doesn't happen with next-20230110. With next-20230110 i see shmat
>> testcase failures in ltp (shmat returning with -EINVAL because
>> find_vma_intersection() tells shmat that there's already a mapping
>> present).
>> 
>> Trying to bisect that i stumbled above the oops above. Any ideas before
>> i start trying to understand the patch?
>
> Yes, try the patch for fixing the invalidated state I sent out yesterday
> [1].  This should come before ("mm: expand vma iterator interface").
>
> 1. https://lore.kernel.org/linux-mm/20230109165455.647400-1-Liam.Howlett@oracle.com/

Thanks, missed that. I can report that the crash I've seen seems to be
fixed. Also, the shmat01 testcase in LTP is working now.
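
The interface under discussion here is the one the cover letter
describes: the maple state is wrapped in a VMA iterator, and with this
series the mm helpers re-position the iterator themselves after tree
modifications.  A sketch of the consumer-side pattern, using the
iterator macros present in this era of the tree (fragment for
illustration only):

	VMA_ITERATOR(vmi, mm, 0);
	struct vm_area_struct *vma;

	for_each_vma(vmi, vma) {
		/*
		 * With this series, helpers such as the vmi versions of
		 * split_vma() and vma_merge() take &vmi and fix up its
		 * position internally after modifying the tree, so the
		 * loop stays valid without the caller touching the
		 * underlying maple state.
		 */
	}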


end of thread, newest message: ~2023-01-11  6:56 UTC

Thread overview: 63+ messages
2023-01-05 19:15 [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Liam Howlett
2023-01-05 19:15 ` [PATCH v2 01/44] maple_tree: Add mas_init() function Liam Howlett
2023-01-05 19:15 ` [PATCH v2 02/44] maple_tree: Fix potential rcu issue Liam Howlett
2023-01-05 19:15 ` [PATCH v2 03/44] maple_tree: Reduce user error potential Liam Howlett
2023-01-05 19:15 ` [PATCH v2 05/44] mm: Expand vma iterator interface Liam Howlett
2023-01-05 19:15 ` [PATCH v2 04/44] test_maple_tree: Test modifications while iterating Liam Howlett
2023-01-05 19:15 ` [PATCH v2 06/44] mm/mmap: convert brk to use vma iterator Liam Howlett
2023-01-09 15:10   ` Vernon Yang
2023-01-09 16:38     ` Liam Howlett
2023-01-05 19:15 ` [PATCH v2 08/44] mmap: Convert vma_link() " Liam Howlett
2023-01-05 19:15 ` [PATCH v2 07/44] kernel/fork: Convert forking to using the vmi iterator Liam Howlett
2023-01-05 19:15 ` [PATCH v2 11/44] mmap: Convert vma_expand() to use vma iterator Liam Howlett
2023-01-05 19:15 ` [PATCH v2 10/44] mmap: Change do_mas_munmap and do_mas_aligned_munmap() " Liam Howlett
2023-01-10 14:53   ` Sven Schnelle
2023-01-10 17:26     ` Liam Howlett
2023-01-11  6:55       ` Sven Schnelle
2023-01-05 19:15 ` [PATCH v2 09/44] mm/mmap: Remove preallocation from do_mas_align_munmap() Liam Howlett
2023-01-05 19:15 ` [PATCH v2 15/44] mm: Change mprotect_fixup to vma iterator Liam Howlett
2023-01-05 19:15 ` [PATCH v2 12/44] mm: Add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma() Liam Howlett
2023-01-05 19:15 ` [PATCH v2 14/44] userfaultfd: Use vma iterator Liam Howlett
2023-01-05 19:15 ` [PATCH v2 13/44] ipc/shm: Use the vma iterator for munmap calls Liam Howlett
2023-01-05 19:15 ` [PATCH v2 16/44] mlock: Convert mlock to vma iterator Liam Howlett
2023-01-05 19:15 ` [PATCH v2 17/44] coredump: Convert " Liam Howlett
2023-01-05 19:15 ` [PATCH v2 18/44] mempolicy: " Liam Howlett
2023-01-05 19:15 ` [PATCH v2 22/44] mmap: Pass through vmi iterator to __split_vma() Liam Howlett
2023-01-07  2:01   ` SeongJae Park
2023-01-07  2:39     ` SeongJae Park
2023-01-09 16:45       ` Liam Howlett
2023-01-09 19:28         ` SeongJae Park
2023-01-09 20:30           ` Liam Howlett
2023-01-09 23:07             ` SeongJae Park
2023-01-05 19:15 ` [PATCH v2 19/44] task_mmu: Convert to vma iterator Liam Howlett
2023-01-05 19:15 ` [PATCH v2 21/44] madvise: Use vmi iterator for __split_vma() and vma_merge() Liam Howlett
2023-01-05 19:15 ` [PATCH v2 20/44] sched: Convert to vma iterator Liam Howlett
2023-01-05 19:15 ` [PATCH v2 23/44] mmap: Use vmi version of vma_merge() Liam Howlett
2023-01-05 19:15 ` [PATCH v2 25/44] mm: Switch vma_merge(), split_vma(), and __split_vma to vma iterator Liam Howlett
2023-01-06 17:23   ` SeongJae Park
2023-01-06 19:20     ` Liam Howlett
2023-01-05 19:15 ` [PATCH v2 24/44] mm/mremap: Use vmi version of vma_merge() Liam Howlett
2023-01-05 19:16 ` [PATCH v2 26/44] mm/damon: Stop using vma_mas_store() for maple tree store Liam Howlett
2023-01-05 19:32   ` SeongJae Park
2023-01-05 19:52     ` Liam Howlett
2023-01-05 20:16       ` SeongJae Park
2023-01-05 19:16 ` [PATCH v2 27/44] mmap: Convert __vma_adjust() to use vma iterator Liam Howlett
2023-01-05 19:16 ` [PATCH v2 28/44] mm: Pass through vma iterator to __vma_adjust() Liam Howlett
2023-01-05 19:16 ` [PATCH v2 29/44] madvise: Use split_vma() instead of __split_vma() Liam Howlett
2023-01-05 19:16 ` [PATCH v2 30/44] mm: Remove unnecessary write to vma iterator in __vma_adjust() Liam Howlett
2023-01-05 19:16 ` [PATCH v2 31/44] mm: Pass vma iterator through to __vma_adjust() Liam Howlett
2023-01-05 19:16 ` [PATCH v2 32/44] mm: Add vma iterator to vma_adjust() arguments Liam Howlett
2023-01-05 19:16 ` [PATCH v2 34/44] mm: Change munmap splitting order and move_vma() Liam Howlett
2023-01-05 19:16 ` [PATCH v2 33/44] mmap: Clean up mmap_region() unrolling Liam Howlett
2023-01-05 19:16 ` [PATCH v2 35/44] mm/mmap: move anon_vma setting in __vma_adjust() Liam Howlett
2023-01-05 19:16 ` [PATCH v2 36/44] mm/mmap: Refactor locking out of __vma_adjust() Liam Howlett
2023-01-05 19:16 ` [PATCH v2 37/44] mm/mmap: Use vma_prepare() and vma_complete() in vma_expand() Liam Howlett
2023-01-05 19:16 ` [PATCH v2 39/44] mm: Don't use __vma_adjust() in __split_vma() Liam Howlett
2023-01-05 19:16 ` [PATCH v2 38/44] mm/mmap: Introduce init_vma_prep() and init_multi_vma_prep() Liam Howlett
2023-01-05 19:16 ` [PATCH v2 40/44] mm/mmap: Don't use __vma_adjust() in shift_arg_pages() Liam Howlett
2023-01-05 19:16 ` [PATCH v2 44/44] vma_merge: Set vma iterator to correct position Liam Howlett
2023-01-05 19:16 ` [PATCH v2 42/44] mm/mmap: Convert do_brk_flags() to use vma_prepare() and vma_complete() Liam Howlett
2023-01-05 19:16 ` [PATCH v2 43/44] mm/mmap: Remove __vma_adjust() Liam Howlett
2023-01-05 19:16 ` [PATCH v2 41/44] mm/mmap: Introduce dup_vma_anon() helper Liam Howlett
2023-01-10 22:51 ` [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust() Mark Brown
2023-01-11  2:22   ` Liam Howlett
