linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH v2 0/4] Removing limitations of merging anonymous VMAs
@ 2022-03-11 17:45 Jakub Matěna
  2022-03-11 17:45 ` [RFC PATCH v2 1/4] [PATCH 1/4] mm: refactor of vma_merge() Jakub Matěna
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Jakub Matěna @ 2022-03-11 17:45 UTC (permalink / raw)
  To: linux-mm
  Cc: patches, linux-kernel, vbabka, mhocko, mgorman, willy,
	liam.howlett, hughd, kirill, riel, rostedt, peterz,
	Jakub Matěna

Motivation
In the current kernel it is impossible to merge two anonymous VMAs
if one of them was moved. That is because a VMA's page offset is
set according to the virtual address where it was created, and in
order to merge two VMAs their page offsets need to be contiguous.
Another problem when merging two VMAs is their anon_vma. In the
current kernel these anon_vmas have to be one and the same;
otherwise the merge is again not allowed.
There are several places from which vma_merge() is called and therefore
several use cases that might profit from this change. These include
mmap (which fills a hole between two VMAs), mremap (which moves a VMA next
to another one or again perfectly fills a hole), mprotect (which modifies
protection and allows merging with a neighbor) and brk (which expands a VMA
so that it is adjacent to a neighbor).
Missed merge opportunities increase the number of VMAs of a process
and in some cases can cause problems when the maximum count is reached.
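
To make the offset requirement concrete, here is a minimal user-space
sketch. It is not kernel code: struct vma_model, anon_pgoff() and
pgoff_contiguous() are hypothetical names introduced only for
illustration, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical, heavily simplified stand-in for struct vm_area_struct. */
struct vma_model {
	unsigned long vm_start;	/* first byte of the mapping */
	unsigned long vm_end;	/* one past the last byte */
	unsigned long vm_pgoff;	/* page offset, in pages */
};

/* Page offset an anonymous VMA receives when it is created at addr. */
static unsigned long anon_pgoff(unsigned long addr)
{
	return addr >> PAGE_SHIFT;
}

/*
 * True if next could be appended to prev as far as addresses and
 * page offsets are concerned (all other merge conditions are ignored).
 */
static bool pgoff_contiguous(const struct vma_model *prev,
			     const struct vma_model *next)
{
	unsigned long pages = (prev->vm_end - prev->vm_start) >> PAGE_SHIFT;

	return prev->vm_end == next->vm_start &&
	       prev->vm_pgoff + pages == next->vm_pgoff;
}
```

A VMA created at 0x12000 right behind one spanning 0x10000-0x12000
passes the check. A VMA created at 0x40000 and later moved to 0x12000
keeps the offset of its original address, so the check fails even
though the address ranges are adjacent.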

Solution
This patch series solves the first problem with
page offsets by updating them when the VMA is moved to a
different virtual address (patch 2). As for the second
problem, merging of VMAs with different anon_vmas is allowed
(patch 3). Patch 1 refactors the function vma_merge(), which
makes it easier to understand and also allows relatively
seamless tracing of successful merges, introduced by patch 4.

Limitations
For both problems the solution works only for VMAs that do not share
physical pages with other processes (usually child or parent
processes). This is checked by looking at the anon_vma of the respective
VMA. The reason why sharing is not possible, or at least not easy, to
support is that each physical page has a pointer to its anon_vma and
a page offset. When such a physical page is shared, we cannot simply
change these fields without affecting all of the VMAs mapping
this physical page. The good news is that this case accounts for only
about 1-3% of all merges (measured on jemalloc (0%), redis (2.7%) and
kcbench (1.2%) tests) that fail to merge in the current kernel.
Measurements also show a slight increase in running time: jemalloc (0.3%),
redis (1%), kcbench (1%). More extensive data can be viewed at
https://home.alabanda.cz/share/results.png

This patch series and documentation of the related code will
be part of my master's thesis.
The series is based on tag v5.17-rc4. This is the second version,
which includes minor changes that arose from the first RFC, such as
formatting. Speed and failed merge percentage data are also included.

Jakub Matěna (4):
  mm: refactor of vma_merge()
  mm: adjust page offset in mremap
  mm: enable merging of VMAs with different anon_vmas
  mm: add tracing for VMA merges

 include/linux/rmap.h        |  17 ++-
 include/trace/events/mmap.h |  83 +++++++++++++++
 mm/internal.h               |  12 +++
 mm/mmap.c                   | 206 ++++++++++++++++++++++++------------
 mm/rmap.c                   |  77 ++++++++++++++
 5 files changed, 325 insertions(+), 70 deletions(-)

-- 
2.34.1



* [RFC PATCH v2 1/4] [PATCH 1/4] mm: refactor of vma_merge()
  2022-03-11 17:45 [RFC PATCH v2 0/4] Removing limitations of merging anonymous VMAs Jakub Matěna
@ 2022-03-11 17:45 ` Jakub Matěna
  2022-03-17 18:53   ` Vlastimil Babka
  2022-03-11 17:46 ` [RFC PATCH v2 2/4] [PATCH 2/4] mm: adjust page offset in mremap Jakub Matěna
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: Jakub Matěna @ 2022-03-11 17:45 UTC (permalink / raw)
  To: linux-mm
  Cc: patches, linux-kernel, vbabka, mhocko, mgorman, willy,
	liam.howlett, hughd, kirill, riel, rostedt, peterz,
	Jakub Matěna

Refactor vma_merge() to make it shorter, more understandable and
suitable for tracing of successful merges that are made possible by
the following patches in the series. The main change is the elimination
of duplicated code in the merge-next check. This is done by first doing
the checks and caching the results before executing the merge itself. Exit
paths are also unified.

Signed-off-by: Jakub Matěna <matenajakub@gmail.com>
---
 mm/mmap.c | 81 +++++++++++++++++++++++++------------------------------
 1 file changed, 36 insertions(+), 45 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 1e8fdb0b51ed..8d817b11c656 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1171,7 +1171,9 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 {
 	pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
 	struct vm_area_struct *area, *next;
-	int err;
+	int err = -1;
+	bool merge_prev = false;
+	bool merge_next = false;
 
 	/*
 	 * We later require that vma->vm_flags == vm_flags,
@@ -1190,66 +1192,55 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 	VM_WARN_ON(area && end > area->vm_end);
 	VM_WARN_ON(addr >= end);
 
-	/*
-	 * Can it merge with the predecessor?
-	 */
+	/* Can we merge the predecessor? */
 	if (prev && prev->vm_end == addr &&
 			mpol_equal(vma_policy(prev), policy) &&
 			can_vma_merge_after(prev, vm_flags,
 					    anon_vma, file, pgoff,
 					    vm_userfaultfd_ctx, anon_name)) {
-		/*
-		 * OK, it can.  Can we now merge in the successor as well?
-		 */
-		if (next && end == next->vm_start &&
-				mpol_equal(policy, vma_policy(next)) &&
-				can_vma_merge_before(next, vm_flags,
-						     anon_vma, file,
-						     pgoff+pglen,
-						     vm_userfaultfd_ctx, anon_name) &&
-				is_mergeable_anon_vma(prev->anon_vma,
-						      next->anon_vma, NULL)) {
-							/* cases 1, 6 */
-			err = __vma_adjust(prev, prev->vm_start,
-					 next->vm_end, prev->vm_pgoff, NULL,
-					 prev);
-		} else					/* cases 2, 5, 7 */
-			err = __vma_adjust(prev, prev->vm_start,
-					 end, prev->vm_pgoff, NULL, prev);
-		if (err)
-			return NULL;
-		khugepaged_enter_vma_merge(prev, vm_flags);
-		return prev;
+		merge_prev = true;
+		area = prev;
 	}
-
-	/*
-	 * Can this new request be merged in front of next?
-	 */
+	/* Can we merge the successor? */
 	if (next && end == next->vm_start &&
 			mpol_equal(policy, vma_policy(next)) &&
 			can_vma_merge_before(next, vm_flags,
 					     anon_vma, file, pgoff+pglen,
 					     vm_userfaultfd_ctx, anon_name)) {
+		merge_next = true;
+	}
+	/* Can we merge both the predecessor and the successor? */
+	if (merge_prev && merge_next &&
+			is_mergeable_anon_vma(prev->anon_vma,
+				next->anon_vma, NULL)) {	 /* cases 1, 6 */
+		err = __vma_adjust(prev, prev->vm_start,
+					next->vm_end, prev->vm_pgoff, NULL,
+					prev);
+	} else if (merge_prev) {			/* cases 2, 5, 7 */
+		err = __vma_adjust(prev, prev->vm_start,
+					end, prev->vm_pgoff, NULL, prev);
+	} else if (merge_next) {
 		if (prev && addr < prev->vm_end)	/* case 4 */
 			err = __vma_adjust(prev, prev->vm_start,
-					 addr, prev->vm_pgoff, NULL, next);
-		else {					/* cases 3, 8 */
+					addr, prev->vm_pgoff, NULL, next);
+		else					/* cases 3, 8 */
 			err = __vma_adjust(area, addr, next->vm_end,
-					 next->vm_pgoff - pglen, NULL, next);
-			/*
-			 * In case 3 area is already equal to next and
-			 * this is a noop, but in case 8 "area" has
-			 * been removed and next was expanded over it.
-			 */
-			area = next;
-		}
-		if (err)
-			return NULL;
-		khugepaged_enter_vma_merge(area, vm_flags);
-		return area;
+					next->vm_pgoff - pglen, NULL, next);
+		/*
+		 * In case 3 and 4 area is already equal to next and
+		 * this is a noop, but in case 8 "area" has
+		 * been removed and next was expanded over it.
+		 */
+		area = next;
 	}
 
-	return NULL;
+	/*
+	 * Cannot merge with predecessor or successor or error in __vma_adjust?
+	 */
+	if (err)
+		return NULL;
+	khugepaged_enter_vma_merge(area, vm_flags);
+	return area;
 }
 
 /*
-- 
2.34.1
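
The check-then-act structure of the refactored vma_merge() can be
modelled in user space as follows. This is only a sketch of the
control flow: enum merge_action and pick_merge() are hypothetical
names that do not exist in the kernel, and the three flags stand in
for the results of can_vma_merge_after(), can_vma_merge_before() and
is_mergeable_anon_vma().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the branches taken in vma_merge(). */
enum merge_action {
	MERGE_NONE,		/* nothing can be merged */
	MERGE_BOTH,		/* cases 1, 6 */
	MERGE_PREV_ONLY,	/* cases 2, 5, 7 */
	MERGE_NEXT_ONLY,	/* cases 3, 4, 8 */
};

/*
 * All checks have already been evaluated and cached in the flags;
 * this function only selects the action, in the same order as the
 * if / else if chain of the patch.
 */
static enum merge_action pick_merge(bool merge_prev, bool merge_next,
				    bool anon_vmas_compatible)
{
	if (merge_prev && merge_next && anon_vmas_compatible)
		return MERGE_BOTH;
	if (merge_prev)
		return MERGE_PREV_ONLY;
	if (merge_next)
		return MERGE_NEXT_ONLY;
	return MERGE_NONE;
}
```

Note that when both neighbors are individually mergeable but their
anon_vmas are not compatible with each other, the chain falls back to
merging with the predecessor only, as in the patch.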



* [RFC PATCH v2 2/4] [PATCH 2/4] mm: adjust page offset in mremap
  2022-03-11 17:45 [RFC PATCH v2 0/4] Removing limitations of merging anonymous VMAs Jakub Matěna
  2022-03-11 17:45 ` [RFC PATCH v2 1/4] [PATCH 1/4] mm: refactor of vma_merge() Jakub Matěna
@ 2022-03-11 17:46 ` Jakub Matěna
  2022-03-11 17:46 ` [RFC PATCH v2 3/4] [PATCH 3/4] mm: enable merging of VMAs with different anon_vmas Jakub Matěna
  2022-03-11 17:46 ` [RFC PATCH v2 4/4] [PATCH 4/4] mm: add tracing for VMA merges Jakub Matěna
  3 siblings, 0 replies; 6+ messages in thread
From: Jakub Matěna @ 2022-03-11 17:46 UTC (permalink / raw)
  To: linux-mm
  Cc: patches, linux-kernel, vbabka, mhocko, mgorman, willy,
	liam.howlett, hughd, kirill, riel, rostedt, peterz,
	Jakub Matěna

Adjust the page offset of a VMA when it is moved to a new location by mremap.
This is made possible for all VMAs that do not share their anonymous
pages with other processes. Previously this was possible only for VMAs
that had not yet been faulted.
When the page offset does not correspond to the virtual address
of the anonymous VMA, any merge attempt with another VMA will fail.

Signed-off-by: Jakub Matěna <matenajakub@gmail.com>
---
 mm/mmap.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
 mm/rmap.c | 37 +++++++++++++++++++++++++++++
 2 files changed, 100 insertions(+), 6 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 8d817b11c656..4f9c6ca7ff4e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3218,6 +3218,59 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
 	return 0;
 }
 
+/**
+ * update_faulted_pgoff() - Update faulted pages of a vma
+ * @vma: VMA being moved
+ * @addr: new virtual address
+ * @pgoff: pointer to pgoff which is updated
+ * If the vma and its pages are not shared with another process, update
+ * the new pgoff and also update index parameter (copy of the pgoff) in
+ * all faulted pages.
+ */
+bool update_faulted_pgoff(struct vm_area_struct *vma, unsigned long addr, pgoff_t *pgoff)
+{
+	unsigned long pg_iter = 0;
+	unsigned long pg_iters = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+
+	/* Check vma is not shared with other processes */
+	if (vma->anon_vma->root != vma->anon_vma || !rbt_no_children(vma->anon_vma))
+		return false;
+
+	/* Check all pages are not shared */
+	for (; pg_iter < pg_iters; ++pg_iter) {
+		bool pages_not_shared = true;
+		unsigned long shift = pg_iter << PAGE_SHIFT;
+		struct page *phys_page = follow_page(vma, vma->vm_start + shift, FOLL_GET);
+
+		if (phys_page == NULL)
+			continue;
+
+		/* Check page is not shared with other processes */
+		if (page_mapcount(phys_page) > 1)
+			pages_not_shared = false;
+		put_page(phys_page);
+		if (!pages_not_shared)
+			return false;
+	}
+
+	/* Update index in all pages to this new pgoff */
+	pg_iter = 0;
+	*pgoff = addr >> PAGE_SHIFT;
+
+	for (; pg_iter < pg_iters; ++pg_iter) {
+		unsigned long shift = pg_iter << PAGE_SHIFT;
+		struct page *phys_page = follow_page(vma, vma->vm_start + shift, FOLL_GET);
+
+		if (phys_page == NULL)
+			continue;
+		lock_page(phys_page);
+		phys_page->index = *pgoff + pg_iter;
+		unlock_page(phys_page);
+		put_page(phys_page);
+	}
+	return true;
+}
+
 /*
  * Copy the vma structure to a new location in the same mm,
  * prior to moving page table entries, to effect an mremap move.
@@ -3231,15 +3284,19 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_area_struct *new_vma, *prev;
 	struct rb_node **rb_link, *rb_parent;
-	bool faulted_in_anon_vma = true;
+	bool anon_pgoff_updated = false;
 
 	/*
-	 * If anonymous vma has not yet been faulted, update new pgoff
+	 * Try to update new pgoff for anonymous vma
 	 * to match new location, to increase its chance of merging.
 	 */
-	if (unlikely(vma_is_anonymous(vma) && !vma->anon_vma)) {
-		pgoff = addr >> PAGE_SHIFT;
-		faulted_in_anon_vma = false;
+	if (unlikely(vma_is_anonymous(vma))) {
+		if (!vma->anon_vma) {
+			pgoff = addr >> PAGE_SHIFT;
+			anon_pgoff_updated = true;
+		} else {
+			anon_pgoff_updated = update_faulted_pgoff(vma, addr, &pgoff);
+		}
 	}
 
 	if (find_vma_links(mm, addr, addr + len, &prev, &rb_link, &rb_parent))
@@ -3265,7 +3322,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 			 * safe. It is only safe to keep the vm_pgoff
 			 * linear if there are no pages mapped yet.
 			 */
-			VM_BUG_ON_VMA(faulted_in_anon_vma, new_vma);
+			VM_BUG_ON_VMA(!anon_pgoff_updated, new_vma);
 			*vmap = vma = new_vma;
 		}
 		*need_rmap_locks = (new_vma->vm_pgoff <= vma->vm_pgoff);
diff --git a/mm/rmap.c b/mm/rmap.c
index 6a1e8c7f6213..96273d6a9796 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -387,6 +387,43 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	return -ENOMEM;
 }
 
+/*
+ * Used by rbt_no_children to check node subtree.
+ * Check if none of the VMAs connected to the node subtree via
+ * anon_vma_chain are in child relationship to the given anon_vma.
+ */
+bool rbst_no_children(struct anon_vma *av, struct rb_node *node)
+{
+	struct anon_vma_chain *model;
+	struct anon_vma_chain *avc;
+
+	if (node == NULL) /* leaf node */
+		return true;
+	avc = container_of(node, typeof(*(model)), rb);
+	if (avc->vma->anon_vma != av)
+		/*
+		 * Inequality implies avc belongs
+		 * to a VMA of a child process
+		 */
+		return false;
+	return (rbst_no_children(av, node->rb_left) &&
+	rbst_no_children(av, node->rb_right));
+}
+
+/*
+ * Check if none of the VMAs connected to the given
+ * anon_vma via anon_vma_chain are in child relationship
+ */
+bool rbt_no_children(struct anon_vma *av)
+{
+	struct rb_node *root_node;
+
+	if (av == NULL || av->degree <= 1) /* Higher degree might not necessarily imply children */
+		return true;
+	root_node = av->rb_root.rb_root.rb_node;
+	return rbst_no_children(av, root_node);
+}
+
 void unlink_anon_vmas(struct vm_area_struct *vma)
 {
 	struct anon_vma_chain *avc, *next;
-- 
2.34.1
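
The two passes of update_faulted_pgoff() can be illustrated with a
small user-space model. Everything here is an assumption made for the
sketch: struct page_model and update_pgoff_model() are hypothetical,
an array slot replaces follow_page(), the anon_vma check is left out,
and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the struct page fields the patch touches. */
struct page_model {
	bool present;		/* false models follow_page() returning NULL */
	int mapcount;		/* models page_mapcount() */
	unsigned long index;	/* models page->index */
};

/*
 * Rebase the page offset of a moved range to new_addr.
 * Returns false and changes nothing if any present page is shared.
 */
static bool update_pgoff_model(struct page_model *pages, size_t npages,
			       unsigned long new_addr, unsigned long *pgoff)
{
	size_t i;

	/* First pass: refuse if any present page is mapped more than once. */
	for (i = 0; i < npages; i++)
		if (pages[i].present && pages[i].mapcount > 1)
			return false;

	/* Second pass: update the offset and the index of each present page. */
	*pgoff = new_addr >> PAGE_SHIFT;
	for (i = 0; i < npages; i++)
		if (pages[i].present)
			pages[i].index = *pgoff + i;
	return true;
}
```

Splitting the work into a read-only pass and a write pass mirrors the
patch: no index is modified unless every faulted page has been found to
be unshared.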



* [RFC PATCH v2 3/4] [PATCH 3/4] mm: enable merging of VMAs with different anon_vmas
  2022-03-11 17:45 [RFC PATCH v2 0/4] Removing limitations of merging anonymous VMAs Jakub Matěna
  2022-03-11 17:45 ` [RFC PATCH v2 1/4] [PATCH 1/4] mm: refactor of vma_merge() Jakub Matěna
  2022-03-11 17:46 ` [RFC PATCH v2 2/4] [PATCH 2/4] mm: adjust page offset in mremap Jakub Matěna
@ 2022-03-11 17:46 ` Jakub Matěna
  2022-03-11 17:46 ` [RFC PATCH v2 4/4] [PATCH 4/4] mm: add tracing for VMA merges Jakub Matěna
  3 siblings, 0 replies; 6+ messages in thread
From: Jakub Matěna @ 2022-03-11 17:46 UTC (permalink / raw)
  To: linux-mm
  Cc: patches, linux-kernel, vbabka, mhocko, mgorman, willy,
	liam.howlett, hughd, kirill, riel, rostedt, peterz,
	Jakub Matěna

Enable merging of a VMA even when it is linked to a different
anon_vma than the VMA it is being merged into, but only if the VMA
in question does not share any page with a parent or child process.
Every anonymous page stores a pointer to its anon_vma in its mapping
field, which is now updated as part of the merge process.

Signed-off-by: Jakub Matěna <matenajakub@gmail.com>
---
 include/linux/rmap.h | 17 ++++++++++++++++-
 mm/mmap.c            | 15 ++++++++++++++-
 mm/rmap.c            | 40 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index e704b1a4c06c..c8508a4ebc46 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -137,10 +137,13 @@ static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
  */
 void anon_vma_init(void);	/* create anon_vma_cachep */
 int  __anon_vma_prepare(struct vm_area_struct *);
+void reconnect_pages(struct vm_area_struct *vma, struct vm_area_struct *next);
 void unlink_anon_vmas(struct vm_area_struct *);
 int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *);
 int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
 
+bool rbt_no_children(struct anon_vma *av);
+
 static inline int anon_vma_prepare(struct vm_area_struct *vma)
 {
 	if (likely(vma->anon_vma))
@@ -149,10 +152,22 @@ static inline int anon_vma_prepare(struct vm_area_struct *vma)
 	return __anon_vma_prepare(vma);
 }
 
+/**
+ * anon_vma_merge() - Merge anon_vmas of the given VMAs
+ * @vma: VMA being merged to
+ * @next: VMA being merged
+ */
 static inline void anon_vma_merge(struct vm_area_struct *vma,
 				  struct vm_area_struct *next)
 {
-	VM_BUG_ON_VMA(vma->anon_vma != next->anon_vma, vma);
+	struct anon_vma *anon_vma1 = vma->anon_vma;
+	struct anon_vma *anon_vma2 = next->anon_vma;
+
+	VM_BUG_ON_VMA(anon_vma1 && anon_vma2 && anon_vma1 != anon_vma2 &&
+			((anon_vma2 != anon_vma2->root)
+			|| !rbt_no_children(anon_vma2)), vma);
+
+	reconnect_pages(vma, next);
 	unlink_anon_vmas(next);
 }
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 4f9c6ca7ff4e..ccb24862e670 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1065,7 +1065,20 @@ static inline int is_mergeable_anon_vma(struct anon_vma *anon_vma1,
 	if ((!anon_vma1 || !anon_vma2) && (!vma ||
 		list_is_singular(&vma->anon_vma_chain)))
 		return 1;
-	return anon_vma1 == anon_vma2;
+	if (anon_vma1 == anon_vma2)
+		return 1;
+	/*
+	 * Different anon_vma but not shared by several processes
+	 */
+	else if ((anon_vma1 && anon_vma2) &&
+			(anon_vma1 == anon_vma1->root)
+			&& (rbt_no_children(anon_vma1)))
+		return 1;
+	/*
+	 * Different anon_vma and shared -> unmergeable
+	 */
+	else
+		return 0;
 }
 
 /*
diff --git a/mm/rmap.c b/mm/rmap.c
index 96273d6a9796..b296e1e1aec3 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -387,6 +387,46 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	return -ENOMEM;
 }
 
+/**
+ * reconnect_pages() - Reconnect physical pages from old to vma
+ * @vma: VMA to newly contain all physical pages of old
+ * @old: old VMA being merged to vma
+ */
+void reconnect_pages(struct vm_area_struct *vma, struct vm_area_struct *old)
+{
+	struct anon_vma *anon_vma1 = vma->anon_vma;
+	struct anon_vma *anon_vma2 = old->anon_vma;
+	unsigned long pg_iter;
+	int pg_iters;
+
+	if (anon_vma1 == anon_vma2 || anon_vma1 == NULL || anon_vma2 == NULL)
+		return; /* Nothing to do */
+
+	/* Modify page->mapping for all pages in old */
+	pg_iter = 0;
+	pg_iters = (old->vm_end - old->vm_start) >> PAGE_SHIFT;
+
+	for (; pg_iter < pg_iters; ++pg_iter) {
+		/* Get the physical page */
+		unsigned long shift = pg_iter << PAGE_SHIFT;
+		struct page *phys_page = follow_page(old, old->vm_start + shift, FOLL_GET);
+		struct anon_vma *page_anon_vma;
+
+		/* Do some checks and lock the page */
+		if (phys_page == NULL)
+			continue; /* Virtual memory page is not mapped */
+		lock_page(phys_page);
+		page_anon_vma = page_get_anon_vma(phys_page);
+		if (page_anon_vma != NULL) { /* NULL in case of ZERO_PAGE */
+			VM_BUG_ON_VMA(page_anon_vma != old->anon_vma, old);
+			/* Update physical page's mapping */
+			page_move_anon_rmap(phys_page, vma);
+		}
+		unlock_page(phys_page);
+		put_page(phys_page);
+	}
+}
+
 /*
  * Used by rbt_no_children to check node subtree.
  * Check if none of the VMAs connected to the node subtree via
-- 
2.34.1
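
The decision made by the extended is_mergeable_anon_vma() can be
sketched in user space as below. The model is an assumption-laden
simplification: struct anon_vma_model, unshared() and
anon_vmas_mergeable() are hypothetical names, has_children stands in
for !rbt_no_children(), and chain_ok stands in for the
"!vma || list_is_singular(&vma->anon_vma_chain)" condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct anon_vma fields that matter here. */
struct anon_vma_model {
	struct anon_vma_model *root;	/* points to itself for a root */
	bool has_children;		/* models !rbt_no_children() */
};

/* An anon_vma is treated as unshared if it is a root without children. */
static bool unshared(const struct anon_vma_model *av)
{
	return av->root == av && !av->has_children;
}

/* Same order of tests as in the patched is_mergeable_anon_vma(). */
static bool anon_vmas_mergeable(const struct anon_vma_model *av1,
				const struct anon_vma_model *av2,
				bool chain_ok)
{
	if ((!av1 || !av2) && chain_ok)
		return true;	/* at least one side not faulted yet */
	if (av1 == av2)
		return true;	/* same anon_vma, allowed before the patch */
	if (av1 && av2 && unshared(av1))
		return true;	/* new: different but unshared */
	return false;		/* different and shared */
}
```

As in the patch, only the first anon_vma is tested for sharing in the
new branch; the second one is checked separately by the VM_BUG_ON_VMA()
in anon_vma_merge().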



* [RFC PATCH v2 4/4] [PATCH 4/4] mm: add tracing for VMA merges
  2022-03-11 17:45 [RFC PATCH v2 0/4] Removing limitations of merging anonymous VMAs Jakub Matěna
                   ` (2 preceding siblings ...)
  2022-03-11 17:46 ` [RFC PATCH v2 3/4] [PATCH 3/4] mm: enable merging of VMAs with different anon_vmas Jakub Matěna
@ 2022-03-11 17:46 ` Jakub Matěna
  3 siblings, 0 replies; 6+ messages in thread
From: Jakub Matěna @ 2022-03-11 17:46 UTC (permalink / raw)
  To: linux-mm
  Cc: patches, linux-kernel, vbabka, mhocko, mgorman, willy,
	liam.howlett, hughd, kirill, riel, rostedt, peterz,
	Jakub Matěna

Add trace support for vma_merge() to measure successful and unsuccessful
merges of two VMAs with distinct anon_vmas, and also trace support for
merges made possible by the page offset update introduced by a previous
patch in this series.

Signed-off-by: Jakub Matěna <matenajakub@gmail.com>
---
 include/trace/events/mmap.h | 83 +++++++++++++++++++++++++++++++++++++
 mm/internal.h               | 12 ++++++
 mm/mmap.c                   | 69 ++++++++++++++++--------------
 3 files changed, 133 insertions(+), 31 deletions(-)

diff --git a/include/trace/events/mmap.h b/include/trace/events/mmap.h
index 4661f7ba07c0..bad7abe4899c 100644
--- a/include/trace/events/mmap.h
+++ b/include/trace/events/mmap.h
@@ -6,6 +6,27 @@
 #define _TRACE_MMAP_H
 
 #include <linux/tracepoint.h>
+#include <../mm/internal.h>
+
+#define AV_MERGE_TYPES		\
+	EM(MERGE_FAILED)	\
+	EM(AV_MERGE_FAILED)	\
+	EM(AV_MERGE_NULL)	\
+	EM(AV_MERGE_SAME)	\
+	EMe(AV_MERGE_DIFFERENT)
+
+#undef EM
+#undef EMe
+#define EM(a)	TRACE_DEFINE_ENUM(a);
+#define EMe(a)	TRACE_DEFINE_ENUM(a);
+
+AV_MERGE_TYPES
+
+#undef EM
+#undef EMe
+
+#define EM(a)   { a, #a },
+#define EMe(a)  { a, #a }
 
 TRACE_EVENT(vm_unmapped_area,
 
@@ -42,6 +63,68 @@ TRACE_EVENT(vm_unmapped_area,
 		__entry->low_limit, __entry->high_limit, __entry->align_mask,
 		__entry->align_offset)
 );
+
+TRACE_EVENT(vm_av_merge,
+
+	TP_PROTO(int merged, enum vma_merge_res merge_prev,
+			enum vma_merge_res merge_next, enum vma_merge_res merge_both),
+
+	TP_ARGS(merged, merge_prev, merge_next, merge_both),
+
+	TP_STRUCT__entry(
+		__field(int,			merged)
+		__field(enum vma_merge_res,	predecessor_different_av)
+		__field(enum vma_merge_res,	successor_different_av)
+		__field(enum vma_merge_res,	predecessor_with_successor_different_av)
+		__field(int,			same_count)
+		__field(int,			diff_count)
+		__field(int,			failed_count)
+	),
+
+	TP_fast_assign(
+		__entry->merged = merged == 0;
+		__entry->predecessor_different_av = merge_prev;
+		__entry->successor_different_av = merge_next;
+		__entry->predecessor_with_successor_different_av = merge_both;
+		__entry->same_count = (merge_prev == AV_MERGE_SAME) +
+			(merge_next == AV_MERGE_SAME) +
+			(merge_both == AV_MERGE_SAME);
+		__entry->diff_count = (merge_prev == AV_MERGE_DIFFERENT) +
+			(merge_next == AV_MERGE_DIFFERENT) +
+			(merge_both == AV_MERGE_DIFFERENT);
+		__entry->failed_count = (merge_prev == AV_MERGE_FAILED) +
+			(merge_next == AV_MERGE_FAILED) +
+			(merge_both == AV_MERGE_FAILED);
+	),
+
+	TP_printk("merged=%d predecessor=%s successor=%s predecessor_with_successor=%s same_count=%d diff_count=%d failed_count=%d",
+		__entry->merged,
+		__print_symbolic(__entry->predecessor_different_av, AV_MERGE_TYPES),
+		__print_symbolic(__entry->successor_different_av, AV_MERGE_TYPES),
+		__print_symbolic(__entry->predecessor_with_successor_different_av, AV_MERGE_TYPES),
+		__entry->same_count, __entry->diff_count, __entry->failed_count)
+
+);
+
+TRACE_EVENT(vm_pgoff_merge,
+
+	TP_PROTO(struct vm_area_struct *vma, bool anon_pgoff_updated),
+
+	TP_ARGS(vma, anon_pgoff_updated),
+
+	TP_STRUCT__entry(
+		__field(bool,	faulted)
+		__field(bool,	updated)
+	),
+
+	TP_fast_assign(
+		__entry->faulted = vma->anon_vma;
+		__entry->updated = anon_pgoff_updated;
+	),
+
+	TP_printk("faulted=%d updated=%d\n",
+		__entry->faulted, __entry->updated)
+);
 #endif
 
 /* This part must be outside protection */
diff --git a/mm/internal.h b/mm/internal.h
index d80300392a19..860169612192 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -34,6 +34,18 @@ struct folio_batch;
 /* Do not use these with a slab allocator */
 #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
 
+/*
+ * Following values indicate reason for merge success or failure.
+ */
+enum vma_merge_res {
+	MERGE_FAILED,
+	AV_MERGE_FAILED,
+	AV_MERGE_NULL,
+	MERGE_OK = AV_MERGE_NULL,
+	AV_MERGE_SAME,
+	AV_MERGE_DIFFERENT,
+};
+
 void page_writeback_init(void);
 
 static inline void *folio_raw_mapping(struct folio *folio)
diff --git a/mm/mmap.c b/mm/mmap.c
index ccb24862e670..663f8ec46f2c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1064,21 +1064,21 @@ static inline int is_mergeable_anon_vma(struct anon_vma *anon_vma1,
 	 */
 	if ((!anon_vma1 || !anon_vma2) && (!vma ||
 		list_is_singular(&vma->anon_vma_chain)))
-		return 1;
+		return AV_MERGE_NULL;
 	if (anon_vma1 == anon_vma2)
-		return 1;
+		return AV_MERGE_SAME;
 	/*
 	 * Different anon_vma but not shared by several processes
 	 */
 	else if ((anon_vma1 && anon_vma2) &&
 			(anon_vma1 == anon_vma1->root)
 			&& (rbt_no_children(anon_vma1)))
-		return 1;
+		return AV_MERGE_DIFFERENT;
 	/*
 	 * Different anon_vma and shared -> unmergeable
 	 */
 	else
-		return 0;
+		return AV_MERGE_FAILED;
 }
 
 /*
@@ -1099,12 +1099,10 @@ can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
 		     struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
 		     const char *anon_name)
 {
-	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
-	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
+	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name))
 		if (vma->vm_pgoff == vm_pgoff)
-			return 1;
-	}
-	return 0;
+			return is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma);
+	return MERGE_FAILED;
 }
 
 /*
@@ -1121,14 +1119,13 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
 		    struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
 		    const char *anon_name)
 {
-	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
-	    is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
+	if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name)) {
 		pgoff_t vm_pglen;
 		vm_pglen = vma_pages(vma);
 		if (vma->vm_pgoff + vm_pglen == vm_pgoff)
-			return 1;
+			return is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma);
 	}
-	return 0;
+	return MERGE_FAILED;
 }
 
 /*
@@ -1185,8 +1182,14 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 	pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
 	struct vm_area_struct *area, *next;
 	int err = -1;
-	bool merge_prev = false;
-	bool merge_next = false;
+	/*
+	 * Following three variables are used to store values
+	 * indicating whether this VMA and its anon_vma can
+	 * be merged and also the type of failure or success.
+	 */
+	enum vma_merge_res merge_prev = MERGE_FAILED;
+	enum vma_merge_res merge_both = MERGE_FAILED;
+	enum vma_merge_res merge_next = MERGE_FAILED;
 
 	/*
 	 * We later require that vma->vm_flags == vm_flags,
@@ -1207,32 +1210,34 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 
 	/* Can we merge the predecessor? */
 	if (prev && prev->vm_end == addr &&
-			mpol_equal(vma_policy(prev), policy) &&
-			can_vma_merge_after(prev, vm_flags,
+			mpol_equal(vma_policy(prev), policy)) {
+		merge_prev = can_vma_merge_after(prev, vm_flags,
 					    anon_vma, file, pgoff,
-					    vm_userfaultfd_ctx, anon_name)) {
-		merge_prev = true;
-		area = prev;
+					    vm_userfaultfd_ctx, anon_name);
 	}
+
 	/* Can we merge the successor? */
 	if (next && end == next->vm_start &&
-			mpol_equal(policy, vma_policy(next)) &&
-			can_vma_merge_before(next, vm_flags,
-					     anon_vma, file, pgoff+pglen,
-					     vm_userfaultfd_ctx, anon_name)) {
-		merge_next = true;
+			mpol_equal(policy, vma_policy(next))) {
+		merge_next = can_vma_merge_before(next, vm_flags,
+					anon_vma, file, pgoff+pglen,
+					vm_userfaultfd_ctx, anon_name);
 	}
+
 	/* Can we merge both the predecessor and the successor? */
-	if (merge_prev && merge_next &&
-			is_mergeable_anon_vma(prev->anon_vma,
-				next->anon_vma, NULL)) {	 /* cases 1, 6 */
+	if (merge_prev >= MERGE_OK && merge_next >= MERGE_OK)
+		merge_both = is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL);
+
+	if (merge_both >= MERGE_OK) {	 /* cases 1, 6 */
 		err = __vma_adjust(prev, prev->vm_start,
 					next->vm_end, prev->vm_pgoff, NULL,
 					prev);
-	} else if (merge_prev) {			/* cases 2, 5, 7 */
+		area = prev;
+	} else if (merge_prev >= MERGE_OK) {			/* cases 2, 5, 7 */
 		err = __vma_adjust(prev, prev->vm_start,
 					end, prev->vm_pgoff, NULL, prev);
-	} else if (merge_next) {
+		area = prev;
+	} else if (merge_next >= MERGE_OK) {
 		if (prev && addr < prev->vm_end)	/* case 4 */
 			err = __vma_adjust(prev, prev->vm_start,
 					addr, prev->vm_pgoff, NULL, next);
@@ -1246,7 +1251,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 		 */
 		area = next;
 	}
-
+	trace_vm_av_merge(err, merge_prev, merge_next, merge_both);
 	/*
 	 * Cannot merge with predecessor or successor or error in __vma_adjust?
 	 */
@@ -3321,6 +3326,8 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 		/*
 		 * Source vma may have been merged into new_vma
 		 */
+		trace_vm_pgoff_merge(vma, anon_pgoff_updated);
+
 		if (unlikely(vma_start >= new_vma->vm_start &&
 			     vma_start < new_vma->vm_end)) {
 			/*
-- 
2.34.1
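
How the enum ordering and the per-event counters of vm_av_merge work
together can be checked with a small user-space sketch. The enum is
copied from the patch; struct merge_counts, merge_allowed() and
count_results() are hypothetical helpers written for this
illustration and are not part of the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values and ordering as enum vma_merge_res in mm/internal.h. */
enum vma_merge_res {
	MERGE_FAILED,
	AV_MERGE_FAILED,
	AV_MERGE_NULL,
	MERGE_OK = AV_MERGE_NULL,
	AV_MERGE_SAME,
	AV_MERGE_DIFFERENT,
};

/* Hypothetical container for the three counters of the trace event. */
struct merge_counts {
	int same;
	int diff;
	int failed;
};

/* Every value from MERGE_OK upwards means the merge is allowed. */
static bool merge_allowed(enum vma_merge_res res)
{
	return res >= MERGE_OK;
}

/* Mirrors the arithmetic in TP_fast_assign() of vm_av_merge. */
static struct merge_counts count_results(enum vma_merge_res prev,
					 enum vma_merge_res next,
					 enum vma_merge_res both)
{
	struct merge_counts c;

	c.same = (prev == AV_MERGE_SAME) + (next == AV_MERGE_SAME) +
		 (both == AV_MERGE_SAME);
	c.diff = (prev == AV_MERGE_DIFFERENT) + (next == AV_MERGE_DIFFERENT) +
		 (both == AV_MERGE_DIFFERENT);
	c.failed = (prev == AV_MERGE_FAILED) + (next == AV_MERGE_FAILED) +
		   (both == AV_MERGE_FAILED);
	return c;
}
```

Because the failure values are ordered before MERGE_OK, a single
comparison distinguishes success from failure while the exact reason
stays available for tracing. Note that failed_count only counts
AV_MERGE_FAILED; a plain MERGE_FAILED does not contribute to it.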



* Re: [RFC PATCH v2 1/4] [PATCH 1/4] mm: refactor of vma_merge()
  2022-03-11 17:45 ` [RFC PATCH v2 1/4] [PATCH 1/4] mm: refactor of vma_merge() Jakub Matěna
@ 2022-03-17 18:53   ` Vlastimil Babka
  0 siblings, 0 replies; 6+ messages in thread
From: Vlastimil Babka @ 2022-03-17 18:53 UTC (permalink / raw)
  To: Jakub Matěna, linux-mm
  Cc: patches, linux-kernel, mhocko, mgorman, willy, liam.howlett,
	hughd, kirill, riel, rostedt, peterz

On 3/11/22 18:45, Jakub Matěna wrote:
> Refactor vma_merge() to make it shorter, more understandable and
> suitable for tracing of successful merges that are made possible by
> following patches in the series. Main change is the elimination of code
> duplicity in the case of merge next check. This is done by first doing
> checks and caching the results before executing the merge itself. Exit
> paths are also unified.
> 
> Signed-off-by: Jakub Matěna <matenajakub@gmail.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

It's a nice cleanup on its own. The removed duplication and reduced
indentation levels help.

