* [PATCH v2 0/6] Handle more faults under the VMA lock
@ 2023-10-06 19:53 Matthew Wilcox (Oracle)
  2023-10-06 19:53 ` [PATCH v2 1/6] mm: Make lock_folio_maybe_drop_mmap() VMA lock aware Matthew Wilcox (Oracle)
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-10-06 19:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan

At this point, we're handling the majority of file-backed page faults
under the VMA lock, using the ->map_pages entry point.  This patch set
attempts to expand that to cover the following situations:

 - We have to do a read.  This could be because we've hit the point in
   the readahead window where we need to kick off the next readahead,
   or because the page is simply not present in cache.
 - We're handling a write fault.  Most applications don't do I/O via
   writes to shared mmaps, for very good reasons, but some do, and it'd
   be nice not to slow that down unnecessarily.
 - We're doing a COW of a private mapping (both PTE already present
   and PTE not-present).  These are two different codepaths and I handle
   both of them in this patch set.

There is no support in this patch set for drivers to mark themselves
as being VMA lock friendly; they could implement the ->map_pages
vm_operation, but if they do, they would be the first.  This is probably
something we want to change at some point in the future, and I've marked
where to make that change in the code.

There is very little performance change in the benchmarks we've run,
mostly because the vast majority of page faults are handled through the
other paths.  I still think this patch series is useful for workloads
that may take these paths more often, and just for cleaning up the
fault path in general (it's now clearer why we have to retry in these
cases).

v2:
 - Rename vmf_maybe_unlock_vma to vmf_can_call_fault

Matthew Wilcox (Oracle) (6):
  mm: Make lock_folio_maybe_drop_mmap() VMA lock aware
  mm: Call wp_page_copy() under the VMA lock
  mm: Handle shared faults under the VMA lock
  mm: Handle COW faults under the VMA lock
  mm: Handle read faults under the VMA lock
  mm: Handle write faults to RO pages under the VMA lock

 mm/filemap.c | 13 ++++----
 mm/memory.c  | 93 ++++++++++++++++++++++++++++++++--------------------
 2 files changed, 65 insertions(+), 41 deletions(-)

-- 
2.40.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/6] mm: Make lock_folio_maybe_drop_mmap() VMA lock aware
  2023-10-06 19:53 [PATCH v2 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
@ 2023-10-06 19:53 ` Matthew Wilcox (Oracle)
  2023-10-08 21:47   ` Suren Baghdasaryan
  2023-10-06 19:53 ` [PATCH v2 2/6] mm: Call wp_page_copy() under the VMA lock Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-10-06 19:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan

Drop the VMA lock instead of the mmap_lock if that's the one which
is held.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 9481ffaf24e6..a598872d62cc 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3104,7 +3104,7 @@ static int lock_folio_maybe_drop_mmap(struct vm_fault *vmf, struct folio *folio,
 
 	/*
 	 * NOTE! This will make us return with VM_FAULT_RETRY, but with
-	 * the mmap_lock still held. That's how FAULT_FLAG_RETRY_NOWAIT
+	 * the fault lock still held. That's how FAULT_FLAG_RETRY_NOWAIT
 	 * is supposed to work. We have way too many special cases..
 	 */
 	if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
@@ -3114,13 +3114,14 @@ static int lock_folio_maybe_drop_mmap(struct vm_fault *vmf, struct folio *folio,
 	if (vmf->flags & FAULT_FLAG_KILLABLE) {
 		if (__folio_lock_killable(folio)) {
 			/*
-			 * We didn't have the right flags to drop the mmap_lock,
-			 * but all fault_handlers only check for fatal signals
-			 * if we return VM_FAULT_RETRY, so we need to drop the
-			 * mmap_lock here and return 0 if we don't have a fpin.
+			 * We didn't have the right flags to drop the
+			 * fault lock, but all fault_handlers only check
+			 * for fatal signals if we return VM_FAULT_RETRY,
+			 * so we need to drop the fault lock here and
+			 * return 0 if we don't have a fpin.
 			 */
 			if (*fpin == NULL)
-				mmap_read_unlock(vmf->vma->vm_mm);
+				release_fault_lock(vmf);
 			return 0;
 		}
 	} else
-- 
2.40.1




* [PATCH v2 2/6] mm: Call wp_page_copy() under the VMA lock
  2023-10-06 19:53 [PATCH v2 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
  2023-10-06 19:53 ` [PATCH v2 1/6] mm: Make lock_folio_maybe_drop_mmap() VMA lock aware Matthew Wilcox (Oracle)
@ 2023-10-06 19:53 ` Matthew Wilcox (Oracle)
  2023-10-08 22:00   ` Suren Baghdasaryan
  2023-10-06 19:53 ` [PATCH v2 3/6] mm: Handle shared faults " Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-10-06 19:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan

It is usually safe to call wp_page_copy() under the VMA lock.  The only
unsafe situation is when no anon_vma has been allocated for this VMA,
and we have to look at adjacent VMAs to determine if their anon_vma can
be shared.  Since this happens only for the first COW of a page in this
VMA, the majority of calls to wp_page_copy() do not need to fall back
to the mmap_lock.

Add vmf_anon_prepare() as an alternative to anon_vma_prepare() which
will return RETRY if we currently hold the VMA lock and need to allocate
an anon_vma.  This lets us drop the check in do_wp_page().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memory.c | 39 ++++++++++++++++++++++++++-------------
 1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 97f860d6cd2a..cff78c496728 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
 	count_vm_event(PGREUSE);
 }
 
+static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+
+	if (likely(vma->anon_vma))
+		return 0;
+	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+		vma_end_read(vma);
+		return VM_FAULT_RETRY;
+	}
+	if (__anon_vma_prepare(vma))
+		return VM_FAULT_OOM;
+	return 0;
+}
+
 /*
  * Handle the case of a page which we actually need to copy to a new page,
  * either due to COW or unsharing.
@@ -3069,27 +3084,29 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 	pte_t entry;
 	int page_copied = 0;
 	struct mmu_notifier_range range;
-	int ret;
+	vm_fault_t ret;
 
 	delayacct_wpcopy_start();
 
 	if (vmf->page)
 		old_folio = page_folio(vmf->page);
-	if (unlikely(anon_vma_prepare(vma)))
-		goto oom;
+	ret = vmf_anon_prepare(vmf);
+	if (unlikely(ret))
+		goto out;
 
 	if (is_zero_pfn(pte_pfn(vmf->orig_pte))) {
 		new_folio = vma_alloc_zeroed_movable_folio(vma, vmf->address);
 		if (!new_folio)
 			goto oom;
 	} else {
+		int err;
 		new_folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma,
 				vmf->address, false);
 		if (!new_folio)
 			goto oom;
 
-		ret = __wp_page_copy_user(&new_folio->page, vmf->page, vmf);
-		if (ret) {
+		err = __wp_page_copy_user(&new_folio->page, vmf->page, vmf);
+		if (err) {
 			/*
 			 * COW failed, if the fault was solved by other,
 			 * it's fine. If not, userspace would re-fault on
@@ -3102,7 +3119,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 				folio_put(old_folio);
 
 			delayacct_wpcopy_end();
-			return ret == -EHWPOISON ? VM_FAULT_HWPOISON : 0;
+			return err == -EHWPOISON ? VM_FAULT_HWPOISON : 0;
 		}
 		kmsan_copy_page_meta(&new_folio->page, vmf->page);
 	}
@@ -3212,11 +3229,13 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 oom_free_new:
 	folio_put(new_folio);
 oom:
+	ret = VM_FAULT_OOM;
+out:
 	if (old_folio)
 		folio_put(old_folio);
 
 	delayacct_wpcopy_end();
-	return VM_FAULT_OOM;
+	return ret;
 }
 
 /**
@@ -3458,12 +3477,6 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 		return 0;
 	}
 copy:
-	if ((vmf->flags & FAULT_FLAG_VMA_LOCK) && !vma->anon_vma) {
-		pte_unmap_unlock(vmf->pte, vmf->ptl);
-		vma_end_read(vmf->vma);
-		return VM_FAULT_RETRY;
-	}
-
 	/*
 	 * Ok, we need to copy. Oh, well..
 	 */
-- 
2.40.1




* [PATCH v2 3/6] mm: Handle shared faults under the VMA lock
  2023-10-06 19:53 [PATCH v2 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
  2023-10-06 19:53 ` [PATCH v2 1/6] mm: Make lock_folio_maybe_drop_mmap() VMA lock aware Matthew Wilcox (Oracle)
  2023-10-06 19:53 ` [PATCH v2 2/6] mm: Call wp_page_copy() under the VMA lock Matthew Wilcox (Oracle)
@ 2023-10-06 19:53 ` Matthew Wilcox (Oracle)
  2023-10-08 22:01   ` Suren Baghdasaryan
  2023-10-20 13:23   ` kernel test robot
  2023-10-06 19:53 ` [PATCH v2 4/6] mm: Handle COW " Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-10-06 19:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan

There are many implementations of ->fault and some of them depend on
mmap_lock being held.  All vm_ops that implement ->map_pages() end up
calling filemap_fault(), which I have audited to be sure it does not rely
on mmap_lock.  So (for now) key off ->map_pages existing as a flag to
indicate that it's safe to call ->fault while only holding the vma lock.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memory.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index cff78c496728..a9b0c135209a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
 	count_vm_event(PGREUSE);
 }
 
+/*
+ * We could add a bitflag somewhere, but for now, we know that all
+ * vm_ops that have a ->map_pages have been audited and don't need
+ * the mmap_lock to be held.
+ */
+static inline vm_fault_t vmf_can_call_fault(const struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+
+	if (vma->vm_ops->map_pages || !(vmf->flags & FAULT_FLAG_VMA_LOCK))
+		return 0;
+	vma_end_read(vma);
+	return VM_FAULT_RETRY;
+}
+
 static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
@@ -4669,10 +4684,9 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
 	vm_fault_t ret, tmp;
 	struct folio *folio;
 
-	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
-		vma_end_read(vma);
-		return VM_FAULT_RETRY;
-	}
+	ret = vmf_can_call_fault(vmf);
+	if (ret)
+		return ret;
 
 	ret = __do_fault(vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
-- 
2.40.1




* [PATCH v2 4/6] mm: Handle COW faults under the VMA lock
  2023-10-06 19:53 [PATCH v2 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2023-10-06 19:53 ` [PATCH v2 3/6] mm: Handle shared faults " Matthew Wilcox (Oracle)
@ 2023-10-06 19:53 ` Matthew Wilcox (Oracle)
  2023-10-08 22:05   ` Suren Baghdasaryan
  2023-10-20 13:18   ` kernel test robot
  2023-10-06 19:53 ` [PATCH v2 5/6] mm: Handle read " Matthew Wilcox (Oracle)
  2023-10-06 19:53 ` [PATCH v2 6/6] mm: Handle write faults to RO pages " Matthew Wilcox (Oracle)
  5 siblings, 2 replies; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-10-06 19:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan

If the page is not currently present in the page tables, we need to call
the page fault handler to find out which page we're supposed to COW,
so we need to both check that there is already an anon_vma and that the
fault handler doesn't need the mmap_lock.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memory.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index a9b0c135209a..938f481df0ab 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4639,13 +4639,11 @@ static vm_fault_t do_cow_fault(struct vm_fault *vmf)
 	struct vm_area_struct *vma = vmf->vma;
 	vm_fault_t ret;
 
-	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
-		vma_end_read(vma);
-		return VM_FAULT_RETRY;
-	}
-
-	if (unlikely(anon_vma_prepare(vma)))
-		return VM_FAULT_OOM;
+	ret = vmf_can_call_fault(vmf);
+	if (!ret)
+		ret = vmf_anon_prepare(vmf);
+	if (ret)
+		return ret;
 
 	vmf->cow_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
 	if (!vmf->cow_page)
-- 
2.40.1




* [PATCH v2 5/6] mm: Handle read faults under the VMA lock
  2023-10-06 19:53 [PATCH v2 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2023-10-06 19:53 ` [PATCH v2 4/6] mm: Handle COW " Matthew Wilcox (Oracle)
@ 2023-10-06 19:53 ` Matthew Wilcox (Oracle)
  2023-10-08 22:06   ` Suren Baghdasaryan
  2023-10-20  9:55   ` kernel test robot
  2023-10-06 19:53 ` [PATCH v2 6/6] mm: Handle write faults to RO pages " Matthew Wilcox (Oracle)
  5 siblings, 2 replies; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-10-06 19:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan

Most file-backed faults are already handled through ->map_pages(),
but if we need to do I/O we'll come this way.  Since filemap_fault()
is now safe to be called under the VMA lock, we can handle these faults
under the VMA lock now.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memory.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 938f481df0ab..e615afd28db2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4617,10 +4617,9 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
 			return ret;
 	}
 
-	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
-		vma_end_read(vmf->vma);
-		return VM_FAULT_RETRY;
-	}
+	ret = vmf_can_call_fault(vmf);
+	if (ret)
+		return ret;
 
 	ret = __do_fault(vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
-- 
2.40.1




* [PATCH v2 6/6] mm: Handle write faults to RO pages under the VMA lock
  2023-10-06 19:53 [PATCH v2 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2023-10-06 19:53 ` [PATCH v2 5/6] mm: Handle read " Matthew Wilcox (Oracle)
@ 2023-10-06 19:53 ` Matthew Wilcox (Oracle)
  2023-10-08 22:07   ` Suren Baghdasaryan
  5 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-10-06 19:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan

I think this is a pretty rare occurrence, but for consistency, handle
these faults under the VMA lock the same way we handle the other
fault paths.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memory.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index e615afd28db2..3d1bc622e344 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3301,10 +3301,9 @@ static vm_fault_t wp_pfn_shared(struct vm_fault *vmf)
 		vm_fault_t ret;
 
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-		if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
-			vma_end_read(vmf->vma);
-			return VM_FAULT_RETRY;
-		}
+		ret = vmf_can_call_fault(vmf);
+		if (ret)
+			return ret;
 
 		vmf->flags |= FAULT_FLAG_MKWRITE;
 		ret = vma->vm_ops->pfn_mkwrite(vmf);
@@ -3328,10 +3327,10 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf, struct folio *folio)
 		vm_fault_t tmp;
 
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-		if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+		tmp = vmf_can_call_fault(vmf);
+		if (tmp) {
 			folio_put(folio);
-			vma_end_read(vmf->vma);
-			return VM_FAULT_RETRY;
+			return tmp;
 		}
 
 		tmp = do_page_mkwrite(vmf, folio);
-- 
2.40.1




* Re: [PATCH v2 1/6] mm: Make lock_folio_maybe_drop_mmap() VMA lock aware
  2023-10-06 19:53 ` [PATCH v2 1/6] mm: Make lock_folio_maybe_drop_mmap() VMA lock aware Matthew Wilcox (Oracle)
@ 2023-10-08 21:47   ` Suren Baghdasaryan
  0 siblings, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-10-08 21:47 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm

On Fri, Oct 6, 2023 at 12:53 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> Drop the VMA lock instead of the mmap_lock if that's the one which
> is held.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
>  mm/filemap.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 9481ffaf24e6..a598872d62cc 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3104,7 +3104,7 @@ static int lock_folio_maybe_drop_mmap(struct vm_fault *vmf, struct folio *folio,
>
>         /*
>          * NOTE! This will make us return with VM_FAULT_RETRY, but with
> -        * the mmap_lock still held. That's how FAULT_FLAG_RETRY_NOWAIT
> +        * the fault lock still held. That's how FAULT_FLAG_RETRY_NOWAIT
>          * is supposed to work. We have way too many special cases..
>          */
>         if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
> @@ -3114,13 +3114,14 @@ static int lock_folio_maybe_drop_mmap(struct vm_fault *vmf, struct folio *folio,
>         if (vmf->flags & FAULT_FLAG_KILLABLE) {
>                 if (__folio_lock_killable(folio)) {
>                         /*
> -                        * We didn't have the right flags to drop the mmap_lock,
> -                        * but all fault_handlers only check for fatal signals
> -                        * if we return VM_FAULT_RETRY, so we need to drop the
> -                        * mmap_lock here and return 0 if we don't have a fpin.
> +                        * We didn't have the right flags to drop the
> +                        * fault lock, but all fault_handlers only check
> +                        * for fatal signals if we return VM_FAULT_RETRY,
> +                        * so we need to drop the fault lock here and
> +                        * return 0 if we don't have a fpin.
>                          */
>                         if (*fpin == NULL)
> -                               mmap_read_unlock(vmf->vma->vm_mm);
> +                               release_fault_lock(vmf);
>                         return 0;
>                 }
>         } else
> --
> 2.40.1
>



* Re: [PATCH v2 2/6] mm: Call wp_page_copy() under the VMA lock
  2023-10-06 19:53 ` [PATCH v2 2/6] mm: Call wp_page_copy() under the VMA lock Matthew Wilcox (Oracle)
@ 2023-10-08 22:00   ` Suren Baghdasaryan
  0 siblings, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-10-08 22:00 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm

On Fri, Oct 6, 2023 at 12:53 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> It is usually safe to call wp_page_copy() under the VMA lock.  The only
> unsafe situation is when no anon_vma has been allocated for this VMA,
> and we have to look at adjacent VMAs to determine if their anon_vma can
> be shared.  Since this happens only for the first COW of a page in this
> VMA, the majority of calls to wp_page_copy() do not need to fall back
> to the mmap_lock.
>
> Add vmf_anon_prepare() as an alternative to anon_vma_prepare() which
> will return RETRY if we currently hold the VMA lock and need to allocate
> an anon_vma.  This lets us drop the check in do_wp_page().
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
>  mm/memory.c | 39 ++++++++++++++++++++++++++-------------
>  1 file changed, 26 insertions(+), 13 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 97f860d6cd2a..cff78c496728 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
>         count_vm_event(PGREUSE);
>  }
>
> +static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
> +{
> +       struct vm_area_struct *vma = vmf->vma;
> +
> +       if (likely(vma->anon_vma))
> +               return 0;
> +       if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> +               vma_end_read(vma);
> +               return VM_FAULT_RETRY;
> +       }
> +       if (__anon_vma_prepare(vma))
> +               return VM_FAULT_OOM;
> +       return 0;
> +}
> +
>  /*
>   * Handle the case of a page which we actually need to copy to a new page,
>   * either due to COW or unsharing.
> @@ -3069,27 +3084,29 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>         pte_t entry;
>         int page_copied = 0;
>         struct mmu_notifier_range range;
> -       int ret;
> +       vm_fault_t ret;
>
>         delayacct_wpcopy_start();
>
>         if (vmf->page)
>                 old_folio = page_folio(vmf->page);
> -       if (unlikely(anon_vma_prepare(vma)))
> -               goto oom;
> +       ret = vmf_anon_prepare(vmf);
> +       if (unlikely(ret))
> +               goto out;
>
>         if (is_zero_pfn(pte_pfn(vmf->orig_pte))) {
>                 new_folio = vma_alloc_zeroed_movable_folio(vma, vmf->address);
>                 if (!new_folio)
>                         goto oom;
>         } else {
> +               int err;
>                 new_folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma,
>                                 vmf->address, false);
>                 if (!new_folio)
>                         goto oom;
>
> -               ret = __wp_page_copy_user(&new_folio->page, vmf->page, vmf);
> -               if (ret) {
> +               err = __wp_page_copy_user(&new_folio->page, vmf->page, vmf);
> +               if (err) {
>                         /*
>                          * COW failed, if the fault was solved by other,
>                          * it's fine. If not, userspace would re-fault on
> @@ -3102,7 +3119,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>                                 folio_put(old_folio);
>
>                         delayacct_wpcopy_end();
> -                       return ret == -EHWPOISON ? VM_FAULT_HWPOISON : 0;
> +                       return err == -EHWPOISON ? VM_FAULT_HWPOISON : 0;
>                 }
>                 kmsan_copy_page_meta(&new_folio->page, vmf->page);
>         }
> @@ -3212,11 +3229,13 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>  oom_free_new:
>         folio_put(new_folio);
>  oom:
> +       ret = VM_FAULT_OOM;
> +out:
>         if (old_folio)
>                 folio_put(old_folio);
>
>         delayacct_wpcopy_end();
> -       return VM_FAULT_OOM;
> +       return ret;
>  }
>
>  /**
> @@ -3458,12 +3477,6 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
>                 return 0;
>         }
>  copy:
> -       if ((vmf->flags & FAULT_FLAG_VMA_LOCK) && !vma->anon_vma) {
> -               pte_unmap_unlock(vmf->pte, vmf->ptl);
> -               vma_end_read(vmf->vma);
> -               return VM_FAULT_RETRY;
> -       }
> -
>         /*
>          * Ok, we need to copy. Oh, well..
>          */
> --
> 2.40.1
>



* Re: [PATCH v2 3/6] mm: Handle shared faults under the VMA lock
  2023-10-06 19:53 ` [PATCH v2 3/6] mm: Handle shared faults " Matthew Wilcox (Oracle)
@ 2023-10-08 22:01   ` Suren Baghdasaryan
  2023-10-20 13:23   ` kernel test robot
  1 sibling, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-10-08 22:01 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm

On Fri, Oct 6, 2023 at 12:53 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> There are many implementations of ->fault and some of them depend on
> mmap_lock being held.  All vm_ops that implement ->map_pages() end up
> calling filemap_fault(), which I have audited to be sure it does not rely
> on mmap_lock.  So (for now) key off ->map_pages existing as a flag to
> indicate that it's safe to call ->fault while only holding the vma lock.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
>  mm/memory.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index cff78c496728..a9b0c135209a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
>         count_vm_event(PGREUSE);
>  }
>
> +/*
> + * We could add a bitflag somewhere, but for now, we know that all
> + * vm_ops that have a ->map_pages have been audited and don't need
> + * the mmap_lock to be held.
> + */
> +static inline vm_fault_t vmf_can_call_fault(const struct vm_fault *vmf)
> +{
> +       struct vm_area_struct *vma = vmf->vma;
> +
> +       if (vma->vm_ops->map_pages || !(vmf->flags & FAULT_FLAG_VMA_LOCK))
> +               return 0;
> +       vma_end_read(vma);
> +       return VM_FAULT_RETRY;
> +}
> +
>  static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
>  {
>         struct vm_area_struct *vma = vmf->vma;
> @@ -4669,10 +4684,9 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
>         vm_fault_t ret, tmp;
>         struct folio *folio;
>
> -       if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> -               vma_end_read(vma);
> -               return VM_FAULT_RETRY;
> -       }
> +       ret = vmf_can_call_fault(vmf);
> +       if (ret)
> +               return ret;
>
>         ret = __do_fault(vmf);
>         if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
> --
> 2.40.1
>



* Re: [PATCH v2 4/6] mm: Handle COW faults under the VMA lock
  2023-10-06 19:53 ` [PATCH v2 4/6] mm: Handle COW " Matthew Wilcox (Oracle)
@ 2023-10-08 22:05   ` Suren Baghdasaryan
  2023-10-20 13:18   ` kernel test robot
  1 sibling, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-10-08 22:05 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm

On Fri, Oct 6, 2023 at 12:53 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> If the page is not currently present in the page tables, we need to call
> the page fault handler to find out which page we're supposed to COW,
> so we need to both check that there is already an anon_vma and that the
> fault handler doesn't need the mmap_lock.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
>  mm/memory.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index a9b0c135209a..938f481df0ab 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4639,13 +4639,11 @@ static vm_fault_t do_cow_fault(struct vm_fault *vmf)
>         struct vm_area_struct *vma = vmf->vma;
>         vm_fault_t ret;
>
> -       if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> -               vma_end_read(vma);
> -               return VM_FAULT_RETRY;
> -       }
> -
> -       if (unlikely(anon_vma_prepare(vma)))
> -               return VM_FAULT_OOM;
> +       ret = vmf_can_call_fault(vmf);
> +       if (!ret)
> +               ret = vmf_anon_prepare(vmf);
> +       if (ret)
> +               return ret;
>
>         vmf->cow_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
>         if (!vmf->cow_page)
> --
> 2.40.1
>



* Re: [PATCH v2 5/6] mm: Handle read faults under the VMA lock
  2023-10-06 19:53 ` [PATCH v2 5/6] mm: Handle read " Matthew Wilcox (Oracle)
@ 2023-10-08 22:06   ` Suren Baghdasaryan
  2023-10-20  9:55   ` kernel test robot
  1 sibling, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-10-08 22:06 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm

On Fri, Oct 6, 2023 at 12:53 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> Most file-backed faults are already handled through ->map_pages(),
> but if we need to do I/O we'll come this way.  Since filemap_fault()
> is now safe to be called under the VMA lock, we can handle these faults
> under the VMA lock now.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
>  mm/memory.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 938f481df0ab..e615afd28db2 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4617,10 +4617,9 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
>                         return ret;
>         }
>
> -       if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> -               vma_end_read(vmf->vma);
> -               return VM_FAULT_RETRY;
> -       }
> +       ret = vmf_can_call_fault(vmf);
> +       if (ret)
> +               return ret;
>
>         ret = __do_fault(vmf);
>         if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
> --
> 2.40.1
>



* Re: [PATCH v2 6/6] mm: Handle write faults to RO pages under the VMA lock
  2023-10-06 19:53 ` [PATCH v2 6/6] mm: Handle write faults to RO pages " Matthew Wilcox (Oracle)
@ 2023-10-08 22:07   ` Suren Baghdasaryan
  0 siblings, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-10-08 22:07 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm

On Fri, Oct 6, 2023 at 12:53 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> I think this is a pretty rare occurrence, but for consistency, handle
> these faults under the VMA lock the same way we handle the other
> fault paths.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
>  mm/memory.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index e615afd28db2..3d1bc622e344 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3301,10 +3301,9 @@ static vm_fault_t wp_pfn_shared(struct vm_fault *vmf)
>                 vm_fault_t ret;
>
>                 pte_unmap_unlock(vmf->pte, vmf->ptl);
> -               if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> -                       vma_end_read(vmf->vma);
> -                       return VM_FAULT_RETRY;
> -               }
> +               ret = vmf_can_call_fault(vmf);
> +               if (ret)
> +                       return ret;
>
>                 vmf->flags |= FAULT_FLAG_MKWRITE;
>                 ret = vma->vm_ops->pfn_mkwrite(vmf);
> @@ -3328,10 +3327,10 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf, struct folio *folio)
>                 vm_fault_t tmp;
>
>                 pte_unmap_unlock(vmf->pte, vmf->ptl);
> -               if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> +               tmp = vmf_can_call_fault(vmf);
> +               if (tmp) {
>                         folio_put(folio);
> -                       vma_end_read(vmf->vma);
> -                       return VM_FAULT_RETRY;
> +                       return tmp;
>                 }
>
>                 tmp = do_page_mkwrite(vmf, folio);
> --
> 2.40.1
>



* Re: [PATCH v2 5/6] mm: Handle read faults under the VMA lock
  2023-10-06 19:53 ` [PATCH v2 5/6] mm: Handle read " Matthew Wilcox (Oracle)
  2023-10-08 22:06   ` Suren Baghdasaryan
@ 2023-10-20  9:55   ` kernel test robot
  1 sibling, 0 replies; 16+ messages in thread
From: kernel test robot @ 2023-10-20  9:55 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: oe-lkp, lkp, linux-mm, ying.huang, feng.tang, fengwei.yin,
	Andrew Morton, Matthew Wilcox (Oracle),
	Suren Baghdasaryan, oliver.sang

Hello,

kernel test robot noticed a 46.0% improvement of vm-scalability.throughput on:


commit: 39fbbca087dd149cdb82f08e7b92d62395c21ecf ("[PATCH v2 5/6] mm: Handle read faults under the VMA lock")
url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/mm-Make-lock_folio_maybe_drop_mmap-VMA-lock-aware/20231007-035513
base: v6.6-rc4
patch link: https://lore.kernel.org/all/20231006195318.4087158-6-willy@infradead.org/
patch subject: [PATCH v2 5/6] mm: Handle read faults under the VMA lock

testcase: vm-scalability
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
parameters:

	runtime: 300s
	size: 2T
	test: shm-pread-seq-mt
	cpufreq_governor: performance

test-description: The motivation behind this suite is to exercise functions and regions of the mm/ subsystem of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/

Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231020/202310201715.3f52109d-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/2T/lkp-csl-2sp3/shm-pread-seq-mt/vm-scalability

commit: 
  90e99527c7 ("mm: Handle COW faults under the VMA lock")
  39fbbca087 ("mm: Handle read faults under the VMA lock")

90e99527c746cd9e 39fbbca087dd149cdb82f08e7b9 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     34.69 ± 23%     +72.5%      59.82 ±  2%  vm-scalability.free_time
    173385           +45.6%     252524        vm-scalability.median
  16599151           +46.0%   24242352        vm-scalability.throughput
    390.45            +6.9%     417.32        vm-scalability.time.elapsed_time
    390.45            +6.9%     417.32        vm-scalability.time.elapsed_time.max
     45781 ±  2%     +16.3%      53251 ±  2%  vm-scalability.time.involuntary_context_switches
 4.213e+09           +50.1%  6.325e+09        vm-scalability.time.maximum_resident_set_size
 5.316e+08           +47.3%   7.83e+08        vm-scalability.time.minor_page_faults
      6400            -8.0%       5890        vm-scalability.time.percent_of_cpu_this_job_got
     21673           -10.2%      19455        vm-scalability.time.system_time
      3319           +54.4%       5126        vm-scalability.time.user_time
 2.321e+08 ±  2%     +27.2%  2.953e+08 ±  5%  vm-scalability.time.voluntary_context_switches
 5.004e+09           +42.2%  7.116e+09        vm-scalability.workload
     13110           +24.0%      16254        uptime.idle
  1.16e+10           +24.5%  1.444e+10        cpuidle..time
 2.648e+08 ±  3%     +16.3%  3.079e+08 ±  5%  cpuidle..usage
     22.86            +6.3       29.17        mpstat.cpu.all.idle%
      8.29 ±  5%      -1.2        7.13 ±  7%  mpstat.cpu.all.iowait%
     58.63            -9.2       49.38        mpstat.cpu.all.sys%
      9.05            +4.0       13.09        mpstat.cpu.all.usr%
   8721571 ±  5%     +44.8%   12630342 ±  2%  numa-numastat.node0.local_node
   8773210 ±  5%     +44.8%   12706884 ±  2%  numa-numastat.node0.numa_hit
   7793725 ±  5%     +51.3%   11793573        numa-numastat.node1.local_node
   7842342 ±  5%     +50.7%   11816543        numa-numastat.node1.numa_hit
     23.17           +26.8%      29.37        vmstat.cpu.id
  31295414           +50.9%   47211341        vmstat.memory.cache
  95303378           -18.8%   77355720        vmstat.memory.free
   1176885 ±  2%     +19.2%    1402891 ±  3%  vmstat.system.cs
    194658            +5.4%     205149 ±  2%  vmstat.system.in
   9920198 ± 10%     -48.9%    5071533 ± 15%  turbostat.C1
      0.51 ± 12%      -0.3        0.21 ± 12%  turbostat.C1%
   1831098 ± 15%     -72.0%     512888 ± 19%  turbostat.C1E
      0.14 ± 13%      -0.1        0.06 ± 11%  turbostat.C1E%
   8736699           +36.3%   11905646        turbostat.C6
     22.74            +6.3       29.02        turbostat.C6%
     17.82           +25.5%      22.37        turbostat.CPU%c1
      5.36           +28.2%       6.87        turbostat.CPU%c6
      0.07           +42.9%       0.10        turbostat.IPC
  77317703           +12.3%   86804635 ±  3%  turbostat.IRQ
 2.443e+08 ±  3%     +18.9%  2.904e+08 ±  6%  turbostat.POLL
      4.80           +30.2%       6.24        turbostat.Pkg%pc2
    266.73            -1.3%     263.33        turbostat.PkgWatt
      0.00           -25.0%       0.00        perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
      0.06 ± 11%     -21.8%       0.04 ±  9%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     26.45 ±  9%     -16.0%      22.21 ±  6%  perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
      0.00           -25.0%       0.00        perf-sched.total_sch_delay.average.ms
    106.37 ±167%     -79.1%      22.21 ±  6%  perf-sched.total_sch_delay.max.ms
      0.46 ±  2%     -16.0%       0.39 ±  5%  perf-sched.total_wait_and_delay.average.ms
   2202457 ±  2%     +26.1%    2776824 ±  3%  perf-sched.total_wait_and_delay.count.ms
      0.45 ±  2%     -15.9%       0.38 ±  5%  perf-sched.total_wait_time.average.ms
      0.02 ±  2%     -19.8%       0.01 ±  2%  perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
    494.65 ±  4%     +10.6%     546.88 ±  3%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
   2196122 ±  2%     +26.1%    2770017 ±  3%  perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
      0.01 ±  3%     -19.5%       0.01        perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
    494.63 ±  4%     +10.6%     546.87 ±  3%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.22 ± 42%     -68.8%       0.07 ±125%  perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
  11445425           +82.1%   20837223        meminfo.Active
  11444642           +82.1%   20836443        meminfo.Active(anon)
  31218122           +51.0%   47138293        meminfo.Cached
  30006048           +53.7%   46116816        meminfo.Committed_AS
  17425032           +37.4%   23950392        meminfo.Inactive
  17423257           +37.5%   23948613        meminfo.Inactive(anon)
    164910           +21.8%     200913        meminfo.KReclaimable
  26336530           +57.6%   41514589        meminfo.Mapped
  94668993           -19.0%   76693589        meminfo.MemAvailable
  95202238           -18.9%   77208832        meminfo.MemFree
  36610737           +49.1%   54604143        meminfo.Memused
   4072810           +50.1%    6114589        meminfo.PageTables
    164910           +21.8%     200913        meminfo.SReclaimable
  28535318           +55.8%   44455489        meminfo.Shmem
    367289           +10.1%     404373        meminfo.Slab
  37978157           +50.2%   57055526        meminfo.max_used_kB
   2860756           +82.1%    5208445        proc-vmstat.nr_active_anon
   2361286           -19.0%    1912151        proc-vmstat.nr_dirty_background_threshold
   4728345           -19.0%    3828978        proc-vmstat.nr_dirty_threshold
   7804148           +51.0%   11783823        proc-vmstat.nr_file_pages
  23801109           -18.9%   19303173        proc-vmstat.nr_free_pages
   4355690           +37.5%    5986921        proc-vmstat.nr_inactive_anon
   6583645           +57.6%   10377790        proc-vmstat.nr_mapped
   1018109           +50.1%    1528565        proc-vmstat.nr_page_table_pages
   7133183           +55.8%   11112858        proc-vmstat.nr_shmem
     41226           +21.8%      50226        proc-vmstat.nr_slab_reclaimable
   2860756           +82.1%    5208445        proc-vmstat.nr_zone_active_anon
   4355690           +37.5%    5986921        proc-vmstat.nr_zone_inactive_anon
    112051            +3.8%     116273        proc-vmstat.numa_hint_faults
  16618553           +47.6%   24525492        proc-vmstat.numa_hit
  16518296           +47.9%   24425975        proc-vmstat.numa_local
  11052273           +49.9%   16566743        proc-vmstat.pgactivate
  16757533           +47.2%   24672644        proc-vmstat.pgalloc_normal
 5.329e+08           +47.2%  7.844e+08        proc-vmstat.pgfault
  16101786           +48.3%   23877738        proc-vmstat.pgfree
   3302784            +6.0%    3500288        proc-vmstat.unevictable_pgs_scanned
   6101287 ±  7%     +81.3%   11062634 ±  3%  numa-meminfo.node0.Active
   6101026 ±  7%     +81.3%   11062389 ±  3%  numa-meminfo.node0.Active(anon)
  17217355 ±  5%     +46.3%   25196100 ±  3%  numa-meminfo.node0.FilePages
   9363213 ±  7%     +31.9%   12347562 ±  2%  numa-meminfo.node0.Inactive
   9362621 ±  7%     +31.9%   12347130 ±  2%  numa-meminfo.node0.Inactive(anon)
  14211196 ±  7%     +51.2%   21487599        numa-meminfo.node0.Mapped
  45879058 ±  2%     -19.6%   36888633 ±  2%  numa-meminfo.node0.MemFree
  19925073 ±  5%     +45.1%   28915498 ±  3%  numa-meminfo.node0.MemUsed
   2032891           +50.5%    3060344        numa-meminfo.node0.PageTables
  15318197 ±  6%     +52.0%   23276446 ±  2%  numa-meminfo.node0.Shmem
   5342463 ±  7%     +82.9%    9769639 ±  4%  numa-meminfo.node1.Active
   5341941 ±  7%     +82.9%    9769104 ±  4%  numa-meminfo.node1.Active(anon)
  13998966 ±  8%     +56.6%   21919509 ±  3%  numa-meminfo.node1.FilePages
   8060699 ±  7%     +43.7%   11584190 ±  2%  numa-meminfo.node1.Inactive
   8059515 ±  7%     +43.7%   11582844 ±  2%  numa-meminfo.node1.Inactive(anon)
  12125745 ±  7%     +65.0%   20005342        numa-meminfo.node1.Mapped
  49326340 ±  2%     -18.2%   40347902 ±  2%  numa-meminfo.node1.MemFree
  16682503 ±  7%     +53.8%   25660941 ±  3%  numa-meminfo.node1.MemUsed
   2039529           +49.6%    3051247        numa-meminfo.node1.PageTables
  13214266 ±  7%     +60.1%   21155303 ±  2%  numa-meminfo.node1.Shmem
    156378 ± 13%     +21.1%     189316 ±  9%  numa-meminfo.node1.Slab
   1525784 ±  7%     +81.4%    2767183 ±  3%  numa-vmstat.node0.nr_active_anon
   4304756 ±  5%     +46.4%    6302189 ±  3%  numa-vmstat.node0.nr_file_pages
  11469263 ±  2%     -19.6%    9218468 ±  2%  numa-vmstat.node0.nr_free_pages
   2340569 ±  7%     +32.0%    3088383 ±  2%  numa-vmstat.node0.nr_inactive_anon
   3553304 ±  7%     +51.3%    5375214        numa-vmstat.node0.nr_mapped
    508315           +50.6%     765564        numa-vmstat.node0.nr_page_table_pages
   3829966 ±  6%     +52.0%    5822276 ±  2%  numa-vmstat.node0.nr_shmem
   1525783 ±  7%     +81.4%    2767184 ±  3%  numa-vmstat.node0.nr_zone_active_anon
   2340569 ±  7%     +32.0%    3088382 ±  2%  numa-vmstat.node0.nr_zone_inactive_anon
   8773341 ±  5%     +44.8%   12707017 ±  2%  numa-vmstat.node0.numa_hit
   8721702 ±  5%     +44.8%   12630474 ±  2%  numa-vmstat.node0.numa_local
   1335910 ±  7%     +82.9%    2443778 ±  4%  numa-vmstat.node1.nr_active_anon
   3500040 ±  8%     +56.7%    5482887 ±  3%  numa-vmstat.node1.nr_file_pages
  12331163 ±  2%     -18.2%   10083422 ±  2%  numa-vmstat.node1.nr_free_pages
   2014795 ±  7%     +43.8%    2897243 ±  2%  numa-vmstat.node1.nr_inactive_anon
   3031806 ±  7%     +65.1%    5004449        numa-vmstat.node1.nr_mapped
    510000           +49.7%     763297        numa-vmstat.node1.nr_page_table_pages
   3303865 ±  7%     +60.2%    5291835 ±  2%  numa-vmstat.node1.nr_shmem
   1335910 ±  7%     +82.9%    2443778 ±  4%  numa-vmstat.node1.nr_zone_active_anon
   2014795 ±  7%     +43.8%    2897242 ±  2%  numa-vmstat.node1.nr_zone_inactive_anon
   7842425 ±  5%     +50.7%   11816530        numa-vmstat.node1.numa_hit
   7793808 ±  5%     +51.3%   11793555        numa-vmstat.node1.numa_local
   9505083           +21.3%   11532590 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.avg
   9551715           +21.4%   11595502 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.max
   9426050           +21.4%   11443528 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.min
     19249 ±  4%     +28.3%      24698 ± 10%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.79           -30.7%       0.55 ±  8%  sched_debug.cfs_rq:/.h_nr_running.avg
     12458 ± 12%     +70.8%      21277 ± 22%  sched_debug.cfs_rq:/.load.avg
     13767 ± 95%    +311.7%      56677 ± 29%  sched_debug.cfs_rq:/.load.stddev
   9505083           +21.3%   11532590 ±  3%  sched_debug.cfs_rq:/.min_vruntime.avg
   9551715           +21.4%   11595502 ±  3%  sched_debug.cfs_rq:/.min_vruntime.max
   9426050           +21.4%   11443528 ±  3%  sched_debug.cfs_rq:/.min_vruntime.min
     19249 ±  4%     +28.3%      24698 ± 10%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.78           -30.7%       0.54 ±  8%  sched_debug.cfs_rq:/.nr_running.avg
    170.67           -21.4%     134.10 ±  6%  sched_debug.cfs_rq:/.removed.load_avg.max
    708.55           -32.2%     480.43 ±  7%  sched_debug.cfs_rq:/.runnable_avg.avg
      1510 ±  3%     -12.5%       1320 ±  4%  sched_debug.cfs_rq:/.runnable_avg.max
    219.68 ±  7%     -12.7%     191.74 ±  5%  sched_debug.cfs_rq:/.runnable_avg.stddev
    707.51           -32.3%     479.05 ±  7%  sched_debug.cfs_rq:/.util_avg.avg
      1506 ±  3%     -12.6%       1317 ±  4%  sched_debug.cfs_rq:/.util_avg.max
    219.64 ±  7%     -13.0%     191.15 ±  5%  sched_debug.cfs_rq:/.util_avg.stddev
    564.18 ±  2%     -32.4%     381.24 ±  8%  sched_debug.cfs_rq:/.util_est_enqueued.avg
      1168 ±  7%     -14.8%     995.94 ±  7%  sched_debug.cfs_rq:/.util_est_enqueued.max
    235.45 ±  5%     -21.4%     185.13 ±  7%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
    149234 ±  5%    +192.0%     435707 ± 10%  sched_debug.cpu.avg_idle.avg
    404765 ± 17%     +47.3%     596259 ± 15%  sched_debug.cpu.avg_idle.max
      5455 ±  4%   +3302.8%     185624 ± 34%  sched_debug.cpu.avg_idle.min
    201990           +24.9%     252309 ±  5%  sched_debug.cpu.clock.avg
    201997           +24.9%     252315 ±  5%  sched_debug.cpu.clock.max
    201983           +24.9%     252303 ±  5%  sched_debug.cpu.clock.min
      3.80 ±  2%     -10.1%       3.42 ±  3%  sched_debug.cpu.clock.stddev
    200296           +24.8%     249952 ±  5%  sched_debug.cpu.clock_task.avg
    200541           +24.8%     250280 ±  5%  sched_debug.cpu.clock_task.max
    194086           +25.5%     243582 ±  5%  sched_debug.cpu.clock_task.min
      4069           -32.7%       2739 ±  8%  sched_debug.cpu.curr->pid.avg
      8703           +15.2%      10027 ±  3%  sched_debug.cpu.curr->pid.max
      0.00 ±  6%     -27.2%       0.00 ±  5%  sched_debug.cpu.next_balance.stddev
      0.78           -32.7%       0.52 ±  8%  sched_debug.cpu.nr_running.avg
      0.33 ±  6%     -13.9%       0.29 ±  5%  sched_debug.cpu.nr_running.stddev
   2372181 ±  2%     +57.6%    3737590 ±  8%  sched_debug.cpu.nr_switches.avg
   2448893 ±  2%     +58.5%    3880813 ±  8%  sched_debug.cpu.nr_switches.max
   2290032 ±  2%     +55.9%    3570559 ±  8%  sched_debug.cpu.nr_switches.min
     36185 ± 10%     +74.8%      63244 ±  8%  sched_debug.cpu.nr_switches.stddev
      0.10 ± 19%    +138.0%       0.23 ± 19%  sched_debug.cpu.nr_uninterruptible.avg
    201984           +24.9%     252304 ±  5%  sched_debug.cpu_clk
    201415           +25.0%     251735 ±  5%  sched_debug.ktime
    202543           +24.8%     252867 ±  5%  sched_debug.sched_clk
      3.84 ±  2%     -14.1%       3.30 ±  2%  perf-stat.i.MPKI
 1.679e+10           +30.1%  2.186e+10        perf-stat.i.branch-instructions
      0.54 ±  2%      -0.1        0.45        perf-stat.i.branch-miss-rate%
  75872684            -2.6%   73927540        perf-stat.i.branch-misses
     31.85            -1.1       30.75        perf-stat.i.cache-miss-rate%
   1184992 ±  2%     +19.1%    1411069 ±  3%  perf-stat.i.context-switches
      3.49           -29.3%       2.47        perf-stat.i.cpi
 2.265e+11            -8.1%  2.081e+11        perf-stat.i.cpu-cycles
    950.46 ±  3%     -11.6%     840.03 ±  2%  perf-stat.i.cycles-between-cache-misses
   9514714 ± 12%     +27.3%   12109471 ± 10%  perf-stat.i.dTLB-load-misses
 1.556e+10           +29.9%  2.022e+10        perf-stat.i.dTLB-loads
   1575276 ±  5%     +35.8%    2138868 ±  5%  perf-stat.i.dTLB-store-misses
 3.396e+09           +21.6%  4.129e+09        perf-stat.i.dTLB-stores
     79.97            +2.8       82.74        perf-stat.i.iTLB-load-miss-rate%
   4265612            +8.4%    4624960 ±  2%  perf-stat.i.iTLB-load-misses
    712599 ±  8%     -38.4%     438645 ±  7%  perf-stat.i.iTLB-loads
  5.59e+10           +27.7%  7.137e+10        perf-stat.i.instructions
     12120           +11.6%      13525 ±  2%  perf-stat.i.instructions-per-iTLB-miss
      0.35           +32.7%       0.46        perf-stat.i.ipc
      0.04 ± 38%    +119.0%       0.08 ± 33%  perf-stat.i.major-faults
      2.36            -8.1%       2.17        perf-stat.i.metric.GHz
    863.69            +7.5%     928.37        perf-stat.i.metric.K/sec
    378.76           +28.8%     487.87        perf-stat.i.metric.M/sec
   1359089           +37.9%    1874285        perf-stat.i.minor-faults
     84.30            -2.8       81.50        perf-stat.i.node-load-miss-rate%
     89.54            -2.5       87.09        perf-stat.i.node-store-miss-rate%
   1359089           +37.9%    1874285        perf-stat.i.page-faults
      3.65 ±  3%     -22.5%       2.82 ±  4%  perf-stat.overall.MPKI
      0.45            -0.1        0.34        perf-stat.overall.branch-miss-rate%
     32.64            -1.7       30.98        perf-stat.overall.cache-miss-rate%
      4.05           -28.0%       2.92        perf-stat.overall.cpi
      1113 ±  3%      -7.1%       1034 ±  3%  perf-stat.overall.cycles-between-cache-misses
      0.05 ±  5%      +0.0        0.05 ±  5%  perf-stat.overall.dTLB-store-miss-rate%
     85.73            +5.6       91.37        perf-stat.overall.iTLB-load-miss-rate%
     13110 ±  2%     +17.8%      15440 ±  2%  perf-stat.overall.instructions-per-iTLB-miss
      0.25           +39.0%       0.34        perf-stat.overall.ipc
      4378            -4.2%       4195        perf-stat.overall.path-length
 1.679e+10           +30.2%  2.186e+10        perf-stat.ps.branch-instructions
  75862675            -2.6%   73920168        perf-stat.ps.branch-misses
   1184994 ±  2%     +19.1%    1411192 ±  3%  perf-stat.ps.context-switches
 2.265e+11            -8.1%  2.082e+11        perf-stat.ps.cpu-cycles
   9518014 ± 12%     +27.3%   12118863 ± 10%  perf-stat.ps.dTLB-load-misses
 1.556e+10           +29.9%  2.022e+10        perf-stat.ps.dTLB-loads
   1575414 ±  5%     +35.8%    2139373 ±  5%  perf-stat.ps.dTLB-store-misses
 3.396e+09           +21.6%  4.129e+09        perf-stat.ps.dTLB-stores
   4265139            +8.4%    4625090 ±  2%  perf-stat.ps.iTLB-load-misses
    711002 ±  8%     -38.5%     437258 ±  7%  perf-stat.ps.iTLB-loads
  5.59e+10           +27.7%  7.137e+10        perf-stat.ps.instructions
      0.04 ± 37%    +118.9%       0.08 ± 33%  perf-stat.ps.major-faults
   1359186           +37.9%    1874615        perf-stat.ps.minor-faults
   1359186           +37.9%    1874615        perf-stat.ps.page-faults
 2.191e+13           +36.3%  2.986e+13        perf-stat.total.instructions
     74.66            -6.7       67.93        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
     74.61            -6.7       67.89        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
     53.18            -6.3       46.88        perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
     35.54            -6.1       29.43        perf-profile.calltrace.cycles-pp.next_uptodate_folio.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault
     76.49            -5.4       71.07        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
     79.82            -3.9       75.89        perf-profile.calltrace.cycles-pp.do_access
     70.02            -3.8       66.23        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     70.39            -3.7       66.70        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
     68.31            -2.8       65.51        perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
     68.29            -2.8       65.50        perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.65 ±  7%      -0.3        0.37 ± 71%  perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule.io_schedule.folio_wait_bit_common
      1.94 ±  6%      -0.2        1.71 ±  6%  perf-profile.calltrace.cycles-pp.__schedule.schedule.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp
      1.96 ±  6%      -0.2        1.74 ±  6%  perf-profile.calltrace.cycles-pp.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault.__do_fault
      1.95 ±  6%      -0.2        1.74 ±  6%  perf-profile.calltrace.cycles-pp.schedule.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
      0.86            +0.1        1.00 ±  2%  perf-profile.calltrace.cycles-pp.folio_add_file_rmap_range.set_pte_range.filemap_map_pages.do_read_fault.do_fault
      0.56            +0.2        0.72 ±  4%  perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry
      1.16 ±  3%      +0.2        1.33 ±  2%  perf-profile.calltrace.cycles-pp.set_pte_range.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault
      0.71 ±  2%      +0.2        0.92 ±  3%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary
      0.78            +0.2        1.02 ±  4%  perf-profile.calltrace.cycles-pp.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      0.44 ± 44%      +0.3        0.73 ±  3%  perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_read_fault.do_fault.__handle_mm_fault
      0.89 ±  9%      +0.3        1.24 ±  8%  perf-profile.calltrace.cycles-pp.finish_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
      1.23            +0.4        1.59        perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_exc_page_fault.do_access
      0.18 ±141%      +0.4        0.57 ±  5%  perf-profile.calltrace.cycles-pp.try_to_wake_up.wake_page_function.__wake_up_common.folio_wake_bit.filemap_map_pages
      1.50            +0.6        2.05        perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
      0.00            +0.6        0.56 ±  4%  perf-profile.calltrace.cycles-pp.wake_page_function.__wake_up_common.folio_wake_bit.do_read_fault.do_fault
      0.09 ±223%      +0.6        0.69 ±  4%  perf-profile.calltrace.cycles-pp.__wake_up_common.folio_wake_bit.do_read_fault.do_fault.__handle_mm_fault
      0.00            +0.6        0.60        perf-profile.calltrace.cycles-pp.folio_add_file_rmap_range.set_pte_range.finish_fault.do_read_fault.do_fault
      2.98 ±  3%      +0.7        3.66 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
      3.39 ±  3%      +0.8        4.21        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault.__do_fault
      7.48            +0.9        8.41        perf-profile.calltrace.cycles-pp.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
      2.25 ±  6%      +1.0        3.30 ±  3%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_wake_bit.do_read_fault.do_fault
      2.44 ±  5%      +1.1        3.56 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_wake_bit.do_read_fault.do_fault.__handle_mm_fault
      3.11 ±  4%      +1.4        4.52        perf-profile.calltrace.cycles-pp.folio_wake_bit.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
     10.14            +1.9       12.06        perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault.do_fault
     10.26            +2.0       12.25        perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_read_fault.do_fault.__handle_mm_fault
     10.29            +2.0       12.29        perf-profile.calltrace.cycles-pp.__do_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
      9.69            +5.5       15.21 ±  2%  perf-profile.calltrace.cycles-pp.do_rw_once
     74.66            -6.7       67.94        perf-profile.children.cycles-pp.exc_page_fault
     74.62            -6.7       67.90        perf-profile.children.cycles-pp.do_user_addr_fault
     53.19            -6.3       46.89        perf-profile.children.cycles-pp.filemap_map_pages
     35.56            -6.1       29.44        perf-profile.children.cycles-pp.next_uptodate_folio
     76.51            -6.0       70.48        perf-profile.children.cycles-pp.asm_exc_page_fault
     70.02            -3.8       66.24        perf-profile.children.cycles-pp.__handle_mm_fault
     70.40            -3.7       66.71        perf-profile.children.cycles-pp.handle_mm_fault
     81.33            -3.5       77.78        perf-profile.children.cycles-pp.do_access
     68.32            -2.8       65.52        perf-profile.children.cycles-pp.do_fault
     68.30            -2.8       65.50        perf-profile.children.cycles-pp.do_read_fault
      2.07 ±  7%      -2.0        0.12 ±  6%  perf-profile.children.cycles-pp.down_read_trylock
      1.28 ±  4%      -1.1        0.16 ±  4%  perf-profile.children.cycles-pp.up_read
      0.65 ± 12%      -0.4        0.28 ± 15%  perf-profile.children.cycles-pp.intel_idle_irq
      1.96 ±  6%      -0.2        1.74 ±  6%  perf-profile.children.cycles-pp.schedule
      1.96 ±  6%      -0.2        1.74 ±  6%  perf-profile.children.cycles-pp.io_schedule
      0.36 ±  7%      -0.2        0.15 ±  3%  perf-profile.children.cycles-pp.mtree_range_walk
      0.30 ±  8%      -0.2        0.13 ± 14%  perf-profile.children.cycles-pp.mm_cid_get
      0.12 ± 12%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.update_sg_lb_stats
      0.16 ±  9%      -0.1        0.07 ± 15%  perf-profile.children.cycles-pp.load_balance
      0.14 ± 10%      -0.1        0.05 ± 46%  perf-profile.children.cycles-pp.update_sd_lb_stats
      0.20 ± 10%      -0.1        0.11 ±  8%  perf-profile.children.cycles-pp.newidle_balance
      0.14 ± 10%      -0.1        0.06 ± 17%  perf-profile.children.cycles-pp.find_busiest_group
      0.33 ±  6%      -0.0        0.28 ±  5%  perf-profile.children.cycles-pp.pick_next_task_fair
      0.05            +0.0        0.06        perf-profile.children.cycles-pp.nohz_run_idle_balance
      0.06            +0.0        0.08 ±  6%  perf-profile.children.cycles-pp.__update_load_avg_se
      0.04 ± 44%      +0.0        0.06        perf-profile.children.cycles-pp.reweight_entity
      0.09 ±  7%      +0.0        0.11 ±  4%  perf-profile.children.cycles-pp.xas_descend
      0.08 ±  5%      +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.update_curr
      0.09 ±  7%      +0.0        0.11 ±  3%  perf-profile.children.cycles-pp.prepare_task_switch
      0.10 ±  4%      +0.0        0.12 ±  3%  perf-profile.children.cycles-pp.call_function_single_prep_ipi
      0.08 ±  4%      +0.0        0.10 ±  5%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
      0.04 ± 44%      +0.0        0.06 ±  7%  perf-profile.children.cycles-pp.sched_clock
      0.13 ±  7%      +0.0        0.16 ±  4%  perf-profile.children.cycles-pp.__sysvec_call_function_single
      0.08 ±  6%      +0.0        0.10 ±  3%  perf-profile.children.cycles-pp.set_next_entity
      0.16 ±  4%      +0.0        0.19 ±  3%  perf-profile.children.cycles-pp.__switch_to
      0.09 ±  4%      +0.0        0.12 ±  4%  perf-profile.children.cycles-pp.llist_reverse_order
      0.04 ± 44%      +0.0        0.07 ±  5%  perf-profile.children.cycles-pp.place_entity
      0.14 ±  3%      +0.0        0.16 ±  3%  perf-profile.children.cycles-pp.llist_add_batch
      0.09 ±  5%      +0.0        0.12 ±  6%  perf-profile.children.cycles-pp.available_idle_cpu
      0.15 ±  4%      +0.0        0.18 ±  4%  perf-profile.children.cycles-pp.sysvec_call_function_single
      0.08 ±  5%      +0.0        0.12 ±  6%  perf-profile.children.cycles-pp.wake_affine
      0.08            +0.0        0.11        perf-profile.children.cycles-pp.__list_del_entry_valid_or_report
      0.11 ±  4%      +0.0        0.14 ±  3%  perf-profile.children.cycles-pp.update_rq_clock_task
      0.11 ±  4%      +0.0        0.14 ±  4%  perf-profile.children.cycles-pp.__switch_to_asm
      0.04 ± 44%      +0.0        0.07 ±  6%  perf-profile.children.cycles-pp.folio_add_lru
      0.06 ±  7%      +0.0        0.10 ±  6%  perf-profile.children.cycles-pp.shmem_add_to_page_cache
      0.18 ±  5%      +0.0        0.22 ±  4%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.02 ±141%      +0.0        0.06 ±  6%  perf-profile.children.cycles-pp.tick_nohz_idle_exit
      0.12 ±  3%      +0.0        0.17 ±  5%  perf-profile.children.cycles-pp.select_task_rq_fair
      0.13 ±  3%      +0.0        0.18 ±  6%  perf-profile.children.cycles-pp.select_task_rq
      0.23 ±  3%      +0.1        0.29 ±  3%  perf-profile.children.cycles-pp.__smp_call_single_queue
      0.20 ±  3%      +0.1        0.26 ±  3%  perf-profile.children.cycles-pp.update_load_avg
      0.01 ±223%      +0.1        0.07 ± 18%  perf-profile.children.cycles-pp.shmem_alloc_and_acct_folio
      0.26 ±  2%      +0.1        0.34 ±  3%  perf-profile.children.cycles-pp.dequeue_entity
      0.29 ±  3%      +0.1        0.37 ±  4%  perf-profile.children.cycles-pp.dequeue_task_fair
      0.17 ±  3%      +0.1        0.26 ±  2%  perf-profile.children.cycles-pp.sync_regs
      0.34 ±  2%      +0.1        0.42 ±  4%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
      0.28 ±  3%      +0.1        0.37 ±  4%  perf-profile.children.cycles-pp.enqueue_entity
      0.28 ±  3%      +0.1        0.38 ±  6%  perf-profile.children.cycles-pp.__perf_sw_event
      0.32 ±  2%      +0.1        0.42 ±  5%  perf-profile.children.cycles-pp.___perf_sw_event
      0.34 ±  3%      +0.1        0.44 ±  4%  perf-profile.children.cycles-pp.enqueue_task_fair
      0.36 ±  2%      +0.1        0.46 ±  3%  perf-profile.children.cycles-pp.activate_task
      0.24 ±  2%      +0.1        0.35        perf-profile.children.cycles-pp.native_irq_return_iret
      0.30 ±  6%      +0.1        0.42 ± 10%  perf-profile.children.cycles-pp.xas_load
      0.31            +0.1        0.43 ±  3%  perf-profile.children.cycles-pp.folio_unlock
      0.44 ±  2%      +0.1        0.56 ±  4%  perf-profile.children.cycles-pp.ttwu_do_activate
      0.40 ±  6%      +0.2        0.56 ±  5%  perf-profile.children.cycles-pp._compound_head
      1.52            +0.2        1.68 ±  4%  perf-profile.children.cycles-pp.wake_page_function
      0.68 ±  3%      +0.2        0.86 ±  4%  perf-profile.children.cycles-pp.try_to_wake_up
      0.66 ±  2%      +0.2        0.84 ±  3%  perf-profile.children.cycles-pp.sched_ttwu_pending
      0.85 ±  2%      +0.2        1.09 ±  3%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.79 ±  2%      +0.2        1.03 ±  4%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
      1.83            +0.3        2.08 ±  4%  perf-profile.children.cycles-pp.__wake_up_common
      1.29            +0.3        1.60        perf-profile.children.cycles-pp.folio_add_file_rmap_range
      0.89 ±  9%      +0.4        1.24 ±  8%  perf-profile.children.cycles-pp.finish_fault
      1.24            +0.4        1.60        perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
      1.68 ±  3%      +0.4        2.06 ±  2%  perf-profile.children.cycles-pp.set_pte_range
      1.50            +0.6        2.06        perf-profile.children.cycles-pp.filemap_get_entry
      3.42 ±  3%      +0.8        4.24        perf-profile.children.cycles-pp._raw_spin_lock_irq
      7.48            +0.9        8.41        perf-profile.children.cycles-pp.folio_wait_bit_common
      9.67 ±  4%      +1.4       11.07 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     12.08 ±  3%      +1.8       13.84        perf-profile.children.cycles-pp.folio_wake_bit
     10.15            +1.9       12.07        perf-profile.children.cycles-pp.shmem_get_folio_gfp
     11.80 ±  4%      +1.9       13.74 ±  2%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     10.26            +2.0       12.25        perf-profile.children.cycles-pp.shmem_fault
     10.29            +2.0       12.29        perf-profile.children.cycles-pp.__do_fault
      8.59            +5.3       13.94 ±  2%  perf-profile.children.cycles-pp.do_rw_once
     35.10            -6.1       28.98 ±  2%  perf-profile.self.cycles-pp.next_uptodate_folio
      2.06 ±  7%      -1.9        0.11 ±  4%  perf-profile.self.cycles-pp.down_read_trylock
      1.28 ±  4%      -1.1        0.16 ±  3%  perf-profile.self.cycles-pp.up_read
      1.66 ±  6%      -1.0        0.68 ±  3%  perf-profile.self.cycles-pp.__handle_mm_fault
      7.20            -0.7        6.55        perf-profile.self.cycles-pp.filemap_map_pages
      0.64 ± 12%      -0.4        0.28 ± 15%  perf-profile.self.cycles-pp.intel_idle_irq
      0.36 ±  7%      -0.2        0.15        perf-profile.self.cycles-pp.mtree_range_walk
      0.30 ±  8%      -0.2        0.13 ± 14%  perf-profile.self.cycles-pp.mm_cid_get
      0.71 ±  8%      -0.1        0.59 ±  7%  perf-profile.self.cycles-pp.__schedule
      0.05 ±  8%      +0.0        0.06 ±  7%  perf-profile.self.cycles-pp.ttwu_do_activate
      0.08 ±  5%      +0.0        0.10 ±  4%  perf-profile.self.cycles-pp.do_idle
      0.06 ±  6%      +0.0        0.08 ±  6%  perf-profile.self.cycles-pp.enqueue_task_fair
      0.05 ±  8%      +0.0        0.07 ±  8%  perf-profile.self.cycles-pp.__update_load_avg_se
      0.09 ±  5%      +0.0        0.10 ±  4%  perf-profile.self.cycles-pp.xas_descend
      0.04 ± 44%      +0.0        0.06        perf-profile.self.cycles-pp.reweight_entity
      0.05 ±  7%      +0.0        0.07 ±  9%  perf-profile.self.cycles-pp.set_pte_range
      0.08 ±  6%      +0.0        0.10 ±  5%  perf-profile.self.cycles-pp.update_load_avg
      0.10 ±  4%      +0.0        0.12 ±  3%  perf-profile.self.cycles-pp.call_function_single_prep_ipi
      0.07 ±  5%      +0.0        0.09 ±  5%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
      0.08 ±  6%      +0.0        0.10 ±  6%  perf-profile.self.cycles-pp.flush_smp_call_function_queue
      0.10 ±  4%      +0.0        0.13 ±  2%  perf-profile.self.cycles-pp.__flush_smp_call_function_queue
      0.16 ±  4%      +0.0        0.19 ±  3%  perf-profile.self.cycles-pp.__switch_to
      0.14 ±  3%      +0.0        0.16 ±  3%  perf-profile.self.cycles-pp.llist_add_batch
      0.09 ±  5%      +0.0        0.12 ±  6%  perf-profile.self.cycles-pp.available_idle_cpu
      0.08 ±  5%      +0.0        0.12 ±  6%  perf-profile.self.cycles-pp.enqueue_entity
      0.08 ±  5%      +0.0        0.12 ±  4%  perf-profile.self.cycles-pp.llist_reverse_order
      0.10 ±  4%      +0.0        0.13 ±  3%  perf-profile.self.cycles-pp.update_rq_clock_task
      0.08            +0.0        0.11        perf-profile.self.cycles-pp.__list_del_entry_valid_or_report
      0.11 ±  4%      +0.0        0.14 ±  4%  perf-profile.self.cycles-pp.__switch_to_asm
      0.09 ±  5%      +0.0        0.12 ±  8%  perf-profile.self.cycles-pp.ttwu_queue_wakelist
      0.12 ±  4%      +0.0        0.16 ±  6%  perf-profile.self.cycles-pp.xas_load
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.sched_ttwu_pending
      0.00            +0.1        0.06        perf-profile.self.cycles-pp.asm_exc_page_fault
      0.11 ±  4%      +0.1        0.18 ±  4%  perf-profile.self.cycles-pp.shmem_fault
      0.17 ±  3%      +0.1        0.26 ±  2%  perf-profile.self.cycles-pp.sync_regs
      0.31 ±  2%      +0.1        0.40 ±  5%  perf-profile.self.cycles-pp.___perf_sw_event
      0.31 ±  2%      +0.1        0.40 ±  3%  perf-profile.self.cycles-pp.__wake_up_common
      0.24 ±  2%      +0.1        0.35        perf-profile.self.cycles-pp.native_irq_return_iret
      0.31            +0.1        0.43 ±  3%  perf-profile.self.cycles-pp.folio_unlock
      0.44 ±  3%      +0.1        0.57 ±  2%  perf-profile.self.cycles-pp._raw_spin_lock_irq
      0.68 ±  3%      +0.1        0.83 ±  2%  perf-profile.self.cycles-pp.folio_wake_bit
      0.85            +0.2        1.00 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.40 ±  5%      +0.2        0.56 ±  5%  perf-profile.self.cycles-pp._compound_head
      1.29            +0.3        1.59        perf-profile.self.cycles-pp.folio_add_file_rmap_range
      0.99            +0.3        1.30 ±  2%  perf-profile.self.cycles-pp.shmem_get_folio_gfp
      2.08            +0.3        2.39 ±  2%  perf-profile.self.cycles-pp.folio_wait_bit_common
      1.18            +0.4        1.55        perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
      1.43            +0.5        1.90        perf-profile.self.cycles-pp.filemap_get_entry
      3.93            +1.9        5.85        perf-profile.self.cycles-pp.do_access
     11.80 ±  4%      +1.9       13.74 ±  2%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      6.55            +4.5       11.08 ±  2%  perf-profile.self.cycles-pp.do_rw_once

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [PATCH v2 4/6] mm: Handle COW faults under the VMA lock
  2023-10-06 19:53 ` [PATCH v2 4/6] mm: Handle COW " Matthew Wilcox (Oracle)
  2023-10-08 22:05   ` Suren Baghdasaryan
@ 2023-10-20 13:18   ` kernel test robot
  1 sibling, 0 replies; 16+ messages in thread
From: kernel test robot @ 2023-10-20 13:18 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: oe-lkp, lkp, linux-mm, ying.huang, feng.tang, fengwei.yin,
	Andrew Morton, Matthew Wilcox (Oracle),
	Suren Baghdasaryan, oliver.sang
Hello,

kernel test robot noticed a 38.7% improvement in will-it-scale.per_thread_ops on:


commit: 90e99527c746cd9ef7ebf0333c9611e45c6e5e1d ("[PATCH v2 4/6] mm: Handle COW faults under the VMA lock")
url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/mm-Make-lock_folio_maybe_drop_mmap-VMA-lock-aware/20231007-035513
base: v6.6-rc4
patch link: https://lore.kernel.org/all/20231006195318.4087158-5-willy@infradead.org/
patch subject: [PATCH v2 4/6] mm: Handle COW faults under the VMA lock

testcase: will-it-scale
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:

	nr_task: 16
	mode: thread
	test: page_fault2
	cpufreq_governor: performance


Details are as follows:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231020/202310201702.62f04f91-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/thread/16/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/page_fault2/will-it-scale

commit: 
  c8b329d48e ("mm: Handle shared faults under the VMA lock")
  90e99527c7 ("mm: Handle COW faults under the VMA lock")

c8b329d48e0dac74 90e99527c746cd9ef7ebf0333c9 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      1.11 ±  2%      +0.4        1.50        mpstat.cpu.all.usr%
    690.67 ± 20%     -35.3%     447.00 ±  6%  perf-c2c.HITM.local
     71432 ±  3%     -10.5%      63958        meminfo.Active
     70468 ±  3%     -10.4%      63142        meminfo.Active(anon)
 5.722e+08 ±  2%     +38.8%  7.942e+08        numa-numastat.node0.local_node
 5.723e+08 ±  2%     +38.8%  7.944e+08        numa-numastat.node0.numa_hit
      4746           -54.0%       2183        vmstat.system.cs
    106237            +1.7%     108086        vmstat.system.in
     69143 ±  4%     -10.2%      62107 ±  2%  numa-meminfo.node1.Active
     68750 ±  3%     -10.1%      61835        numa-meminfo.node1.Active(anon)
     70251 ±  4%      -9.8%      63348        numa-meminfo.node1.Shmem
   1889742 ±  2%     +38.7%    2621754        will-it-scale.16.threads
    118108 ±  2%     +38.7%     163859        will-it-scale.per_thread_ops
   1889742 ±  2%     +38.7%    2621754        will-it-scale.workload
 5.723e+08 ±  2%     +38.8%  7.944e+08        numa-vmstat.node0.numa_hit
 5.722e+08 ±  2%     +38.8%  7.942e+08        numa-vmstat.node0.numa_local
     17189 ±  3%     -10.1%      15458        numa-vmstat.node1.nr_active_anon
     17563 ±  4%      -9.8%      15837        numa-vmstat.node1.nr_shmem
     17189 ±  3%     -10.1%      15458        numa-vmstat.node1.nr_zone_active_anon
     66914 ± 10%     -54.3%      30547 ±  4%  turbostat.C1
      0.07 ± 18%      -0.1        0.02 ± 33%  turbostat.C1%
    513918 ±  3%     -74.2%     132621 ±  2%  turbostat.C1E
      0.54 ±  4%      -0.4        0.16 ±  4%  turbostat.C1E%
      0.11           +18.2%       0.13        turbostat.IPC
    218.42            +2.0%     222.83        turbostat.PkgWatt
     30.47           +13.3%      34.53        turbostat.RAMWatt
    720.36           +24.0%     893.56 ±  4%  sched_debug.cfs_rq:/.runnable_avg.max
    225.47 ±  7%     +16.4%     262.37        sched_debug.cfs_rq:/.runnable_avg.stddev
    713.28           +25.3%     893.53 ±  4%  sched_debug.cfs_rq:/.util_avg.max
    224.87 ±  7%     +16.6%     262.19        sched_debug.cfs_rq:/.util_avg.stddev
     72.59 ± 49%     +63.1%     118.38 ± 11%  sched_debug.cfs_rq:/.util_est_enqueued.avg
    605.14 ±  4%     +40.7%     851.22        sched_debug.cfs_rq:/.util_est_enqueued.max
    151.28 ± 22%     +64.0%     248.15 ±  5%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
      8811           -42.4%       5078        sched_debug.cpu.nr_switches.avg
     17617 ±  3%     -10.4%      15785        proc-vmstat.nr_active_anon
    332941            +4.6%     348206        proc-vmstat.nr_anon_pages
    855626            +1.7%     870502        proc-vmstat.nr_inactive_anon
     17617 ±  3%     -10.4%      15785        proc-vmstat.nr_zone_active_anon
    855626            +1.7%     870502        proc-vmstat.nr_zone_inactive_anon
 5.729e+08 ±  2%     +38.8%   7.95e+08        proc-vmstat.numa_hit
 5.727e+08 ±  2%     +38.8%  7.948e+08        proc-vmstat.numa_local
     16509 ±  4%     -13.0%      14365        proc-vmstat.pgactivate
 5.724e+08 ±  2%     +38.7%   7.94e+08        proc-vmstat.pgalloc_normal
 5.704e+08 ±  2%     +38.8%  7.914e+08        proc-vmstat.pgfault
 5.723e+08 ±  2%     +38.7%   7.94e+08        proc-vmstat.pgfree
      0.00 ± 37%    +164.7%       0.01 ±  6%  perf-sched.sch_delay.avg.ms.__cond_resched.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.isra
      0.02 ± 12%     +26.4%       0.02 ± 10%  perf-sched.sch_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
      0.00 ±223%   +9466.7%       0.05 ±181%  perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
      0.02 ± 94%     -61.2%       0.01 ± 11%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
      0.00 ±  8%   +1068.0%       0.05 ±189%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
      0.01 ± 14%     +52.6%       0.01 ± 34%  perf-sched.sch_delay.max.ms.__cond_resched.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.isra
      0.01 ±  9%  +10802.8%       0.65 ±212%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
      0.00 ±223%  +10533.3%       0.05 ±162%  perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
     62.95 ±  2%    +113.8%     134.58 ±  2%  perf-sched.total_wait_and_delay.average.ms
     13913           -52.2%       6654        perf-sched.total_wait_and_delay.count.ms
     62.87 ±  2%    +113.8%     134.44 ±  2%  perf-sched.total_wait_time.average.ms
      2.95 ±  3%   +1477.8%      46.48 ±  2%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
      1.18 ±  7%   +2017.8%      24.99 ±  2%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
      2.76 ±  3%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
      6894 ±  2%     -94.4%     384.67        perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
      1070 ± 11%     -60.9%     418.33        perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
    112.33 ± 13%    -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
     15.07 ± 30%    +469.9%      85.90 ±  4%  perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
     11.68 ± 17%    +558.0%      76.85 ± 11%  perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
     14.21 ± 27%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
     17.20 ± 29%     -69.9%       5.17 ±  7%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      3893 ±  8%     -19.2%       3144 ± 19%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2.99 ± 28%    +906.8%      30.07 ± 12%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages.__folio_alloc.vma_alloc_folio.do_cow_fault
      3.59 ± 49%    +796.7%      32.22 ± 11%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_get_folio_gfp.shmem_fault.__do_fault.do_cow_fault
      1.81 ± 75%   +2169.9%      41.07 ± 29%  perf-sched.wait_time.avg.ms.__cond_resched.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.isra
      3.46 ±101%   +1224.0%      45.81 ± 30%  perf-sched.wait_time.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
      3.15 ± 29%    +943.4%      32.88 ±  7%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
      2.88 ± 50%    +922.9%      29.44 ± 11%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
      2.94 ±  3%   +1481.0%      46.47 ±  2%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
      1.18 ±  7%   +2023.3%      24.96 ±  3%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
      2.76 ±  3%   +1449.8%      42.73 ±  9%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
     10.38 ±  3%    +533.8%      65.76 ±  7%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.__folio_alloc.vma_alloc_folio.do_cow_fault
      9.13 ± 26%    +596.6%      63.59 ± 11%  perf-sched.wait_time.max.ms.__cond_resched.shmem_get_folio_gfp.shmem_fault.__do_fault.do_cow_fault
      6.77 ± 70%    +843.3%      63.87 ± 30%  perf-sched.wait_time.max.ms.__cond_resched.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.isra
      5.71 ± 64%   +1111.5%      69.19 ± 15%  perf-sched.wait_time.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
     10.23 ±  4%    +560.7%      67.56 ±  6%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
      8.83 ± 30%    +582.4%      60.23 ±  7%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
     15.06 ± 30%    +470.1%      85.89 ±  4%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
     11.67 ± 17%    +558.2%      76.84 ± 11%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
     14.21 ± 27%    +429.5%      75.22 ±  9%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
     17.16 ± 28%     -69.9%       5.16 ±  7%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      3893 ±  8%     -19.2%       3144 ± 19%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     14.12           +16.6%      16.46        perf-stat.i.MPKI
 2.231e+09 ±  2%     +16.6%  2.601e+09        perf-stat.i.branch-instructions
  19953628            +8.8%   21705347        perf-stat.i.branch-misses
     51.96 ±  2%     +13.0       65.01        perf-stat.i.cache-miss-rate%
 1.566e+08 ±  2%     +36.8%  2.142e+08        perf-stat.i.cache-misses
 3.015e+08 ±  3%      +9.2%  3.294e+08        perf-stat.i.cache-references
      4702           -55.0%       2116        perf-stat.i.context-switches
      2.58           -14.2%       2.22        perf-stat.i.cpi
    114.64            -2.2%     112.13        perf-stat.i.cpu-migrations
    183.46           -26.2%     135.46        perf-stat.i.cycles-between-cache-misses
   4280505 ±  3%     +22.7%    5251081 ±  6%  perf-stat.i.dTLB-load-misses
 2.774e+09 ±  2%     +19.1%  3.303e+09        perf-stat.i.dTLB-loads
      0.98 ±  2%      +0.2        1.14        perf-stat.i.dTLB-store-miss-rate%
  15927669 ±  4%     +38.8%   22110291        perf-stat.i.dTLB-store-misses
 1.604e+09 ±  2%     +19.9%  1.923e+09        perf-stat.i.dTLB-stores
     79.86            +3.1       82.95        perf-stat.i.iTLB-load-miss-rate%
   2701759 ±  2%     +19.0%    3214102        perf-stat.i.iTLB-load-misses
    679352            -2.8%     660048        perf-stat.i.iTLB-loads
 1.115e+10 ±  2%     +17.1%  1.305e+10        perf-stat.i.instructions
      0.39           +16.8%       0.45        perf-stat.i.ipc
      0.29 ± 26%     -31.6%       0.20 ± 17%  perf-stat.i.major-faults
    762.98 ±  2%     +39.2%       1062        perf-stat.i.metric.K/sec
     66.44 ±  2%     +18.0%      78.42        perf-stat.i.metric.M/sec
   1890049 ±  2%     +38.5%    2616916        perf-stat.i.minor-faults
  47044113 ±  2%     +41.1%   66393293        perf-stat.i.node-loads
  11825548 ±  2%     +34.0%   15841684        perf-stat.i.node-stores
   1890049 ±  2%     +38.5%    2616917        perf-stat.i.page-faults
     14.05           +16.9%      16.42        perf-stat.overall.MPKI
      0.89            -0.1        0.83        perf-stat.overall.branch-miss-rate%
     51.96 ±  2%     +13.1       65.04        perf-stat.overall.cache-miss-rate%
      2.57           -14.4%       2.20        perf-stat.overall.cpi
    183.08           -26.7%     134.14        perf-stat.overall.cycles-between-cache-misses
      0.98 ±  2%      +0.2        1.14        perf-stat.overall.dTLB-store-miss-rate%
     79.90            +3.1       82.97        perf-stat.overall.iTLB-load-miss-rate%
      0.39           +16.7%       0.45        perf-stat.overall.ipc
      0.22 ±  2%      -0.1        0.15 ±  3%  perf-stat.overall.node-load-miss-rate%
      0.19 ±  8%      -0.1        0.13 ± 16%  perf-stat.overall.node-store-miss-rate%
   1779185           -15.5%    1503815        perf-stat.overall.path-length
 2.224e+09 ±  2%     +16.6%  2.593e+09        perf-stat.ps.branch-instructions
  19885795            +8.8%   21625880        perf-stat.ps.branch-misses
  1.56e+08 ±  2%     +36.8%  2.135e+08        perf-stat.ps.cache-misses
 3.005e+08 ±  3%      +9.2%  3.283e+08        perf-stat.ps.cache-references
      4686           -55.0%       2109        perf-stat.ps.context-switches
    114.35            -2.3%     111.73        perf-stat.ps.cpu-migrations
   4265367 ±  3%     +22.7%    5233761 ±  6%  perf-stat.ps.dTLB-load-misses
 2.765e+09 ±  2%     +19.1%  3.292e+09        perf-stat.ps.dTLB-loads
  15874379 ±  4%     +38.8%   22037238        perf-stat.ps.dTLB-store-misses
 1.598e+09 ±  2%     +19.9%  1.917e+09        perf-stat.ps.dTLB-stores
   2692499 ±  2%     +19.0%    3203465        perf-stat.ps.iTLB-load-misses
    677243            -2.9%     657791        perf-stat.ps.iTLB-loads
 1.111e+10 ±  2%     +17.1%    1.3e+10        perf-stat.ps.instructions
      0.29 ± 26%     -31.6%       0.20 ± 17%  perf-stat.ps.major-faults
   1883712 ±  2%     +38.5%    2608263        perf-stat.ps.minor-faults
  46887454 ±  2%     +41.1%   66175688        perf-stat.ps.node-loads
  11785781 ±  2%     +34.0%   15789100        perf-stat.ps.node-stores
   1883712 ±  2%     +38.5%    2608264        perf-stat.ps.page-faults
 3.362e+12 ±  2%     +17.3%  3.943e+12        perf-stat.total.instructions
     47.03 ±  2%      -8.6       38.45        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     47.22 ±  2%      -8.6       38.67        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
      8.30 ±  6%      -8.3        0.00        perf-profile.calltrace.cycles-pp.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      7.19 ±  4%      -7.2        0.00        perf-profile.calltrace.cycles-pp.down_read_trylock.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     57.96 ±  3%      -4.7       53.23        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
     61.72 ±  3%      -3.3       58.42        perf-profile.calltrace.cycles-pp.testcase
      2.19 ± 13%      -0.6        1.59 ±  6%  perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      0.91 ±  8%      +0.2        1.09 ±  7%  perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.do_cow_fault.do_fault.__handle_mm_fault.handle_mm_fault
      0.56 ±  2%      +0.2        0.78 ±  5%  perf-profile.calltrace.cycles-pp.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      1.11 ±  4%      +0.2        1.34 ±  4%  perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_cow_fault
      0.86 ±  6%      +0.3        1.13 ±  4%  perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
      1.42 ±  3%      +0.3        1.77 ±  2%  perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_cow_fault.do_fault
      0.87 ±  6%      +0.4        1.27 ±  3%  perf-profile.calltrace.cycles-pp.__free_one_page.free_pcppages_bulk.free_unref_page_list.release_pages.tlb_batch_pages_flush
      1.66 ±  3%      +0.4        2.10        perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_cow_fault.do_fault.__handle_mm_fault
      0.54 ± 45%      +0.4        0.98 ±  4%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma
      0.96 ±  6%      +0.4        1.40 ±  2%  perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page_list.release_pages.tlb_batch_pages_flush.zap_pte_range
      1.23 ±  4%      +0.4        1.68        perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase
      0.26 ±100%      +0.5        0.72 ±  8%  perf-profile.calltrace.cycles-pp.folio_add_new_anon_rmap.set_pte_range.finish_fault.do_cow_fault.do_fault
      1.74 ±  3%      +0.5        2.22        perf-profile.calltrace.cycles-pp.__do_fault.do_cow_fault.do_fault.__handle_mm_fault.handle_mm_fault
      0.59 ± 45%      +0.5        1.06 ±  3%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range
      0.89 ±  5%      +0.5        1.36        perf-profile.calltrace.cycles-pp._compound_head.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      0.60 ± 45%      +0.5        1.08 ±  3%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
      0.00            +0.5        0.52 ±  2%  perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      0.00            +0.5        0.52 ±  2%  perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      0.08 ±223%      +0.6        0.67        perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      1.56 ±  4%      +0.6        2.18 ±  2%  perf-profile.calltrace.cycles-pp.free_unref_page_list.release_pages.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range
      0.00            +0.6        0.63 ±  5%  perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.00            +0.7        0.66 ±  2%  perf-profile.calltrace.cycles-pp.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      2.04 ±  8%      +0.9        2.91        perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault.do_cow_fault
      2.16 ±  7%      +0.9        3.10 ±  2%  perf-profile.calltrace.cycles-pp.folio_add_lru_vma.set_pte_range.finish_fault.do_cow_fault.do_fault
      2.80 ±  4%      +1.0        3.76        perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_exc_page_fault.testcase
      2.93 ±  5%      +1.1        4.06        perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.finish_fault.do_cow_fault.do_fault
      3.11 ±  7%      +1.1        4.24 ±  2%  perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_cow_fault.do_fault.__handle_mm_fault
      3.15 ±  4%      +1.2        4.31        perf-profile.calltrace.cycles-pp.error_entry.testcase
      3.05 ±  5%      +1.2        4.23        perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_cow_fault.do_fault.__handle_mm_fault
      3.21 ±  3%      +1.2        4.41        perf-profile.calltrace.cycles-pp.__irqentry_text_end.testcase
      2.62 ±  6%      +1.4        3.98        perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.unmap_page_range
      2.78 ±  6%      +1.4        4.20        perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      0.70 ± 48%      +1.7        2.38 ±  3%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.rmqueue_bulk.rmqueue.get_page_from_freelist
      0.71 ± 48%      +1.7        2.39 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages
      1.98 ± 10%      +1.7        3.66        perf-profile.calltrace.cycles-pp.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages.__folio_alloc
      2.43 ±  9%      +1.8        4.25        perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages.__folio_alloc.vma_alloc_folio
      2.64 ±  8%      +1.9        4.55        perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.__folio_alloc.vma_alloc_folio.do_cow_fault
      3.07 ±  8%      +2.1        5.13        perf-profile.calltrace.cycles-pp.__alloc_pages.__folio_alloc.vma_alloc_folio.do_cow_fault.do_fault
      3.15 ±  8%      +2.1        5.25        perf-profile.calltrace.cycles-pp.__folio_alloc.vma_alloc_folio.do_cow_fault.do_fault.__handle_mm_fault
      4.46 ±  5%      +2.3        6.72        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
      4.47 ±  5%      +2.3        6.74        perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      4.47 ±  5%      +2.3        6.74        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      4.47 ±  5%      +2.3        6.74        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
      6.38 ±  6%      +2.3        8.65        perf-profile.calltrace.cycles-pp.finish_fault.do_cow_fault.do_fault.__handle_mm_fault.handle_mm_fault
      3.64 ±  7%      +2.3        5.97        perf-profile.calltrace.cycles-pp.vma_alloc_folio.do_cow_fault.do_fault.__handle_mm_fault.handle_mm_fault
      4.81 ±  6%      +2.5        7.28        perf-profile.calltrace.cycles-pp.__munmap
      4.81 ±  6%      +2.5        7.28        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
      4.81 ±  6%      +2.5        7.28        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      4.81 ±  6%      +2.5        7.28        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      4.81 ±  6%      +2.5        7.28        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      4.79 ±  6%      +2.5        7.27        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
      4.80 ±  6%      +2.5        7.28        perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.80 ±  6%      +2.5        7.28        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     31.04 ±  3%      +3.1       34.10        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     32.18 ±  3%      +3.2       35.42        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     10.32 ±  4%      +3.6       13.90        perf-profile.calltrace.cycles-pp.copy_page.do_cow_fault.do_fault.__handle_mm_fault.handle_mm_fault
     23.83 ±  5%      +9.0       32.85        perf-profile.calltrace.cycles-pp.do_cow_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     23.96 ±  5%      +9.0       33.00        perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
     47.11 ±  2%      -8.6       38.50        perf-profile.children.cycles-pp.do_user_addr_fault
     47.25 ±  2%      -8.5       38.70        perf-profile.children.cycles-pp.exc_page_fault
      8.31 ±  6%      -8.3        0.00        perf-profile.children.cycles-pp.lock_mm_and_find_vma
      7.32 ±  4%      -7.1        0.18 ±  9%  perf-profile.children.cycles-pp.down_read_trylock
     54.76 ±  3%      -5.9       48.89        perf-profile.children.cycles-pp.asm_exc_page_fault
      3.55 ±  3%      -3.4        0.18 ±  8%  perf-profile.children.cycles-pp.up_read
     63.31 ±  3%      -2.7       60.56        perf-profile.children.cycles-pp.testcase
      2.19 ± 13%      -0.6        1.59 ±  6%  perf-profile.children.cycles-pp.lock_vma_under_rcu
      0.55 ± 10%      -0.2        0.37 ±  6%  perf-profile.children.cycles-pp.mtree_range_walk
      0.30 ± 11%      -0.1        0.18 ± 10%  perf-profile.children.cycles-pp.handle_pte_fault
      0.20 ± 13%      -0.1        0.12 ±  9%  perf-profile.children.cycles-pp.pte_offset_map_nolock
      0.14 ± 10%      -0.1        0.07 ± 10%  perf-profile.children.cycles-pp.access_error
      0.08 ± 14%      -0.0        0.04 ± 45%  perf-profile.children.cycles-pp.intel_idle
      0.07 ± 11%      +0.0        0.10 ± 12%  perf-profile.children.cycles-pp.xas_start
      0.05 ± 46%      +0.0        0.08 ±  7%  perf-profile.children.cycles-pp.policy_node
      0.11 ±  7%      +0.0        0.14 ± 12%  perf-profile.children.cycles-pp.folio_unlock
      0.15 ±  6%      +0.0        0.20 ± 10%  perf-profile.children.cycles-pp._raw_spin_trylock
      0.11 ± 10%      +0.0        0.15 ±  7%  perf-profile.children.cycles-pp.get_pfnblock_flags_mask
      0.12 ± 12%      +0.0        0.17 ±  6%  perf-profile.children.cycles-pp.mem_cgroup_update_lru_size
      0.13 ± 10%      +0.0        0.18 ±  5%  perf-profile.children.cycles-pp.uncharge_folio
      0.15 ±  8%      +0.0        0.20 ±  7%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
      0.11 ± 10%      +0.0        0.16 ±  6%  perf-profile.children.cycles-pp.shmem_get_policy
      0.15 ±  7%      +0.0        0.20 ±  4%  perf-profile.children.cycles-pp.try_charge_memcg
      0.13 ±  9%      +0.0        0.18 ±  9%  perf-profile.children.cycles-pp.cgroup_rstat_updated
      0.01 ±223%      +0.1        0.06 ± 23%  perf-profile.children.cycles-pp.perf_swevent_event
      0.20 ± 10%      +0.1        0.26 ±  4%  perf-profile.children.cycles-pp.__mod_zone_page_state
      0.17 ±  9%      +0.1        0.23 ±  8%  perf-profile.children.cycles-pp.__count_memcg_events
      0.14 ± 11%      +0.1        0.20 ±  2%  perf-profile.children.cycles-pp.free_swap_cache
      0.20 ±  6%      +0.1        0.25 ±  3%  perf-profile.children.cycles-pp.free_unref_page_prepare
      0.04 ± 45%      +0.1        0.10 ± 19%  perf-profile.children.cycles-pp.kthread_blkcg
      0.14 ±  8%      +0.1        0.20 ±  3%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
      0.24 ±  8%      +0.1        0.30 ±  6%  perf-profile.children.cycles-pp.__list_add_valid_or_report
      0.23 ±  9%      +0.1        0.30 ±  4%  perf-profile.children.cycles-pp.free_unref_page_commit
      0.46 ±  4%      +0.1        0.55 ±  2%  perf-profile.children.cycles-pp.xas_load
      0.00            +0.1        0.11 ±  9%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.34 ±  3%      +0.1        0.47 ±  6%  perf-profile.children.cycles-pp.charge_memcg
      0.32 ±  8%      +0.1        0.47 ±  6%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
      0.00            +0.2        0.15 ± 16%  perf-profile.children.cycles-pp.put_page
      0.25 ±  7%      +0.2        0.41 ±  5%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.20 ± 15%      +0.2        0.36 ± 12%  perf-profile.children.cycles-pp.blk_cgroup_congested
      1.42 ±  4%      +0.2        1.58        perf-profile.children.cycles-pp.__list_del_entry_valid_or_report
      0.23 ± 16%      +0.2        0.42 ± 10%  perf-profile.children.cycles-pp.__folio_throttle_swaprate
      0.36 ±  4%      +0.2        0.56 ±  5%  perf-profile.children.cycles-pp.__mod_lruvec_state
      0.91 ±  8%      +0.2        1.11 ±  7%  perf-profile.children.cycles-pp.__mem_cgroup_charge
      0.32 ±  9%      +0.2        0.53 ±  2%  perf-profile.children.cycles-pp.tlb_finish_mmu
      0.45 ±  6%      +0.2        0.68        perf-profile.children.cycles-pp.page_remove_rmap
      1.11 ±  4%      +0.2        1.34 ±  4%  perf-profile.children.cycles-pp.filemap_get_entry
      0.47 ± 12%      +0.2        0.72 ±  8%  perf-profile.children.cycles-pp.folio_add_new_anon_rmap
      0.47 ± 11%      +0.3        0.74 ±  6%  perf-profile.children.cycles-pp.__mod_lruvec_page_state
      0.88 ±  6%      +0.3        1.17 ±  4%  perf-profile.children.cycles-pp.lru_add_fn
      0.85 ±  2%      +0.3        1.16 ±  3%  perf-profile.children.cycles-pp.___perf_sw_event
      1.43 ±  4%      +0.3        1.78 ±  2%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
      1.06 ±  2%      +0.4        1.47 ±  2%  perf-profile.children.cycles-pp.__perf_sw_event
      1.66 ±  3%      +0.4        2.10        perf-profile.children.cycles-pp.shmem_fault
      0.97 ±  6%      +0.5        1.44 ±  3%  perf-profile.children.cycles-pp.__free_one_page
      1.27 ±  4%      +0.5        1.74        perf-profile.children.cycles-pp.sync_regs
      1.75 ±  4%      +0.5        2.22        perf-profile.children.cycles-pp.__do_fault
      1.06 ±  6%      +0.5        1.58 ±  2%  perf-profile.children.cycles-pp.free_pcppages_bulk
      0.92 ±  5%      +0.5        1.45        perf-profile.children.cycles-pp._compound_head
      1.75 ±  5%      +0.6        2.36        perf-profile.children.cycles-pp.native_irq_return_iret
      1.74 ±  4%      +0.7        2.47 ±  2%  perf-profile.children.cycles-pp.free_unref_page_list
      0.83 ± 18%      +0.8        1.65 ±  2%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
      2.04 ±  8%      +0.9        2.92        perf-profile.children.cycles-pp.folio_batch_move_lru
      2.17 ±  7%      +0.9        3.11 ±  2%  perf-profile.children.cycles-pp.folio_add_lru_vma
      2.85 ±  4%      +1.0        3.82        perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
      3.01 ±  5%      +1.1        4.14        perf-profile.children.cycles-pp._raw_spin_lock
      3.12 ±  7%      +1.1        4.26 ±  3%  perf-profile.children.cycles-pp.set_pte_range
      3.20 ±  3%      +1.2        4.37        perf-profile.children.cycles-pp.error_entry
      3.06 ±  5%      +1.2        4.24        perf-profile.children.cycles-pp.__pte_offset_map_lock
      3.22 ±  3%      +1.2        4.41        perf-profile.children.cycles-pp.__irqentry_text_end
      3.09 ±  6%      +1.6        4.70        perf-profile.children.cycles-pp.release_pages
      3.09 ±  6%      +1.6        4.72        perf-profile.children.cycles-pp.tlb_batch_pages_flush
      1.98 ± 10%      +1.7        3.67        perf-profile.children.cycles-pp.rmqueue_bulk
      2.44 ±  9%      +1.8        4.27        perf-profile.children.cycles-pp.rmqueue
      2.66 ±  8%      +1.9        4.57        perf-profile.children.cycles-pp.get_page_from_freelist
      3.14 ±  7%      +2.1        5.23        perf-profile.children.cycles-pp.__alloc_pages
      3.17 ±  8%      +2.1        5.28        perf-profile.children.cycles-pp.__folio_alloc
      4.48 ±  5%      +2.3        6.75        perf-profile.children.cycles-pp.unmap_vmas
      4.48 ±  5%      +2.3        6.75        perf-profile.children.cycles-pp.unmap_page_range
      4.48 ±  5%      +2.3        6.75        perf-profile.children.cycles-pp.zap_pmd_range
      4.48 ±  5%      +2.3        6.75        perf-profile.children.cycles-pp.zap_pte_range
      6.39 ±  6%      +2.3        8.68        perf-profile.children.cycles-pp.finish_fault
      3.68 ±  7%      +2.4        6.03        perf-profile.children.cycles-pp.vma_alloc_folio
      1.56 ± 21%      +2.4        3.92        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      1.66 ± 19%      +2.4        4.08        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      4.97 ±  5%      +2.4        7.42        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      4.96 ±  5%      +2.4        7.42        perf-profile.children.cycles-pp.do_syscall_64
      4.81 ±  6%      +2.5        7.28        perf-profile.children.cycles-pp.__munmap
      4.81 ±  6%      +2.5        7.28        perf-profile.children.cycles-pp.__x64_sys_munmap
      4.81 ±  6%      +2.5        7.28        perf-profile.children.cycles-pp.__vm_munmap
      4.80 ±  6%      +2.5        7.28        perf-profile.children.cycles-pp.do_vmi_munmap
      4.80 ±  6%      +2.5        7.28        perf-profile.children.cycles-pp.do_vmi_align_munmap
      4.80 ±  6%      +2.5        7.27        perf-profile.children.cycles-pp.unmap_region
     31.08 ±  3%      +3.1       34.13        perf-profile.children.cycles-pp.__handle_mm_fault
     32.25 ±  3%      +3.2       35.50        perf-profile.children.cycles-pp.handle_mm_fault
     10.33 ±  4%      +3.6       13.92        perf-profile.children.cycles-pp.copy_page
     23.97 ±  5%      +9.0       33.01        perf-profile.children.cycles-pp.do_fault
     23.88 ±  5%      +9.1       32.95        perf-profile.children.cycles-pp.do_cow_fault
      7.29 ±  4%      -7.1        0.18 ± 10%  perf-profile.self.cycles-pp.down_read_trylock
      6.77 ±  4%      -5.8        0.93 ±  8%  perf-profile.self.cycles-pp.__handle_mm_fault
      3.51 ±  3%      -3.3        0.18 ± 10%  perf-profile.self.cycles-pp.up_read
      0.54 ± 10%      -0.2        0.36 ±  6%  perf-profile.self.cycles-pp.mtree_range_walk
      0.10 ± 18%      -0.1        0.04 ± 72%  perf-profile.self.cycles-pp.handle_pte_fault
      0.12 ±  7%      -0.1        0.07 ± 10%  perf-profile.self.cycles-pp.access_error
      0.10 ± 18%      -0.1        0.05 ± 47%  perf-profile.self.cycles-pp.pte_offset_map_nolock
      0.08 ± 11%      -0.0        0.04 ± 44%  perf-profile.self.cycles-pp.do_fault
      0.08 ± 14%      -0.0        0.04 ± 45%  perf-profile.self.cycles-pp.intel_idle
      0.09 ±  6%      +0.0        0.11 ±  5%  perf-profile.self.cycles-pp.free_unref_page_prepare
      0.06 ±  7%      +0.0        0.09 ±  4%  perf-profile.self.cycles-pp.free_pcppages_bulk
      0.09 ±  6%      +0.0        0.12 ±  6%  perf-profile.self.cycles-pp.rmqueue_bulk
      0.10 ±  6%      +0.0        0.13 ± 10%  perf-profile.self.cycles-pp.charge_memcg
      0.11 ±  8%      +0.0        0.14 ±  4%  perf-profile.self.cycles-pp.__mod_lruvec_state
      0.08 ± 12%      +0.0        0.11 ±  9%  perf-profile.self.cycles-pp.__pte_offset_map_lock
      0.10 ± 10%      +0.0        0.14 ± 11%  perf-profile.self.cycles-pp.folio_unlock
      0.12 ± 11%      +0.0        0.16 ±  8%  perf-profile.self.cycles-pp.uncharge_folio
      0.12 ± 15%      +0.0        0.16 ±  4%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.10 ±  9%      +0.0        0.14 ±  5%  perf-profile.self.cycles-pp.get_pfnblock_flags_mask
      0.10 ±  7%      +0.0        0.15 ±  3%  perf-profile.self.cycles-pp.try_charge_memcg
      0.04 ± 71%      +0.0        0.08 ± 16%  perf-profile.self.cycles-pp.__do_fault
      0.15 ±  6%      +0.0        0.20 ± 10%  perf-profile.self.cycles-pp._raw_spin_trylock
      0.11 ±  9%      +0.0        0.16 ±  6%  perf-profile.self.cycles-pp.mem_cgroup_update_lru_size
      0.13 ±  9%      +0.0        0.17 ±  8%  perf-profile.self.cycles-pp.set_pte_range
      0.10 ±  9%      +0.0        0.15 ±  5%  perf-profile.self.cycles-pp.shmem_get_policy
      0.18 ±  9%      +0.0        0.24 ±  4%  perf-profile.self.cycles-pp.__mod_zone_page_state
      0.14 ± 10%      +0.0        0.18 ±  2%  perf-profile.self.cycles-pp.free_swap_cache
      0.10 ± 11%      +0.1        0.16 ± 10%  perf-profile.self.cycles-pp.exc_page_fault
      0.19 ± 11%      +0.1        0.24 ±  5%  perf-profile.self.cycles-pp.free_unref_page_commit
      0.12 ± 12%      +0.1        0.17 ± 10%  perf-profile.self.cycles-pp.cgroup_rstat_updated
      0.14 ±  8%      +0.1        0.20 ±  5%  perf-profile.self.cycles-pp.asm_exc_page_fault
      0.01 ±223%      +0.1        0.06 ± 23%  perf-profile.self.cycles-pp.perf_swevent_event
      0.17 ±  7%      +0.1        0.22 ±  7%  perf-profile.self.cycles-pp.xas_load
      0.16 ±  9%      +0.1        0.22 ±  5%  perf-profile.self.cycles-pp.folio_add_new_anon_rmap
      0.20 ±  8%      +0.1        0.26 ±  4%  perf-profile.self.cycles-pp.free_unref_page_list
      0.06 ± 14%      +0.1        0.13 ± 21%  perf-profile.self.cycles-pp.__mem_cgroup_charge
      0.22 ±  9%      +0.1        0.28 ±  7%  perf-profile.self.cycles-pp.__list_add_valid_or_report
      0.13 ±  6%      +0.1        0.19 ±  7%  perf-profile.self.cycles-pp.folio_add_lru_vma
      0.22 ±  7%      +0.1        0.29 ±  5%  perf-profile.self.cycles-pp.rmqueue
      0.24 ±  5%      +0.1        0.31 ±  6%  perf-profile.self.cycles-pp.shmem_fault
      0.21 ±  6%      +0.1        0.29 ±  4%  perf-profile.self.cycles-pp.get_page_from_freelist
      0.22 ±  7%      +0.1        0.30 ±  4%  perf-profile.self.cycles-pp.__perf_sw_event
      0.00            +0.1        0.10 ±  9%  perf-profile.self.cycles-pp.exit_to_user_mode_prepare
      0.29 ±  4%      +0.1        0.39 ±  6%  perf-profile.self.cycles-pp.shmem_get_folio_gfp
      0.32 ±  7%      +0.1        0.44 ±  4%  perf-profile.self.cycles-pp.zap_pte_range
      0.24 ±  9%      +0.1        0.36 ±  6%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      0.16 ± 15%      +0.1        0.29 ± 11%  perf-profile.self.cycles-pp.blk_cgroup_congested
      0.35 ±  7%      +0.1        0.48 ±  4%  perf-profile.self.cycles-pp.folio_batch_move_lru
      0.39 ±  5%      +0.1        0.53 ±  5%  perf-profile.self.cycles-pp.__alloc_pages
      0.31 ±  8%      +0.1        0.45 ±  2%  perf-profile.self.cycles-pp.vma_alloc_folio
      0.29 ±  9%      +0.1        0.44 ±  4%  perf-profile.self.cycles-pp.page_remove_rmap
      0.65 ±  7%      +0.1        0.80 ±  7%  perf-profile.self.cycles-pp.filemap_get_entry
      0.44 ±  7%      +0.2        0.59 ±  2%  perf-profile.self.cycles-pp.lru_add_fn
      0.00            +0.2        0.15 ± 16%  perf-profile.self.cycles-pp.put_page
      0.24 ±  7%      +0.2        0.39 ±  6%  perf-profile.self.cycles-pp.__mod_node_page_state
      1.41 ±  4%      +0.2        1.57        perf-profile.self.cycles-pp.__list_del_entry_valid_or_report
      0.57 ±  8%      +0.2        0.81 ±  5%  perf-profile.self.cycles-pp.release_pages
      0.75 ±  2%      +0.3        1.03 ±  2%  perf-profile.self.cycles-pp.___perf_sw_event
      0.91 ±  6%      +0.5        1.37 ±  3%  perf-profile.self.cycles-pp.__free_one_page
      1.27 ±  4%      +0.5        1.74        perf-profile.self.cycles-pp.sync_regs
      0.90 ±  5%      +0.5        1.42        perf-profile.self.cycles-pp._compound_head
      1.74 ±  5%      +0.6        2.36        perf-profile.self.cycles-pp.native_irq_return_iret
      2.82 ±  5%      +0.9        3.72        perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
      2.99 ±  5%      +1.1        4.11        perf-profile.self.cycles-pp._raw_spin_lock
      3.18 ±  4%      +1.2        4.34        perf-profile.self.cycles-pp.error_entry
      3.22 ±  3%      +1.2        4.41        perf-profile.self.cycles-pp.__irqentry_text_end
      3.70 ±  4%      +1.3        5.00        perf-profile.self.cycles-pp.testcase
      1.56 ± 21%      +2.4        3.92        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
     10.29 ±  4%      +3.6       13.86        perf-profile.self.cycles-pp.copy_page




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [PATCH v2 3/6] mm: Handle shared faults under the VMA lock
  2023-10-06 19:53 ` [PATCH v2 3/6] mm: Handle shared faults " Matthew Wilcox (Oracle)
  2023-10-08 22:01   ` Suren Baghdasaryan
@ 2023-10-20 13:23   ` kernel test robot
  1 sibling, 0 replies; 16+ messages in thread
From: kernel test robot @ 2023-10-20 13:23 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: oe-lkp, lkp, linux-mm, ying.huang, feng.tang, fengwei.yin,
	Andrew Morton, Matthew Wilcox (Oracle),
	Suren Baghdasaryan, oliver.sang



Hello,

kernel test robot noticed a 67.5% improvement of stress-ng.fault.minor_page_faults_per_sec on:


commit: c8b329d48e0dac7438168a1857c3f67d4e23fed0 ("[PATCH v2 3/6] mm: Handle shared faults under the VMA lock")
url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/mm-Make-lock_folio_maybe_drop_mmap-VMA-lock-aware/20231007-035513
base: v6.6-rc4
patch link: https://lore.kernel.org/all/20231006195318.4087158-4-willy@infradead.org/
patch subject: [PATCH v2 3/6] mm: Handle shared faults under the VMA lock

testcase: stress-ng
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
parameters:

	nr_threads: 1
	disk: 1HDD
	testtime: 60s
	fs: ext4
	class: os
	test: fault
	cpufreq_governor: performance


In addition, the commit also has a significant impact on the following test:

+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_thread_ops 274.8% improvement                                     |
| test machine     | 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | mode=thread                                                                                        |
|                  | nr_task=50%                                                                                        |
|                  | test=page_fault3                                                                                   |
+------------------+----------------------------------------------------------------------------------------------------+
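
As a hedged sketch of how the will-it-scale parameters above translate into a local run: `nr_task=50%` of the machine's 224 threads gives 112 tasks, and `mode=thread` selects the per-test `_threads` binary. The binary name and `-t` flag follow the upstream will-it-scale convention (`page_fault3_threads -t <tasks>`); treat them as assumptions if your checkout differs.

```shell
# Derive the task count the harness used (50% of 224 CPU threads = 112,
# matching the will-it-scale.112.threads rows below).
nr_cpus=224
nr_task_pct=50
tasks=$(( nr_cpus * nr_task_pct / 100 ))
cmd="./page_fault3_threads -t ${tasks}"
echo "$cmd"
```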




Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231020/202310201857.d7db939a-oliver.sang@intel.com

=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/fault/stress-ng/60s

commit: 
  34611600bf ("mm: Call wp_page_copy() under the VMA lock")
  c8b329d48e ("mm: Handle shared faults under the VMA lock")

34611600bfd1bf9f c8b329d48e0dac7438168a1857c 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    157941 ±  6%     +20.3%     190026 ± 11%  meminfo.DirectMap4k
      0.05            +0.0        0.05        perf-stat.i.dTLB-store-miss-rate%
     51205          -100.0%       0.03 ± 81%  perf-stat.i.major-faults
     79003           +65.6%     130837        perf-stat.i.minor-faults
     50394          -100.0%       0.03 ± 81%  perf-stat.ps.major-faults
     77754           +65.6%     128767        perf-stat.ps.minor-faults
     53411          -100.0%       0.00 ±223%  stress-ng.fault.major_page_faults_per_sec
     80118           +67.5%     134204        stress-ng.fault.minor_page_faults_per_sec
      1417            -4.7%       1350        stress-ng.fault.nanosecs_per_page_fault
   3204300          -100.0%       0.33 ±141%  stress-ng.time.major_page_faults
   4815857           +67.3%    8059294        stress-ng.time.minor_page_faults
      0.01 ± 68%    +224.2%       0.03 ± 51%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.55 ± 95%    +368.6%       2.56 ± 35%  perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.05 ± 70%    +168.2%       0.12 ± 32%  perf-sched.wait_time.avg.ms.__cond_resched.__ext4_handle_dirty_metadata.ext4_mb_clear_bb.ext4_remove_blocks.ext4_ext_rm_leaf
      0.05 ± 73%    +114.3%       0.10 ± 13%  perf-sched.wait_time.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
      0.09 ± 78%     +79.2%       0.17 ±  8%  perf-sched.wait_time.max.ms.__cond_resched.__ext4_handle_dirty_metadata.ext4_delete_entry.__ext4_unlink.ext4_unlink
      0.05 ± 70%    +229.6%       0.15 ± 21%  perf-sched.wait_time.max.ms.__cond_resched.__ext4_handle_dirty_metadata.ext4_mb_clear_bb.ext4_remove_blocks.ext4_ext_rm_leaf
      0.03 ±151%    +260.5%       0.12 ± 35%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.03 ±100%    +183.8%       0.10 ± 35%  perf-sched.wait_time.max.ms.__cond_resched.ext4_journal_check_start.__ext4_journal_start_sb.ext4_alloc_file_blocks.isra
      0.08 ± 79%    +134.1%       0.18 ± 36%  perf-sched.wait_time.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
     11.65            -0.8       10.82 ±  2%  perf-profile.calltrace.cycles-pp.stress_fault
      9.42            -0.8        8.61 ±  2%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.stress_fault
      8.84            -0.8        8.07 ±  3%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.stress_fault
      8.74            -0.7        8.00 ±  3%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_fault
      7.56 ±  2%      -0.5        7.04 ±  3%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_fault
      6.99 ±  2%      -0.5        6.51 ±  3%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     11.10            -0.9       10.24 ±  2%  perf-profile.children.cycles-pp.asm_exc_page_fault
     12.38            -0.8       11.54        perf-profile.children.cycles-pp.stress_fault
      8.92            -0.8        8.14 ±  3%  perf-profile.children.cycles-pp.exc_page_fault
      8.84            -0.8        8.07 ±  3%  perf-profile.children.cycles-pp.do_user_addr_fault
      7.63 ±  2%      -0.5        7.09 ±  2%  perf-profile.children.cycles-pp.handle_mm_fault
      7.06 ±  2%      -0.5        6.56 ±  3%  perf-profile.children.cycles-pp.__handle_mm_fault
      0.36 ±  8%      -0.2        0.19 ±  8%  perf-profile.children.cycles-pp.lock_mm_and_find_vma
      1.46 ±  4%      -0.1        1.33 ±  5%  perf-profile.children.cycles-pp.page_cache_ra_unbounded
      0.40 ±  5%      -0.1        0.34 ±  8%  perf-profile.children.cycles-pp.mas_next_slot
      0.22 ± 13%      -0.1        0.17 ± 14%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
      0.03 ±100%      +0.0        0.07 ±  5%  perf-profile.children.cycles-pp.housekeeping_test_cpu
      0.44 ±  4%      -0.1        0.32 ± 13%  perf-profile.self.cycles-pp.__handle_mm_fault
      0.67 ±  9%      -0.1        0.54 ± 10%  perf-profile.self.cycles-pp.mtree_range_walk
      0.58 ±  7%      -0.1        0.49 ±  6%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.16 ±  7%      -0.1        0.10 ± 20%  perf-profile.self.cycles-pp.madvise_cold_or_pageout_pte_range
      0.39 ±  5%      -0.1        0.33 ±  8%  perf-profile.self.cycles-pp.mas_next_slot
      0.26 ±  6%      +0.0        0.29 ± 10%  perf-profile.self.cycles-pp.filemap_fault


***************************************************************************************************
lkp-cpl-4sp2: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/thread/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/page_fault3/will-it-scale

commit: 
  34611600bf ("mm: Call wp_page_copy() under the VMA lock")
  c8b329d48e ("mm: Handle shared faults under the VMA lock")

34611600bfd1bf9f c8b329d48e0dac7438168a1857c 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     46289           +39.6%      64618 ±  2%  uptime.idle
 3.839e+10           +47.5%  5.663e+10 ±  2%  cpuidle..time
  44548500           +28.6%   57277226 ±  2%  cpuidle..usage
    244.33 ±  7%     +92.0%     469.17 ± 14%  perf-c2c.DRAM.local
    563.00 ±  3%     -60.7%     221.00 ± 15%  perf-c2c.HITM.remote
    554762           -20.3%     441916        meminfo.Inactive
    554566           -20.3%     441725        meminfo.Inactive(anon)
   7360875           +46.0%   10746773 ±  2%  meminfo.Mapped
     20123           +28.9%      25930        meminfo.PageTables
     56.22           +46.8%      82.52 ±  2%  vmstat.cpu.id
     63.54 ±  8%     -39.5%      38.45 ± 14%  vmstat.procs.r
     23694           -84.7%       3627        vmstat.system.cs
    455148 ±  2%    +123.1%    1015448 ±  7%  vmstat.system.in
   2882478 ±  2%    +274.8%   10804264 ±  5%  will-it-scale.112.threads
     55.72           +47.6%      82.27 ±  2%  will-it-scale.112.threads_idle
     25736 ±  2%    +274.8%      96466 ±  5%  will-it-scale.per_thread_ops
   2882478 ±  2%    +274.8%   10804264 ±  5%  will-it-scale.workload
     55.97           +26.4       82.36 ±  2%  mpstat.cpu.all.idle%
      0.82            -0.1        0.70 ±  4%  mpstat.cpu.all.irq%
      0.11 ±  4%      -0.1        0.05 ±  5%  mpstat.cpu.all.soft%
     42.51           -26.9       15.64 ± 11%  mpstat.cpu.all.sys%
      0.59 ± 17%      +0.7        1.25 ± 41%  mpstat.cpu.all.usr%
   1841712           +44.8%    2666713 ±  3%  numa-meminfo.node0.Mapped
      5224 ±  3%     +31.6%       6877 ±  4%  numa-meminfo.node0.PageTables
   1845678           +46.3%    2699787 ±  2%  numa-meminfo.node1.Mapped
      5064 ±  5%     +23.0%       6231 ±  2%  numa-meminfo.node1.PageTables
   1826141           +47.6%    2694729 ±  2%  numa-meminfo.node2.Mapped
      4794 ±  2%     +30.8%       6269 ±  3%  numa-meminfo.node2.PageTables
   1868742 ±  2%     +44.9%    2708096 ±  3%  numa-meminfo.node3.Mapped
      5026 ±  5%     +28.8%       6474 ±  4%  numa-meminfo.node3.PageTables
   1591430 ±  4%     +70.8%    2718150 ±  3%  numa-numastat.node0.local_node
   1673574 ±  3%     +68.6%    2821949 ±  3%  numa-numastat.node0.numa_hit
   1577936 ±  6%     +74.8%    2757801 ±  2%  numa-numastat.node1.local_node
   1645522 ±  5%     +73.0%    2847142 ±  3%  numa-numastat.node1.numa_hit
   1537208 ±  3%     +77.3%    2725353 ±  2%  numa-numastat.node2.local_node
   1639749 ±  3%     +71.4%    2811161 ±  2%  numa-numastat.node2.numa_hit
   1637504 ±  5%     +72.8%    2829154 ±  5%  numa-numastat.node3.local_node
   1732850 ±  4%     +67.2%    2898001 ±  3%  numa-numastat.node3.numa_hit
      1684           -59.8%     677.17 ± 13%  turbostat.Avg_MHz
     44.43           -26.6       17.86 ± 13%  turbostat.Busy%
  44289096           +28.7%   57018721 ±  2%  turbostat.C1
     56.21           +26.6       82.76 ±  2%  turbostat.C1%
     55.57           +47.8%      82.14 ±  2%  turbostat.CPU%c1
      0.01          +533.3%       0.06 ± 17%  turbostat.IPC
 2.014e+08 ±  3%    +174.5%  5.527e+08 ±  7%  turbostat.IRQ
     43515 ±  3%     +37.9%      59997 ±  3%  turbostat.POLL
    685.24           -22.4%     532.03 ±  3%  turbostat.PkgWatt
     17.33            +5.6%      18.30        turbostat.RAMWatt
    458598           +45.2%     666035 ±  3%  numa-vmstat.node0.nr_mapped
      1305 ±  3%     +32.0%       1723 ±  4%  numa-vmstat.node0.nr_page_table_pages
   1673564 ±  3%     +68.6%    2822055 ±  3%  numa-vmstat.node0.numa_hit
   1591420 ±  4%     +70.8%    2718256 ±  3%  numa-vmstat.node0.numa_local
    461362           +46.3%     674878 ±  2%  numa-vmstat.node1.nr_mapped
      1266 ±  4%     +23.2%       1559 ±  2%  numa-vmstat.node1.nr_page_table_pages
   1645442 ±  5%     +73.0%    2847103 ±  3%  numa-vmstat.node1.numa_hit
   1577856 ±  6%     +74.8%    2757762 ±  2%  numa-vmstat.node1.numa_local
    456314           +47.5%     672973 ±  2%  numa-vmstat.node2.nr_mapped
      1198 ±  2%     +31.0%       1569 ±  3%  numa-vmstat.node2.nr_page_table_pages
   1639701 ±  3%     +71.4%    2811174 ±  2%  numa-vmstat.node2.numa_hit
   1537161 ±  3%     +77.3%    2725366 ±  2%  numa-vmstat.node2.numa_local
    464153           +46.1%     677988 ±  3%  numa-vmstat.node3.nr_mapped
      1255 ±  5%     +29.1%       1621 ±  4%  numa-vmstat.node3.nr_page_table_pages
   1732732 ±  4%     +67.3%    2898025 ±  3%  numa-vmstat.node3.numa_hit
   1637386 ±  5%     +72.8%    2829178 ±  5%  numa-vmstat.node3.numa_local
    104802            -2.5%     102214        proc-vmstat.nr_anon_pages
   4433098            -1.0%    4389891        proc-vmstat.nr_file_pages
    138599           -20.3%     110426        proc-vmstat.nr_inactive_anon
   1842991           +45.8%    2687030 ±  2%  proc-vmstat.nr_mapped
      5030           +28.9%       6483        proc-vmstat.nr_page_table_pages
   3710638            -1.2%    3667429        proc-vmstat.nr_shmem
    138599           -20.3%     110426        proc-vmstat.nr_zone_inactive_anon
     43540 ±  7%     -82.6%       7576 ± 47%  proc-vmstat.numa_hint_faults
     26753 ± 10%     -77.6%       5982 ± 56%  proc-vmstat.numa_hint_faults_local
   6693986           +70.0%   11381806 ±  3%  proc-vmstat.numa_hit
   6346365           +73.9%   11034009 ±  3%  proc-vmstat.numa_local
     21587 ± 31%     -92.2%       1683 ± 58%  proc-vmstat.numa_pages_migrated
    197966           -81.7%      36131 ± 25%  proc-vmstat.numa_pte_updates
   3749632            -1.1%    3708618        proc-vmstat.pgactivate
   6848722           +68.4%   11532638 ±  3%  proc-vmstat.pgalloc_normal
 8.677e+08 ±  2%    +276.2%  3.265e+09 ±  5%  proc-vmstat.pgfault
   6646708           +72.1%   11436096 ±  3%  proc-vmstat.pgfree
     21587 ± 31%     -92.2%       1683 ± 58%  proc-vmstat.pgmigrate_success
     54536 ±  8%     -24.2%      41332 ±  3%  proc-vmstat.pgreuse
   6305732           -84.8%     961479 ± 36%  sched_debug.cfs_rq:/.avg_vruntime.avg
  10700237           -83.0%    1820191 ± 34%  sched_debug.cfs_rq:/.avg_vruntime.max
   1797215 ± 18%     -93.8%     112003 ± 80%  sched_debug.cfs_rq:/.avg_vruntime.min
   1512854 ±  2%     -75.4%     372673 ± 31%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.13 ± 20%     +62.6%       0.21 ± 26%  sched_debug.cfs_rq:/.h_nr_running.avg
      0.33 ±  8%     +17.2%       0.39 ±  9%  sched_debug.cfs_rq:/.h_nr_running.stddev
      4781 ± 82%    -100.0%       0.12 ±223%  sched_debug.cfs_rq:/.left_vruntime.avg
    804679 ± 78%    -100.0%      27.67 ±223%  sched_debug.cfs_rq:/.left_vruntime.max
     61817 ± 80%    -100.0%       1.84 ±223%  sched_debug.cfs_rq:/.left_vruntime.stddev
      2654 ± 21%    +156.8%       6815 ± 18%  sched_debug.cfs_rq:/.load.avg
   6305732           -84.8%     961479 ± 36%  sched_debug.cfs_rq:/.min_vruntime.avg
  10700237           -83.0%    1820191 ± 34%  sched_debug.cfs_rq:/.min_vruntime.max
   1797215 ± 18%     -93.8%     112003 ± 80%  sched_debug.cfs_rq:/.min_vruntime.min
   1512854 ±  2%     -75.4%     372673 ± 31%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.13 ± 20%     +63.4%       0.21 ± 26%  sched_debug.cfs_rq:/.nr_running.avg
      0.33 ±  7%     +18.1%       0.39 ±  9%  sched_debug.cfs_rq:/.nr_running.stddev
      4781 ± 82%    -100.0%       0.12 ±223%  sched_debug.cfs_rq:/.right_vruntime.avg
    804679 ± 78%    -100.0%      27.67 ±223%  sched_debug.cfs_rq:/.right_vruntime.max
     61817 ± 80%    -100.0%       1.84 ±223%  sched_debug.cfs_rq:/.right_vruntime.stddev
    495.58 ±  3%     -56.6%     214.98 ± 24%  sched_debug.cfs_rq:/.runnable_avg.avg
      1096 ±  7%     -13.4%     949.07 ±  3%  sched_debug.cfs_rq:/.runnable_avg.max
    359.89           -23.0%     277.09 ± 11%  sched_debug.cfs_rq:/.runnable_avg.stddev
    493.94 ±  3%     -56.5%     214.69 ± 24%  sched_debug.cfs_rq:/.util_avg.avg
    359.20           -22.9%     276.81 ± 11%  sched_debug.cfs_rq:/.util_avg.stddev
     97.00 ± 24%     +76.3%     171.06 ± 31%  sched_debug.cfs_rq:/.util_est_enqueued.avg
   1512762 ±  4%     -35.1%     981444        sched_debug.cpu.avg_idle.avg
   5146368 ± 10%     -71.3%    1476288 ± 33%  sched_debug.cpu.avg_idle.max
    578157 ±  8%     -68.8%     180178 ± 10%  sched_debug.cpu.avg_idle.min
    670957 ±  5%     -83.7%     109591 ± 26%  sched_debug.cpu.avg_idle.stddev
     73.60 ± 11%     -81.3%      13.79 ±  9%  sched_debug.cpu.clock.stddev
    650.52 ± 18%     +58.1%       1028 ± 14%  sched_debug.cpu.curr->pid.avg
      1959 ±  7%     +19.6%       2342 ±  6%  sched_debug.cpu.curr->pid.stddev
    924262 ±  3%     -45.6%     502853        sched_debug.cpu.max_idle_balance_cost.avg
   2799134 ± 10%     -70.8%     817753 ± 35%  sched_debug.cpu.max_idle_balance_cost.max
    377335 ±  9%     -93.4%      24979 ± 94%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.00 ±  8%     -59.7%       0.00 ± 55%  sched_debug.cpu.next_balance.stddev
      0.10 ± 17%     +57.8%       0.15 ± 14%  sched_debug.cpu.nr_running.avg
      1.28 ±  6%     -19.6%       1.03 ±  6%  sched_debug.cpu.nr_running.max
      0.29 ±  7%     +19.1%       0.35 ±  6%  sched_debug.cpu.nr_running.stddev
     17163           -79.4%       3534 ±  5%  sched_debug.cpu.nr_switches.avg
      7523 ± 10%     -87.2%     961.21 ± 12%  sched_debug.cpu.nr_switches.min
      0.33 ±  5%     -18.7%       0.27 ±  6%  sched_debug.cpu.nr_uninterruptible.avg
      0.00          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_migratory.avg
      0.17          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_migratory.max
      0.01          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_migratory.stddev
      0.00          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.avg
      0.17          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.max
      0.01          -100.0%       0.00        sched_debug.rt_rq:.rt_nr_running.stddev
      0.18 ± 34%    -100.0%       0.00        sched_debug.rt_rq:.rt_time.avg
     40.73 ± 34%    -100.0%       0.00        sched_debug.rt_rq:.rt_time.max
      2.72 ± 34%    -100.0%       0.00        sched_debug.rt_rq:.rt_time.stddev
  2.63e+09          +165.9%  6.995e+09 ±  5%  perf-stat.i.branch-instructions
      0.45            -0.2        0.22 ±  3%  perf-stat.i.branch-miss-rate%
  12246564           +25.8%   15409530 ±  3%  perf-stat.i.branch-misses
     40.05 ±  6%      +5.9       45.95        perf-stat.i.cache-miss-rate%
     23716           -85.2%       3516        perf-stat.i.context-switches
     30.25 ±  2%     -84.8%       4.58 ± 18%  perf-stat.i.cpi
 3.785e+11           -60.2%  1.507e+11 ± 13%  perf-stat.i.cpu-cycles
    270.31            -6.3%     253.41        perf-stat.i.cpu-migrations
      9670 ± 38%     -79.6%       1972 ± 22%  perf-stat.i.cycles-between-cache-misses
      0.03 ±  4%      -0.0        0.01 ± 10%  perf-stat.i.dTLB-load-miss-rate%
    958512 ±  3%     +20.5%    1154691 ±  5%  perf-stat.i.dTLB-load-misses
  3.15e+09          +172.1%  8.571e+09 ±  5%  perf-stat.i.dTLB-loads
      4.91            +1.6        6.54        perf-stat.i.dTLB-store-miss-rate%
  87894742 ±  2%    +276.4%  3.308e+08 ±  5%  perf-stat.i.dTLB-store-misses
 1.709e+09          +176.5%  4.725e+09 ±  5%  perf-stat.i.dTLB-stores
     78.59           +13.6       92.23        perf-stat.i.iTLB-load-miss-rate%
   8890053          +168.7%   23884564 ±  6%  perf-stat.i.iTLB-load-misses
   2405390           -17.2%    1990833        perf-stat.i.iTLB-loads
 1.257e+10          +164.2%  3.323e+10 ±  5%  perf-stat.i.instructions
      0.03 ±  4%    +571.5%       0.23 ± 17%  perf-stat.i.ipc
      1.69           -60.1%       0.67 ± 13%  perf-stat.i.metric.GHz
     33.38          +175.7%      92.04 ±  5%  perf-stat.i.metric.M/sec
   2877597 ±  2%    +274.4%   10773809 ±  5%  perf-stat.i.minor-faults
     86.68            -3.1       83.57 ±  2%  perf-stat.i.node-load-miss-rate%
   5637384 ± 17%    +105.0%   11559372 ± 20%  perf-stat.i.node-load-misses
    857520 ± 10%    +158.1%    2213133 ±  9%  perf-stat.i.node-loads
     46.44           -16.5       29.97        perf-stat.i.node-store-miss-rate%
   2608818           +80.4%    4705017 ±  3%  perf-stat.i.node-store-misses
   3024158 ±  2%    +264.6%   11026854 ±  5%  perf-stat.i.node-stores
   2877597 ±  2%    +274.4%   10773809 ±  5%  perf-stat.i.page-faults
      0.47            -0.2        0.22 ±  3%  perf-stat.overall.branch-miss-rate%
     39.95 ±  6%      +6.0       45.93        perf-stat.overall.cache-miss-rate%
     30.11 ±  2%     -84.8%       4.58 ± 18%  perf-stat.overall.cpi
      9647 ± 38%     -79.6%       1971 ± 21%  perf-stat.overall.cycles-between-cache-misses
      0.03 ±  4%      -0.0        0.01 ± 10%  perf-stat.overall.dTLB-load-miss-rate%
      4.89            +1.7        6.54        perf-stat.overall.dTLB-store-miss-rate%
     78.70           +13.6       92.28        perf-stat.overall.iTLB-load-miss-rate%
      0.03 ±  2%    +579.6%       0.23 ± 17%  perf-stat.overall.ipc
     86.61            -3.0       83.59 ±  2%  perf-stat.overall.node-load-miss-rate%
     46.33           -16.4       29.93        perf-stat.overall.node-store-miss-rate%
   1315354           -29.2%     931848        perf-stat.overall.path-length
 2.621e+09          +166.0%   6.97e+09 ±  5%  perf-stat.ps.branch-instructions
  12198295           +25.8%   15340303 ±  3%  perf-stat.ps.branch-misses
     23623           -85.2%       3499        perf-stat.ps.context-switches
  3.77e+11           -60.2%  1.502e+11 ± 13%  perf-stat.ps.cpu-cycles
    266.50            -5.2%     252.58        perf-stat.ps.cpu-migrations
    961568 ±  4%     +19.7%    1150857 ±  5%  perf-stat.ps.dTLB-load-misses
 3.138e+09          +172.1%  8.541e+09 ±  5%  perf-stat.ps.dTLB-loads
  87534988 ±  2%    +276.7%  3.297e+08 ±  5%  perf-stat.ps.dTLB-store-misses
 1.702e+09          +176.6%  4.708e+09 ±  5%  perf-stat.ps.dTLB-stores
   8850726          +169.0%   23812332 ±  6%  perf-stat.ps.iTLB-load-misses
   2394294           -17.1%    1983791        perf-stat.ps.iTLB-loads
 1.253e+10          +164.3%  3.311e+10 ±  5%  perf-stat.ps.instructions
   2865432 ±  2%    +274.7%   10737502 ±  5%  perf-stat.ps.minor-faults
   5615782 ± 17%    +105.2%   11521907 ± 20%  perf-stat.ps.node-load-misses
    856502 ± 10%    +157.6%    2206185 ±  9%  perf-stat.ps.node-loads
   2598165           +80.5%    4689182 ±  3%  perf-stat.ps.node-store-misses
   3009612 ±  2%    +265.1%   10987528 ±  5%  perf-stat.ps.node-stores
   2865432 ±  2%    +274.7%   10737502 ±  5%  perf-stat.ps.page-faults
 3.791e+12          +165.5%  1.006e+13 ±  5%  perf-stat.total.instructions
      0.05 ± 17%     -77.2%       0.01 ± 73%  perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.07 ± 34%     -90.1%       0.01 ± 99%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.00 ± 33%    +377.8%       0.01 ±  9%  perf-sched.sch_delay.avg.ms.__cond_resched.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.isra
      0.08 ± 25%     -90.3%       0.01 ± 12%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.12 ± 64%     -91.3%       0.01 ±  8%  perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      0.06 ± 37%     -89.4%       0.01 ± 16%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.03 ± 19%     -80.0%       0.01 ± 34%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      0.21 ±100%     -97.3%       0.01 ±  6%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
      0.02 ± 57%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      0.04 ± 34%     -88.0%       0.00 ± 15%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      0.08 ± 21%     -90.3%       0.01 ± 25%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.06 ± 80%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
      0.00 ± 19%    +173.3%       0.01 ±  5%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
      0.02 ± 10%     -87.3%       0.00        perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.05 ± 16%     -85.9%       0.01 ±  7%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      0.03 ± 10%     -74.0%       0.01 ± 27%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.09 ± 40%     -91.2%       0.01 ± 17%  perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
      0.01 ± 16%     -74.6%       0.00        perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
      0.12 ± 66%     -93.3%       0.01 ± 19%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.79 ± 40%     -96.9%       0.02 ±186%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.00 ± 10%    +123.1%       0.01 ± 22%  perf-sched.sch_delay.max.ms.__cond_resched.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.isra
      0.13 ± 14%     -92.2%       0.01 ± 27%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.15 ± 54%     -91.6%       0.01 ± 17%  perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      0.14 ± 43%     -93.0%       0.01 ± 27%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
     24.13 ±116%     -99.9%       0.01 ± 15%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
      0.06 ± 63%    -100.0%       0.00        perf-sched.sch_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      0.22 ± 39%     -95.5%       0.01 ± 15%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      0.22 ± 60%     -93.8%       0.01 ± 20%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     65.91 ± 71%    -100.0%       0.00        perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
     13.37 ±143%    +703.9%     107.48 ± 64%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
      0.01 ± 31%    +121.2%       0.02 ± 31%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
      0.28 ± 14%     -96.4%       0.01 ± 29%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.19 ± 16%     -93.0%       0.01 ± 23%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      0.16 ± 50%     -93.2%       0.01 ± 27%  perf-sched.sch_delay.max.ms.syslog_print.do_syslog.kmsg_read.vfs_read
      0.10 ±  9%     -94.3%       0.01 ± 37%  perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
     31.83 ±  4%    +503.4%     192.03 ±  4%  perf-sched.total_wait_and_delay.average.ms
     61361           -82.4%      10800 ±  5%  perf-sched.total_wait_and_delay.count.ms
     31.76 ±  4%    +502.8%     191.45 ±  5%  perf-sched.total_wait_time.average.ms
      1.67 ± 13%   +9097.0%     153.21 ±  4%  perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
     32.29 ±  9%     -25.8%      23.95 ± 22%  perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      1.57 ±  2%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
      1.11 ±  8%  +12026.0%     134.28 ±  8%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
      3.84 ±  5%     +20.0%       4.61 ±  2%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    610.00 ±  7%     -30.8%     421.83 ± 15%  perf-sched.wait_and_delay.count.__cond_resched.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
     51568 ±  2%    -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
      1146 ±  4%     +51.3%       1734 ±  6%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
      1226 ±  4%     -11.6%       1084 ±  2%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    995.33 ±  3%     -19.5%     801.33 ±  4%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     33.51 ± 78%    +547.7%     217.02 ±  2%  perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
     65.98 ± 70%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
     15.43 ±115%   +1309.1%     217.47        perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
      1.47 ±  8%  +10284.0%     153.06 ±  4%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
      0.01 ± 34%  +1.4e+06%     179.68 ± 11%  perf-sched.wait_time.avg.ms.__cond_resched.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.isra
      5.95 ± 21%     -73.5%       1.58 ±  8%  perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      1.44 ±  6%  +10218.7%     148.31 ±  9%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
      2.93 ± 23%    -100.0%       0.00        perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      1.51 ±  4%    -100.0%       0.00        perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
      1.08 ±  5%  +12334.2%     134.16 ±  8%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
      0.38 ± 27%  +47951.7%     182.68 ±  2%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
      2.83 ± 16%     -82.5%       0.49 ±  2%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      3.82 ±  6%     +20.4%       4.60 ±  2%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      6.49 ± 18%     -75.3%       1.60 ±  9%  perf-sched.wait_time.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
      0.03 ±133%     -99.4%       0.00 ±223%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
      2.99 ± 14%   +7148.4%     217.02 ±  2%  perf-sched.wait_time.max.ms.__cond_resched.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
      0.06 ± 51%  +3.5e+05%     209.30 ±  2%  perf-sched.wait_time.max.ms.__cond_resched.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.isra
     11.89 ± 21%     -73.5%       3.16 ±  8%  perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      2.62 ±  4%   +7966.9%     211.41 ±  2%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
      4.63 ±  7%    -100.0%       0.00        perf-sched.wait_time.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      5.60 ± 74%    -100.0%       0.00        perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
      4.03 ±  3%   +5294.2%     217.47        perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
      2.46 ± 25%   +8701.7%     216.14        perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
     16.44 ± 21%     -93.5%       1.07 ±  3%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     14.31 ± 59%     -65.0%       5.01        perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     12.98 ± 18%     -75.3%       3.20 ±  9%  perf-sched.wait_time.max.ms.syslog_print.do_syslog.kmsg_read.vfs_read
      0.63 ±151%     -98.7%       0.01 ± 46%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
     24.35 ±  5%     -24.3        0.00        perf-profile.calltrace.cycles-pp.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     22.08 ±  2%     -22.1        0.00        perf-profile.calltrace.cycles-pp.down_read_trylock.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     13.70 ±  2%     -13.7        0.00        perf-profile.calltrace.cycles-pp.up_read.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     74.33           -12.7       61.66        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     74.37           -12.2       62.18        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
      3.61 ±  8%      -2.7        0.89 ± 16%  perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      0.00            +0.7        0.71 ± 21%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state
      0.00            +0.7        0.71 ±  5%  perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range.zap_pmd_range
      0.00            +0.7        0.72 ±  5%  perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range.zap_pmd_range.unmap_page_range
      0.00            +0.7        0.73 ±  5%  perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      0.00            +0.8        0.81 ± 15%  perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
      0.71 ±  3%      +1.2        1.90 ± 14%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
      0.00            +1.2        1.20 ± 17%  perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault.__handle_mm_fault
      0.71 ±  3%      +1.2        1.94 ± 14%  perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      0.71 ±  3%      +1.2        1.94 ± 14%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      0.71 ±  3%      +1.2        1.94 ± 14%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
      0.00            +1.3        1.28 ± 27%  perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
      0.73 ±  3%      +1.3        2.02 ± 14%  perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.73 ±  3%      +1.3        2.02 ± 14%  perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      0.72 ±  3%      +1.3        2.00 ± 14%  perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
      0.00            +1.4        1.36 ± 17%  perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault
      0.00            +1.4        1.40 ± 18%  perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase
      0.00            +1.4        1.42 ± 17%  perf-profile.calltrace.cycles-pp.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.00            +1.7        1.68 ± 20%  perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.00            +2.0        2.05 ± 14%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +2.0        2.05 ± 14%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     41.13            +2.1       43.18        perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
      0.00            +2.1        2.08 ± 14%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.00            +2.1        2.08 ± 14%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.52 ± 13%      +2.1        3.66 ± 14%  perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      3.32 ± 19%     +19.0       22.27 ± 24%  perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     24.35 ±  5%     -24.4        0.00        perf-profile.children.cycles-pp.lock_mm_and_find_vma
     22.17 ±  2%     -22.0        0.16 ± 17%  perf-profile.children.cycles-pp.down_read_trylock
     14.60 ±  2%     -14.5        0.14 ± 21%  perf-profile.children.cycles-pp.up_read
     74.37           -12.4       62.02        perf-profile.children.cycles-pp.do_user_addr_fault
     74.39           -12.2       62.20        perf-profile.children.cycles-pp.exc_page_fault
     75.34            -5.5       69.86        perf-profile.children.cycles-pp.asm_exc_page_fault
     23.24            -0.9       22.38        perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.44 ±  4%      -0.2        0.28 ± 14%  perf-profile.children.cycles-pp.scheduler_tick
      0.20 ± 39%      -0.1        0.08 ± 86%  perf-profile.children.cycles-pp.x86_64_start_kernel
      0.20 ± 39%      -0.1        0.08 ± 86%  perf-profile.children.cycles-pp.x86_64_start_reservations
      0.20 ± 39%      -0.1        0.08 ± 86%  perf-profile.children.cycles-pp.start_kernel
      0.20 ± 39%      -0.1        0.08 ± 86%  perf-profile.children.cycles-pp.arch_call_rest_init
      0.20 ± 39%      -0.1        0.08 ± 86%  perf-profile.children.cycles-pp.rest_init
      0.28 ±  4%      -0.1        0.17 ± 19%  perf-profile.children.cycles-pp._compound_head
      0.47 ±  4%      -0.1        0.37 ± 14%  perf-profile.children.cycles-pp.update_process_times
      0.47 ±  4%      -0.1        0.37 ± 14%  perf-profile.children.cycles-pp.tick_sched_handle
      0.12 ± 13%      -0.1        0.03 ±102%  perf-profile.children.cycles-pp.load_balance
      0.00            +0.1        0.07 ± 17%  perf-profile.children.cycles-pp._raw_spin_trylock
      0.00            +0.1        0.07 ± 10%  perf-profile.children.cycles-pp.irqtime_account_irq
      0.03 ± 70%      +0.1        0.11 ± 19%  perf-profile.children.cycles-pp.rebalance_domains
      0.00            +0.1        0.08 ± 19%  perf-profile.children.cycles-pp.__irqentry_text_end
      0.00            +0.1        0.08 ± 17%  perf-profile.children.cycles-pp.__count_memcg_events
      0.00            +0.1        0.08 ± 18%  perf-profile.children.cycles-pp.cgroup_rstat_updated
      0.00            +0.1        0.10 ± 23%  perf-profile.children.cycles-pp.folio_mark_dirty
      0.00            +0.1        0.10 ± 18%  perf-profile.children.cycles-pp.__pte_offset_map
      0.08 ±  6%      +0.1        0.18 ± 20%  perf-profile.children.cycles-pp.__do_softirq
      0.00            +0.1        0.11 ± 14%  perf-profile.children.cycles-pp.pte_offset_map_nolock
      0.53 ±  4%      +0.1        0.65 ± 16%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.00            +0.1        0.12 ± 28%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.05 ±  8%      +0.1        0.18 ± 19%  perf-profile.children.cycles-pp._raw_spin_lock
      0.00            +0.1        0.14 ± 22%  perf-profile.children.cycles-pp.folio_unlock
      0.00            +0.1        0.14 ± 20%  perf-profile.children.cycles-pp.release_pages
      0.02 ±141%      +0.1        0.16 ± 36%  perf-profile.children.cycles-pp.ktime_get
      0.00            +0.1        0.14 ±  6%  perf-profile.children.cycles-pp.native_flush_tlb_local
      0.08 ±  5%      +0.1        0.23 ± 16%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.01 ±223%      +0.2        0.16 ± 24%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
      0.00            +0.2        0.15 ± 17%  perf-profile.children.cycles-pp.handle_pte_fault
      0.00            +0.2        0.16 ±115%  perf-profile.children.cycles-pp.menu_select
      0.00            +0.2        0.16 ± 20%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
      0.57 ±  4%      +0.2        0.74 ± 15%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.57 ±  4%      +0.2        0.74 ± 15%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.01 ±223%      +0.2        0.18 ± 29%  perf-profile.children.cycles-pp.inode_needs_update_time
      0.00            +0.2        0.18 ± 18%  perf-profile.children.cycles-pp.tlb_batch_pages_flush
      0.00            +0.2        0.19 ± 23%  perf-profile.children.cycles-pp.__mod_lruvec_state
      0.01 ±223%      +0.2        0.21 ± 27%  perf-profile.children.cycles-pp.file_update_time
      0.00            +0.2        0.22 ±  4%  perf-profile.children.cycles-pp.llist_reverse_order
      0.02 ±141%      +0.2        0.26 ±  5%  perf-profile.children.cycles-pp.flush_tlb_func
      0.00            +0.2        0.25 ± 18%  perf-profile.children.cycles-pp.error_entry
      0.00            +0.3        0.26 ± 18%  perf-profile.children.cycles-pp.__pte_offset_map_lock
      0.15 ±  9%      +0.3        0.44 ± 21%  perf-profile.children.cycles-pp.mtree_range_walk
      0.07 ±  9%      +0.3        0.38 ±  5%  perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
      0.05 ±  7%      +0.3        0.37 ± 15%  perf-profile.children.cycles-pp.xas_descend
      0.66 ±  3%      +0.3        1.00 ± 14%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.05 ±  7%      +0.4        0.43 ± 18%  perf-profile.children.cycles-pp.folio_add_file_rmap_range
      0.04 ± 44%      +0.4        0.43 ± 19%  perf-profile.children.cycles-pp.page_remove_rmap
      0.06 ±  6%      +0.4        0.45 ± 20%  perf-profile.children.cycles-pp.__mod_lruvec_page_state
      0.04 ± 44%      +0.4        0.44 ± 20%  perf-profile.children.cycles-pp.tlb_flush_rmaps
      0.06 ± 16%      +0.4        0.48 ± 24%  perf-profile.children.cycles-pp.fault_dirty_shared_page
      0.30 ±  3%      +0.4        0.72 ±  5%  perf-profile.children.cycles-pp.on_each_cpu_cond_mask
      0.30 ±  3%      +0.4        0.72 ±  5%  perf-profile.children.cycles-pp.smp_call_function_many_cond
      0.31 ±  2%      +0.4        0.74 ±  5%  perf-profile.children.cycles-pp.flush_tlb_mm_range
      0.07 ±  6%      +0.5        0.54 ± 14%  perf-profile.children.cycles-pp.xas_load
      0.13 ±  8%      +0.6        0.75 ±  4%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.11 ±  8%      +0.6        0.75 ±  4%  perf-profile.children.cycles-pp.__sysvec_call_function
      0.12 ±  8%      +0.7        0.82 ±  4%  perf-profile.children.cycles-pp.sysvec_call_function
      0.12 ±  4%      +0.7        0.82 ± 15%  perf-profile.children.cycles-pp.filemap_get_entry
      0.08 ±  5%      +0.8        0.86 ± 19%  perf-profile.children.cycles-pp.___perf_sw_event
      0.11 ±  6%      +1.0        1.09 ± 20%  perf-profile.children.cycles-pp.__perf_sw_event
      0.18 ±  6%      +1.0        1.21 ± 16%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
      0.25 ± 46%      +1.0        1.30 ± 27%  perf-profile.children.cycles-pp.set_pte_range
      0.18 ±  5%      +1.2        1.36 ± 17%  perf-profile.children.cycles-pp.shmem_fault
      0.25 ±  5%      +1.2        1.43 ± 17%  perf-profile.children.cycles-pp.__do_fault
      0.93 ±  4%      +1.2        2.12 ± 14%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.93 ±  4%      +1.2        2.12 ± 14%  perf-profile.children.cycles-pp.do_syscall_64
      0.72 ±  3%      +1.2        1.94 ± 14%  perf-profile.children.cycles-pp.zap_pte_range
      0.72 ±  3%      +1.2        1.94 ± 14%  perf-profile.children.cycles-pp.unmap_vmas
      0.72 ±  3%      +1.2        1.94 ± 14%  perf-profile.children.cycles-pp.unmap_page_range
      0.72 ±  3%      +1.2        1.94 ± 14%  perf-profile.children.cycles-pp.zap_pmd_range
      0.44 ±  4%      +1.2        1.67        perf-profile.children.cycles-pp.asm_sysvec_call_function
      0.18 ± 19%      +1.3        1.45 ± 17%  perf-profile.children.cycles-pp.sync_regs
      0.74 ±  3%      +1.3        2.02 ± 14%  perf-profile.children.cycles-pp.do_vmi_munmap
      0.74 ±  3%      +1.3        2.02 ± 14%  perf-profile.children.cycles-pp.do_vmi_align_munmap
      0.72 ±  3%      +1.3        2.00 ± 14%  perf-profile.children.cycles-pp.unmap_region
      0.74 ±  3%      +1.3        2.05 ± 14%  perf-profile.children.cycles-pp.__vm_munmap
      0.74 ±  3%      +1.3        2.05 ± 14%  perf-profile.children.cycles-pp.__x64_sys_munmap
      0.31 ± 37%      +1.4        1.70 ± 20%  perf-profile.children.cycles-pp.finish_fault
      1.53 ± 13%      +2.2        3.68 ± 14%  perf-profile.children.cycles-pp.do_fault
      0.82 ± 24%      +4.1        4.90 ± 33%  perf-profile.children.cycles-pp.native_irq_return_iret
      3.34 ± 19%     +19.0       22.33 ± 24%  perf-profile.children.cycles-pp.lock_vma_under_rcu
     21.96 ±  2%     -21.8        0.16 ± 17%  perf-profile.self.cycles-pp.down_read_trylock
     14.47 ±  2%     -14.3        0.13 ± 22%  perf-profile.self.cycles-pp.up_read
      3.51 ±  9%      -3.5        0.04 ±108%  perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.27 ±  4%      -0.1        0.14 ± 19%  perf-profile.self.cycles-pp._compound_head
      0.12 ±  6%      +0.0        0.14 ±  4%  perf-profile.self.cycles-pp.llist_add_batch
      0.00            +0.1        0.06 ± 14%  perf-profile.self.cycles-pp.__mod_lruvec_state
      0.00            +0.1        0.07 ± 17%  perf-profile.self.cycles-pp._raw_spin_trylock
      0.00            +0.1        0.07 ± 23%  perf-profile.self.cycles-pp.__irqentry_text_end
      0.00            +0.1        0.07 ± 18%  perf-profile.self.cycles-pp.do_fault
      0.00            +0.1        0.08 ± 17%  perf-profile.self.cycles-pp.finish_fault
      0.00            +0.1        0.08 ± 14%  perf-profile.self.cycles-pp.cgroup_rstat_updated
      0.00            +0.1        0.09 ± 20%  perf-profile.self.cycles-pp.inode_needs_update_time
      0.00            +0.1        0.09 ± 16%  perf-profile.self.cycles-pp.__pte_offset_map_lock
      0.00            +0.1        0.09 ± 20%  perf-profile.self.cycles-pp.__pte_offset_map
      0.00            +0.1        0.10 ± 16%  perf-profile.self.cycles-pp.__mod_lruvec_page_state
      0.00            +0.1        0.11 ± 22%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      0.00            +0.1        0.11 ± 31%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.00            +0.1        0.11 ± 13%  perf-profile.self.cycles-pp.flush_tlb_func
      0.07 ±  5%      +0.1        0.19 ± 12%  perf-profile.self.cycles-pp.smp_call_function_many_cond
      0.02 ±141%      +0.1        0.14 ± 40%  perf-profile.self.cycles-pp.ktime_get
      0.00            +0.1        0.13 ± 17%  perf-profile.self.cycles-pp.exc_page_fault
      0.00            +0.1        0.13 ± 14%  perf-profile.self.cycles-pp.xas_load
      0.00            +0.1        0.13 ± 21%  perf-profile.self.cycles-pp.folio_unlock
      0.00            +0.1        0.14 ± 19%  perf-profile.self.cycles-pp.release_pages
      0.00            +0.1        0.14 ±  7%  perf-profile.self.cycles-pp.native_flush_tlb_local
      0.00            +0.1        0.14 ± 18%  perf-profile.self.cycles-pp.shmem_fault
      0.01 ±223%      +0.2        0.18 ± 18%  perf-profile.self.cycles-pp._raw_spin_lock
      0.00            +0.2        0.17 ± 18%  perf-profile.self.cycles-pp.set_pte_range
      0.00            +0.2        0.19 ± 17%  perf-profile.self.cycles-pp.folio_add_file_rmap_range
      0.00            +0.2        0.22 ± 19%  perf-profile.self.cycles-pp.page_remove_rmap
      0.00            +0.2        0.22 ±  5%  perf-profile.self.cycles-pp.llist_reverse_order
      0.00            +0.2        0.24 ± 18%  perf-profile.self.cycles-pp.error_entry
      0.00            +0.2        0.24 ± 25%  perf-profile.self.cycles-pp.__perf_sw_event
      0.02 ± 99%      +0.2        0.27 ± 16%  perf-profile.self.cycles-pp.filemap_get_entry
      0.00            +0.3        0.28 ±  5%  perf-profile.self.cycles-pp.__flush_smp_call_function_queue
      0.14 ±  7%      +0.3        0.43 ± 21%  perf-profile.self.cycles-pp.mtree_range_walk
      0.00            +0.3        0.29 ± 13%  perf-profile.self.cycles-pp.asm_exc_page_fault
      0.07 ±  8%      +0.3        0.38 ±  4%  perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
      0.04 ± 44%      +0.3        0.35 ± 15%  perf-profile.self.cycles-pp.xas_descend
      0.00            +0.3        0.31 ± 19%  perf-profile.self.cycles-pp.zap_pte_range
      0.03 ± 70%      +0.3        0.35 ± 19%  perf-profile.self.cycles-pp.shmem_get_folio_gfp
      0.01 ±223%      +0.5        0.53 ± 13%  perf-profile.self.cycles-pp.do_user_addr_fault
      0.08 ±  6%      +0.7        0.74 ± 19%  perf-profile.self.cycles-pp.___perf_sw_event
      0.18 ± 19%      +1.3        1.44 ± 17%  perf-profile.self.cycles-pp.sync_regs
     19.02            +2.4       21.40        perf-profile.self.cycles-pp.acpi_safe_halt
      0.27 ± 71%      +2.4        2.72 ± 76%  perf-profile.self.cycles-pp.handle_mm_fault
      0.82 ± 24%      +4.1        4.89 ± 33%  perf-profile.self.cycles-pp.native_irq_return_iret
      0.76 ±  3%      +5.6        6.37 ± 18%  perf-profile.self.cycles-pp.testcase

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




Thread overview: 16+ messages
2023-10-06 19:53 [PATCH v2 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
2023-10-06 19:53 ` [PATCH v2 1/6] mm: Make lock_folio_maybe_drop_mmap() VMA lock aware Matthew Wilcox (Oracle)
2023-10-08 21:47   ` Suren Baghdasaryan
2023-10-06 19:53 ` [PATCH v2 2/6] mm: Call wp_page_copy() under the VMA lock Matthew Wilcox (Oracle)
2023-10-08 22:00   ` Suren Baghdasaryan
2023-10-06 19:53 ` [PATCH v2 3/6] mm: Handle shared faults " Matthew Wilcox (Oracle)
2023-10-08 22:01   ` Suren Baghdasaryan
2023-10-20 13:23   ` kernel test robot
2023-10-06 19:53 ` [PATCH v2 4/6] mm: Handle COW " Matthew Wilcox (Oracle)
2023-10-08 22:05   ` Suren Baghdasaryan
2023-10-20 13:18   ` kernel test robot
2023-10-06 19:53 ` [PATCH v2 5/6] mm: Handle read " Matthew Wilcox (Oracle)
2023-10-08 22:06   ` Suren Baghdasaryan
2023-10-20  9:55   ` kernel test robot
2023-10-06 19:53 ` [PATCH v2 6/6] mm: Handle write faults to RO pages " Matthew Wilcox (Oracle)
2023-10-08 22:07   ` Suren Baghdasaryan
