* [PATCH v2 0/6] Handle more faults under the VMA lock
@ 2023-10-06 19:53 Matthew Wilcox (Oracle)
2023-10-06 19:53 ` [PATCH v2 1/6] mm: Make lock_folio_maybe_drop_mmap() VMA lock aware Matthew Wilcox (Oracle)
` (5 more replies)
0 siblings, 6 replies; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-10-06 19:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan
At this point, we're handling the majority of file-backed page faults
under the VMA lock, using the ->map_pages entry point. This patch set
attempts to expand that for the following situations:
- We have to do a read. This could be because we've hit the point in
the readahead window where we need to kick off the next readahead,
or because the page is simply not present in cache.
- We're handling a write fault. Most applications don't do I/O via
writes to shared mmaps, for very good reasons, but some do, and it
would be nice not to slow them down unnecessarily.
- We're doing a COW of a private mapping (both PTE already present
and PTE not-present). These are two different codepaths and I handle
both of them in this patch set.
There is no support in this patch set for drivers to mark themselves
as being VMA lock friendly; they could implement the ->map_pages
vm_operation, but if they do, they would be the first. This is probably
something we want to change at some point in the future, and I've marked
where to make that change in the code.
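To make the opt-in concrete: under this scheme a driver signals VMA-lock safety simply by populating ->map_pages in its vm_operations_struct, since a non-NULL ->map_pages is what the fault path keys off. The sketch below is hypothetical (the driver name is invented, and stub types stand in for the kernel's definitions in <linux/mm.h>); it only illustrates the shape of the opt-in, not a real driver:

```c
#include <assert.h>
#include <stddef.h>

/* Stub stand-ins for kernel types; real code uses <linux/mm.h>. */
struct vm_fault;
typedef unsigned int vm_fault_t;

struct vm_operations_struct {
	vm_fault_t (*fault)(struct vm_fault *vmf);
	vm_fault_t (*map_pages)(struct vm_fault *vmf);	/* the opt-in signal */
};

static vm_fault_t mydrv_fault(struct vm_fault *vmf) { (void)vmf; return 0; }
static vm_fault_t mydrv_map_pages(struct vm_fault *vmf) { (void)vmf; return 0; }

/* Hypothetical driver that has audited its ->fault path: providing
 * ->map_pages is what currently marks it as VMA-lock friendly. */
static const struct vm_operations_struct mydrv_vm_ops = {
	.fault		= mydrv_fault,
	.map_pages	= mydrv_map_pages,
};

/* Unaudited legacy driver: no ->map_pages, so the fault path must
 * fall back to taking the mmap_lock before calling ->fault. */
static const struct vm_operations_struct legacy_vm_ops = {
	.fault		= mydrv_fault,
};

/* The test the fault path performs (see vmf_can_call_fault() later
 * in this series): non-NULL ->map_pages means it is safe to call
 * ->fault while holding only the VMA lock. */
static int can_call_fault_without_mmap_lock(const struct vm_operations_struct *ops)
{
	return ops->map_pages != NULL;
}
```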
There is very little performance change in the benchmarks we've run;
mostly because the vast majority of page faults are handled through the
other paths. I still think this patch series is useful for workloads
that may take these paths more often, and just for cleaning up the
fault path in general (it's now clearer why we have to retry in these
cases).
v2:
- Rename vmf_maybe_unlock_vma to vmf_can_call_fault
Matthew Wilcox (Oracle) (6):
mm: Make lock_folio_maybe_drop_mmap() VMA lock aware
mm: Call wp_page_copy() under the VMA lock
mm: Handle shared faults under the VMA lock
mm: Handle COW faults under the VMA lock
mm: Handle read faults under the VMA lock
mm: Handle write faults to RO pages under the VMA lock
mm/filemap.c | 13 ++++----
mm/memory.c | 93 ++++++++++++++++++++++++++++++++--------------------
2 files changed, 65 insertions(+), 41 deletions(-)
--
2.40.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 1/6] mm: Make lock_folio_maybe_drop_mmap() VMA lock aware
2023-10-06 19:53 [PATCH v2 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
@ 2023-10-06 19:53 ` Matthew Wilcox (Oracle)
2023-10-08 21:47 ` Suren Baghdasaryan
2023-10-06 19:53 ` [PATCH v2 2/6] mm: Call wp_page_copy() under the VMA lock Matthew Wilcox (Oracle)
` (4 subsequent siblings)
5 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-10-06 19:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan
Drop the VMA lock instead of the mmap_lock if that's the one which
is held.
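The patch relies on release_fault_lock(), a helper added earlier in the per-VMA-lock work (in include/linux/mm.h) that drops whichever lock the fault path is actually holding, based on FAULT_FLAG_VMA_LOCK. The following is a standalone sketch of that dispatch using stub types and boolean fields in place of the real locks; it models the logic, not the kernel implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Stub flag value; the kernel defines the real one in <linux/mm.h>. */
#define FAULT_FLAG_VMA_LOCK 0x1000u

/* Stub vm_fault: two booleans model "which lock is held". */
struct vm_fault {
	unsigned int flags;
	bool vma_locked;	/* per-VMA read lock held */
	bool mmap_locked;	/* mmap_lock held for read */
};

/* Mirrors the shape of release_fault_lock(): a fault taken under
 * the VMA lock drops the VMA lock; otherwise drop the mmap_lock. */
static void release_fault_lock(struct vm_fault *vmf)
{
	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
		vmf->vma_locked = false;	/* vma_end_read(vmf->vma) */
	else
		vmf->mmap_locked = false;	/* mmap_read_unlock(vmf->vma->vm_mm) */
}
```

With this helper in place, callers like lock_folio_maybe_drop_mmap() no longer need to know which lock the fault was taken under.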
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/filemap.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 9481ffaf24e6..a598872d62cc 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3104,7 +3104,7 @@ static int lock_folio_maybe_drop_mmap(struct vm_fault *vmf, struct folio *folio,
/*
* NOTE! This will make us return with VM_FAULT_RETRY, but with
- * the mmap_lock still held. That's how FAULT_FLAG_RETRY_NOWAIT
+ * the fault lock still held. That's how FAULT_FLAG_RETRY_NOWAIT
* is supposed to work. We have way too many special cases..
*/
if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
@@ -3114,13 +3114,14 @@ static int lock_folio_maybe_drop_mmap(struct vm_fault *vmf, struct folio *folio,
if (vmf->flags & FAULT_FLAG_KILLABLE) {
if (__folio_lock_killable(folio)) {
/*
- * We didn't have the right flags to drop the mmap_lock,
- * but all fault_handlers only check for fatal signals
- * if we return VM_FAULT_RETRY, so we need to drop the
- * mmap_lock here and return 0 if we don't have a fpin.
+ * We didn't have the right flags to drop the
+ * fault lock, but all fault_handlers only check
+ * for fatal signals if we return VM_FAULT_RETRY,
+ * so we need to drop the fault lock here and
+ * return 0 if we don't have a fpin.
*/
if (*fpin == NULL)
- mmap_read_unlock(vmf->vma->vm_mm);
+ release_fault_lock(vmf);
return 0;
}
} else
--
2.40.1
* [PATCH v2 2/6] mm: Call wp_page_copy() under the VMA lock
2023-10-06 19:53 [PATCH v2 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
2023-10-06 19:53 ` [PATCH v2 1/6] mm: Make lock_folio_maybe_drop_mmap() VMA lock aware Matthew Wilcox (Oracle)
@ 2023-10-06 19:53 ` Matthew Wilcox (Oracle)
2023-10-08 22:00 ` Suren Baghdasaryan
2023-10-06 19:53 ` [PATCH v2 3/6] mm: Handle shared faults " Matthew Wilcox (Oracle)
` (3 subsequent siblings)
5 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-10-06 19:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan
It is usually safe to call wp_page_copy() under the VMA lock. The only
unsafe situation is when no anon_vma has been allocated for this VMA,
and we have to look at adjacent VMAs to determine if their anon_vma can
be shared. Since this happens only for the first COW of a page in this
VMA, the majority of calls to wp_page_copy() do not need to fall back
to the mmap_lock.
Add vmf_anon_prepare() as an alternative to anon_vma_prepare() which
will return VM_FAULT_RETRY if we currently hold the VMA lock and need
to allocate an anon_vma. This lets us drop the check in do_wp_page().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/memory.c | 39 ++++++++++++++++++++++++++-------------
1 file changed, 26 insertions(+), 13 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 97f860d6cd2a..cff78c496728 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
count_vm_event(PGREUSE);
}
+static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
+{
+ struct vm_area_struct *vma = vmf->vma;
+
+ if (likely(vma->anon_vma))
+ return 0;
+ if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+ vma_end_read(vma);
+ return VM_FAULT_RETRY;
+ }
+ if (__anon_vma_prepare(vma))
+ return VM_FAULT_OOM;
+ return 0;
+}
+
/*
* Handle the case of a page which we actually need to copy to a new page,
* either due to COW or unsharing.
@@ -3069,27 +3084,29 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
pte_t entry;
int page_copied = 0;
struct mmu_notifier_range range;
- int ret;
+ vm_fault_t ret;
delayacct_wpcopy_start();
if (vmf->page)
old_folio = page_folio(vmf->page);
- if (unlikely(anon_vma_prepare(vma)))
- goto oom;
+ ret = vmf_anon_prepare(vmf);
+ if (unlikely(ret))
+ goto out;
if (is_zero_pfn(pte_pfn(vmf->orig_pte))) {
new_folio = vma_alloc_zeroed_movable_folio(vma, vmf->address);
if (!new_folio)
goto oom;
} else {
+ int err;
new_folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma,
vmf->address, false);
if (!new_folio)
goto oom;
- ret = __wp_page_copy_user(&new_folio->page, vmf->page, vmf);
- if (ret) {
+ err = __wp_page_copy_user(&new_folio->page, vmf->page, vmf);
+ if (err) {
/*
* COW failed, if the fault was solved by other,
* it's fine. If not, userspace would re-fault on
@@ -3102,7 +3119,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
folio_put(old_folio);
delayacct_wpcopy_end();
- return ret == -EHWPOISON ? VM_FAULT_HWPOISON : 0;
+ return err == -EHWPOISON ? VM_FAULT_HWPOISON : 0;
}
kmsan_copy_page_meta(&new_folio->page, vmf->page);
}
@@ -3212,11 +3229,13 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
oom_free_new:
folio_put(new_folio);
oom:
+ ret = VM_FAULT_OOM;
+out:
if (old_folio)
folio_put(old_folio);
delayacct_wpcopy_end();
- return VM_FAULT_OOM;
+ return ret;
}
/**
@@ -3458,12 +3477,6 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
return 0;
}
copy:
- if ((vmf->flags & FAULT_FLAG_VMA_LOCK) && !vma->anon_vma) {
- pte_unmap_unlock(vmf->pte, vmf->ptl);
- vma_end_read(vmf->vma);
- return VM_FAULT_RETRY;
- }
-
/*
* Ok, we need to copy. Oh, well..
*/
--
2.40.1
* [PATCH v2 3/6] mm: Handle shared faults under the VMA lock
2023-10-06 19:53 [PATCH v2 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
2023-10-06 19:53 ` [PATCH v2 1/6] mm: Make lock_folio_maybe_drop_mmap() VMA lock aware Matthew Wilcox (Oracle)
2023-10-06 19:53 ` [PATCH v2 2/6] mm: Call wp_page_copy() under the VMA lock Matthew Wilcox (Oracle)
@ 2023-10-06 19:53 ` Matthew Wilcox (Oracle)
2023-10-08 22:01 ` Suren Baghdasaryan
2023-10-20 13:23 ` kernel test robot
2023-10-06 19:53 ` [PATCH v2 4/6] mm: Handle COW " Matthew Wilcox (Oracle)
` (2 subsequent siblings)
5 siblings, 2 replies; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-10-06 19:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan
There are many implementations of ->fault and some of them depend on
mmap_lock being held. All vm_ops that implement ->map_pages() end up
calling filemap_fault(), which I have audited to be sure it does not rely
on mmap_lock. So (for now) key off ->map_pages existing as a flag to
indicate that it's safe to call ->fault while only holding the vma lock.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/memory.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index cff78c496728..a9b0c135209a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
count_vm_event(PGREUSE);
}
+/*
+ * We could add a bitflag somewhere, but for now, we know that all
+ * vm_ops that have a ->map_pages have been audited and don't need
+ * the mmap_lock to be held.
+ */
+static inline vm_fault_t vmf_can_call_fault(const struct vm_fault *vmf)
+{
+ struct vm_area_struct *vma = vmf->vma;
+
+ if (vma->vm_ops->map_pages || !(vmf->flags & FAULT_FLAG_VMA_LOCK))
+ return 0;
+ vma_end_read(vma);
+ return VM_FAULT_RETRY;
+}
+
static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
@@ -4669,10 +4684,9 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
vm_fault_t ret, tmp;
struct folio *folio;
- if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
- vma_end_read(vma);
- return VM_FAULT_RETRY;
- }
+ ret = vmf_can_call_fault(vmf);
+ if (ret)
+ return ret;
ret = __do_fault(vmf);
if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
--
2.40.1
* [PATCH v2 4/6] mm: Handle COW faults under the VMA lock
2023-10-06 19:53 [PATCH v2 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
` (2 preceding siblings ...)
2023-10-06 19:53 ` [PATCH v2 3/6] mm: Handle shared faults " Matthew Wilcox (Oracle)
@ 2023-10-06 19:53 ` Matthew Wilcox (Oracle)
2023-10-08 22:05 ` Suren Baghdasaryan
2023-10-20 13:18 ` kernel test robot
2023-10-06 19:53 ` [PATCH v2 5/6] mm: Handle read " Matthew Wilcox (Oracle)
2023-10-06 19:53 ` [PATCH v2 6/6] mm: Handle write faults to RO pages " Matthew Wilcox (Oracle)
5 siblings, 2 replies; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-10-06 19:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan
If the page is not currently present in the page tables, we need to call
the page fault handler to find out which page we're supposed to COW,
so we need to both check that there is already an anon_vma and that the
fault handler doesn't need the mmap_lock.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/memory.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index a9b0c135209a..938f481df0ab 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4639,13 +4639,11 @@ static vm_fault_t do_cow_fault(struct vm_fault *vmf)
struct vm_area_struct *vma = vmf->vma;
vm_fault_t ret;
- if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
- vma_end_read(vma);
- return VM_FAULT_RETRY;
- }
-
- if (unlikely(anon_vma_prepare(vma)))
- return VM_FAULT_OOM;
+ ret = vmf_can_call_fault(vmf);
+ if (!ret)
+ ret = vmf_anon_prepare(vmf);
+ if (ret)
+ return ret;
vmf->cow_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
if (!vmf->cow_page)
--
2.40.1
* [PATCH v2 5/6] mm: Handle read faults under the VMA lock
2023-10-06 19:53 [PATCH v2 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
` (3 preceding siblings ...)
2023-10-06 19:53 ` [PATCH v2 4/6] mm: Handle COW " Matthew Wilcox (Oracle)
@ 2023-10-06 19:53 ` Matthew Wilcox (Oracle)
2023-10-08 22:06 ` Suren Baghdasaryan
2023-10-20 9:55 ` kernel test robot
2023-10-06 19:53 ` [PATCH v2 6/6] mm: Handle write faults to RO pages " Matthew Wilcox (Oracle)
5 siblings, 2 replies; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-10-06 19:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan
Most file-backed faults are already handled through ->map_pages(),
but if we need to do I/O we'll come this way. Since filemap_fault()
is now safe to be called under the VMA lock, we can handle these faults
under the VMA lock now.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/memory.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 938f481df0ab..e615afd28db2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4617,10 +4617,9 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
return ret;
}
- if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
- vma_end_read(vmf->vma);
- return VM_FAULT_RETRY;
- }
+ ret = vmf_can_call_fault(vmf);
+ if (ret)
+ return ret;
ret = __do_fault(vmf);
if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
--
2.40.1
* [PATCH v2 6/6] mm: Handle write faults to RO pages under the VMA lock
2023-10-06 19:53 [PATCH v2 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
` (4 preceding siblings ...)
2023-10-06 19:53 ` [PATCH v2 5/6] mm: Handle read " Matthew Wilcox (Oracle)
@ 2023-10-06 19:53 ` Matthew Wilcox (Oracle)
2023-10-08 22:07 ` Suren Baghdasaryan
5 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-10-06 19:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Suren Baghdasaryan
I think this is a pretty rare occurrence, but for consistency handle
these faults the same way that we handle the other fault types when
the VMA lock is held.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/memory.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index e615afd28db2..3d1bc622e344 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3301,10 +3301,9 @@ static vm_fault_t wp_pfn_shared(struct vm_fault *vmf)
vm_fault_t ret;
pte_unmap_unlock(vmf->pte, vmf->ptl);
- if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
- vma_end_read(vmf->vma);
- return VM_FAULT_RETRY;
- }
+ ret = vmf_can_call_fault(vmf);
+ if (ret)
+ return ret;
vmf->flags |= FAULT_FLAG_MKWRITE;
ret = vma->vm_ops->pfn_mkwrite(vmf);
@@ -3328,10 +3327,10 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf, struct folio *folio)
vm_fault_t tmp;
pte_unmap_unlock(vmf->pte, vmf->ptl);
- if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+ tmp = vmf_can_call_fault(vmf);
+ if (tmp) {
folio_put(folio);
- vma_end_read(vmf->vma);
- return VM_FAULT_RETRY;
+ return tmp;
}
tmp = do_page_mkwrite(vmf, folio);
--
2.40.1
* Re: [PATCH v2 1/6] mm: Make lock_folio_maybe_drop_mmap() VMA lock aware
2023-10-06 19:53 ` [PATCH v2 1/6] mm: Make lock_folio_maybe_drop_mmap() VMA lock aware Matthew Wilcox (Oracle)
@ 2023-10-08 21:47 ` Suren Baghdasaryan
0 siblings, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-10-08 21:47 UTC (permalink / raw)
To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm
On Fri, Oct 6, 2023 at 12:53 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> Drop the VMA lock instead of the mmap_lock if that's the one which
> is held.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> ---
> mm/filemap.c | 13 +++++++------
> 1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 9481ffaf24e6..a598872d62cc 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3104,7 +3104,7 @@ static int lock_folio_maybe_drop_mmap(struct vm_fault *vmf, struct folio *folio,
>
> /*
> * NOTE! This will make us return with VM_FAULT_RETRY, but with
> - * the mmap_lock still held. That's how FAULT_FLAG_RETRY_NOWAIT
> + * the fault lock still held. That's how FAULT_FLAG_RETRY_NOWAIT
> * is supposed to work. We have way too many special cases..
> */
> if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
> @@ -3114,13 +3114,14 @@ static int lock_folio_maybe_drop_mmap(struct vm_fault *vmf, struct folio *folio,
> if (vmf->flags & FAULT_FLAG_KILLABLE) {
> if (__folio_lock_killable(folio)) {
> /*
> - * We didn't have the right flags to drop the mmap_lock,
> - * but all fault_handlers only check for fatal signals
> - * if we return VM_FAULT_RETRY, so we need to drop the
> - * mmap_lock here and return 0 if we don't have a fpin.
> + * We didn't have the right flags to drop the
> + * fault lock, but all fault_handlers only check
> + * for fatal signals if we return VM_FAULT_RETRY,
> + * so we need to drop the fault lock here and
> + * return 0 if we don't have a fpin.
> */
> if (*fpin == NULL)
> - mmap_read_unlock(vmf->vma->vm_mm);
> + release_fault_lock(vmf);
> return 0;
> }
> } else
> --
> 2.40.1
>
* Re: [PATCH v2 2/6] mm: Call wp_page_copy() under the VMA lock
2023-10-06 19:53 ` [PATCH v2 2/6] mm: Call wp_page_copy() under the VMA lock Matthew Wilcox (Oracle)
@ 2023-10-08 22:00 ` Suren Baghdasaryan
0 siblings, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-10-08 22:00 UTC (permalink / raw)
To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm
On Fri, Oct 6, 2023 at 12:53 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> It is usually safe to call wp_page_copy() under the VMA lock. The only
> unsafe situation is when no anon_vma has been allocated for this VMA,
> and we have to look at adjacent VMAs to determine if their anon_vma can
> be shared. Since this happens only for the first COW of a page in this
> VMA, the majority of calls to wp_page_copy() do not need to fall back
> to the mmap_lock.
>
> Add vmf_anon_prepare() as an alternative to anon_vma_prepare() which
> will return RETRY if we currently hold the VMA lock and need to allocate
> an anon_vma. This lets us drop the check in do_wp_page().
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> ---
> mm/memory.c | 39 ++++++++++++++++++++++++++-------------
> 1 file changed, 26 insertions(+), 13 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 97f860d6cd2a..cff78c496728 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
> count_vm_event(PGREUSE);
> }
>
> +static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
> +{
> + struct vm_area_struct *vma = vmf->vma;
> +
> + if (likely(vma->anon_vma))
> + return 0;
> + if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> + vma_end_read(vma);
> + return VM_FAULT_RETRY;
> + }
> + if (__anon_vma_prepare(vma))
> + return VM_FAULT_OOM;
> + return 0;
> +}
> +
> /*
> * Handle the case of a page which we actually need to copy to a new page,
> * either due to COW or unsharing.
> @@ -3069,27 +3084,29 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> pte_t entry;
> int page_copied = 0;
> struct mmu_notifier_range range;
> - int ret;
> + vm_fault_t ret;
>
> delayacct_wpcopy_start();
>
> if (vmf->page)
> old_folio = page_folio(vmf->page);
> - if (unlikely(anon_vma_prepare(vma)))
> - goto oom;
> + ret = vmf_anon_prepare(vmf);
> + if (unlikely(ret))
> + goto out;
>
> if (is_zero_pfn(pte_pfn(vmf->orig_pte))) {
> new_folio = vma_alloc_zeroed_movable_folio(vma, vmf->address);
> if (!new_folio)
> goto oom;
> } else {
> + int err;
> new_folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma,
> vmf->address, false);
> if (!new_folio)
> goto oom;
>
> - ret = __wp_page_copy_user(&new_folio->page, vmf->page, vmf);
> - if (ret) {
> + err = __wp_page_copy_user(&new_folio->page, vmf->page, vmf);
> + if (err) {
> /*
> * COW failed, if the fault was solved by other,
> * it's fine. If not, userspace would re-fault on
> @@ -3102,7 +3119,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> folio_put(old_folio);
>
> delayacct_wpcopy_end();
> - return ret == -EHWPOISON ? VM_FAULT_HWPOISON : 0;
> + return err == -EHWPOISON ? VM_FAULT_HWPOISON : 0;
> }
> kmsan_copy_page_meta(&new_folio->page, vmf->page);
> }
> @@ -3212,11 +3229,13 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> oom_free_new:
> folio_put(new_folio);
> oom:
> + ret = VM_FAULT_OOM;
> +out:
> if (old_folio)
> folio_put(old_folio);
>
> delayacct_wpcopy_end();
> - return VM_FAULT_OOM;
> + return ret;
> }
>
> /**
> @@ -3458,12 +3477,6 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
> return 0;
> }
> copy:
> - if ((vmf->flags & FAULT_FLAG_VMA_LOCK) && !vma->anon_vma) {
> - pte_unmap_unlock(vmf->pte, vmf->ptl);
> - vma_end_read(vmf->vma);
> - return VM_FAULT_RETRY;
> - }
> -
> /*
> * Ok, we need to copy. Oh, well..
> */
> --
> 2.40.1
>
* Re: [PATCH v2 3/6] mm: Handle shared faults under the VMA lock
2023-10-06 19:53 ` [PATCH v2 3/6] mm: Handle shared faults " Matthew Wilcox (Oracle)
@ 2023-10-08 22:01 ` Suren Baghdasaryan
2023-10-20 13:23 ` kernel test robot
1 sibling, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-10-08 22:01 UTC (permalink / raw)
To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm
On Fri, Oct 6, 2023 at 12:53 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> There are many implementations of ->fault and some of them depend on
> mmap_lock being held. All vm_ops that implement ->map_pages() end up
> calling filemap_fault(), which I have audited to be sure it does not rely
> on mmap_lock. So (for now) key off ->map_pages existing as a flag to
> indicate that it's safe to call ->fault while only holding the vma lock.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> ---
> mm/memory.c | 22 ++++++++++++++++++----
> 1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index cff78c496728..a9b0c135209a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
> count_vm_event(PGREUSE);
> }
>
> +/*
> + * We could add a bitflag somewhere, but for now, we know that all
> + * vm_ops that have a ->map_pages have been audited and don't need
> + * the mmap_lock to be held.
> + */
> +static inline vm_fault_t vmf_can_call_fault(const struct vm_fault *vmf)
> +{
> + struct vm_area_struct *vma = vmf->vma;
> +
> + if (vma->vm_ops->map_pages || !(vmf->flags & FAULT_FLAG_VMA_LOCK))
> + return 0;
> + vma_end_read(vma);
> + return VM_FAULT_RETRY;
> +}
> +
> static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
> {
> struct vm_area_struct *vma = vmf->vma;
> @@ -4669,10 +4684,9 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
> vm_fault_t ret, tmp;
> struct folio *folio;
>
> - if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> - vma_end_read(vma);
> - return VM_FAULT_RETRY;
> - }
> + ret = vmf_can_call_fault(vmf);
> + if (ret)
> + return ret;
>
> ret = __do_fault(vmf);
> if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
> --
> 2.40.1
>
* Re: [PATCH v2 4/6] mm: Handle COW faults under the VMA lock
2023-10-06 19:53 ` [PATCH v2 4/6] mm: Handle COW " Matthew Wilcox (Oracle)
@ 2023-10-08 22:05 ` Suren Baghdasaryan
2023-10-20 13:18 ` kernel test robot
1 sibling, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-10-08 22:05 UTC (permalink / raw)
To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm
On Fri, Oct 6, 2023 at 12:53 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> If the page is not currently present in the page tables, we need to call
> the page fault handler to find out which page we're supposed to COW,
> so we need to both check that there is already an anon_vma and that the
> fault handler doesn't need the mmap_lock.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> ---
> mm/memory.c | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index a9b0c135209a..938f481df0ab 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4639,13 +4639,11 @@ static vm_fault_t do_cow_fault(struct vm_fault *vmf)
> struct vm_area_struct *vma = vmf->vma;
> vm_fault_t ret;
>
> - if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> - vma_end_read(vma);
> - return VM_FAULT_RETRY;
> - }
> -
> - if (unlikely(anon_vma_prepare(vma)))
> - return VM_FAULT_OOM;
> + ret = vmf_can_call_fault(vmf);
> + if (!ret)
> + ret = vmf_anon_prepare(vmf);
> + if (ret)
> + return ret;
>
> vmf->cow_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
> if (!vmf->cow_page)
> --
> 2.40.1
>
* Re: [PATCH v2 5/6] mm: Handle read faults under the VMA lock
2023-10-06 19:53 ` [PATCH v2 5/6] mm: Handle read " Matthew Wilcox (Oracle)
@ 2023-10-08 22:06 ` Suren Baghdasaryan
2023-10-20 9:55 ` kernel test robot
1 sibling, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-10-08 22:06 UTC (permalink / raw)
To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm
On Fri, Oct 6, 2023 at 12:53 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> Most file-backed faults are already handled through ->map_pages(),
> but if we need to do I/O we'll come this way. Since filemap_fault()
> is now safe to be called under the VMA lock, we can handle these faults
> under the VMA lock now.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> ---
> mm/memory.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 938f481df0ab..e615afd28db2 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4617,10 +4617,9 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
> return ret;
> }
>
> - if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> - vma_end_read(vmf->vma);
> - return VM_FAULT_RETRY;
> - }
> + ret = vmf_can_call_fault(vmf);
> + if (ret)
> + return ret;
>
> ret = __do_fault(vmf);
> if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
> --
> 2.40.1
>
* Re: [PATCH v2 6/6] mm: Handle write faults to RO pages under the VMA lock
2023-10-06 19:53 ` [PATCH v2 6/6] mm: Handle write faults to RO pages " Matthew Wilcox (Oracle)
@ 2023-10-08 22:07 ` Suren Baghdasaryan
0 siblings, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2023-10-08 22:07 UTC (permalink / raw)
To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm
On Fri, Oct 6, 2023 at 12:53 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> I think this is a pretty rare occurrence, but for consistency handle
> faults with the VMA lock held the same way that we handle other
> faults with the VMA lock held.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> ---
> mm/memory.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index e615afd28db2..3d1bc622e344 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3301,10 +3301,9 @@ static vm_fault_t wp_pfn_shared(struct vm_fault *vmf)
> vm_fault_t ret;
>
> pte_unmap_unlock(vmf->pte, vmf->ptl);
> - if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> - vma_end_read(vmf->vma);
> - return VM_FAULT_RETRY;
> - }
> + ret = vmf_can_call_fault(vmf);
> + if (ret)
> + return ret;
>
> vmf->flags |= FAULT_FLAG_MKWRITE;
> ret = vma->vm_ops->pfn_mkwrite(vmf);
> @@ -3328,10 +3327,10 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf, struct folio *folio)
> vm_fault_t tmp;
>
> pte_unmap_unlock(vmf->pte, vmf->ptl);
> - if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> + tmp = vmf_can_call_fault(vmf);
> + if (tmp) {
> folio_put(folio);
> - vma_end_read(vmf->vma);
> - return VM_FAULT_RETRY;
> + return tmp;
> }
>
> tmp = do_page_mkwrite(vmf, folio);
> --
> 2.40.1
>
* Re: [PATCH v2 5/6] mm: Handle read faults under the VMA lock
2023-10-06 19:53 ` [PATCH v2 5/6] mm: Handle read " Matthew Wilcox (Oracle)
2023-10-08 22:06 ` Suren Baghdasaryan
@ 2023-10-20 9:55 ` kernel test robot
1 sibling, 0 replies; 16+ messages in thread
From: kernel test robot @ 2023-10-20 9:55 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: oe-lkp, lkp, linux-mm, ying.huang, feng.tang, fengwei.yin,
Andrew Morton, Matthew Wilcox (Oracle),
Suren Baghdasaryan, oliver.sang
Hello,
kernel test robot noticed a 46.0% improvement of vm-scalability.throughput on:
commit: 39fbbca087dd149cdb82f08e7b92d62395c21ecf ("[PATCH v2 5/6] mm: Handle read faults under the VMA lock")
url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/mm-Make-lock_folio_maybe_drop_mmap-VMA-lock-aware/20231007-035513
base: v6.6-rc4
patch link: https://lore.kernel.org/all/20231006195318.4087158-6-willy@infradead.org/
patch subject: [PATCH v2 5/6] mm: Handle read faults under the VMA lock
testcase: vm-scalability
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
parameters:
runtime: 300s
size: 2T
test: shm-pread-seq-mt
cpufreq_governor: performance
test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231020/202310201715.3f52109d-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/2T/lkp-csl-2sp3/shm-pread-seq-mt/vm-scalability
commit:
90e99527c7 ("mm: Handle COW faults under the VMA lock")
39fbbca087 ("mm: Handle read faults under the VMA lock")
90e99527c746cd9e 39fbbca087dd149cdb82f08e7b9
---------------- ---------------------------
%stddev %change %stddev
\ | \
34.69 ± 23% +72.5% 59.82 ± 2% vm-scalability.free_time
173385 +45.6% 252524 vm-scalability.median
16599151 +46.0% 24242352 vm-scalability.throughput
390.45 +6.9% 417.32 vm-scalability.time.elapsed_time
390.45 +6.9% 417.32 vm-scalability.time.elapsed_time.max
45781 ± 2% +16.3% 53251 ± 2% vm-scalability.time.involuntary_context_switches
4.213e+09 +50.1% 6.325e+09 vm-scalability.time.maximum_resident_set_size
5.316e+08 +47.3% 7.83e+08 vm-scalability.time.minor_page_faults
6400 -8.0% 5890 vm-scalability.time.percent_of_cpu_this_job_got
21673 -10.2% 19455 vm-scalability.time.system_time
3319 +54.4% 5126 vm-scalability.time.user_time
2.321e+08 ± 2% +27.2% 2.953e+08 ± 5% vm-scalability.time.voluntary_context_switches
5.004e+09 +42.2% 7.116e+09 vm-scalability.workload
13110 +24.0% 16254 uptime.idle
1.16e+10 +24.5% 1.444e+10 cpuidle..time
2.648e+08 ± 3% +16.3% 3.079e+08 ± 5% cpuidle..usage
22.86 +6.3 29.17 mpstat.cpu.all.idle%
8.29 ± 5% -1.2 7.13 ± 7% mpstat.cpu.all.iowait%
58.63 -9.2 49.38 mpstat.cpu.all.sys%
9.05 +4.0 13.09 mpstat.cpu.all.usr%
8721571 ± 5% +44.8% 12630342 ± 2% numa-numastat.node0.local_node
8773210 ± 5% +44.8% 12706884 ± 2% numa-numastat.node0.numa_hit
7793725 ± 5% +51.3% 11793573 numa-numastat.node1.local_node
7842342 ± 5% +50.7% 11816543 numa-numastat.node1.numa_hit
23.17 +26.8% 29.37 vmstat.cpu.id
31295414 +50.9% 47211341 vmstat.memory.cache
95303378 -18.8% 77355720 vmstat.memory.free
1176885 ± 2% +19.2% 1402891 ± 3% vmstat.system.cs
194658 +5.4% 205149 ± 2% vmstat.system.in
9920198 ± 10% -48.9% 5071533 ± 15% turbostat.C1
0.51 ± 12% -0.3 0.21 ± 12% turbostat.C1%
1831098 ± 15% -72.0% 512888 ± 19% turbostat.C1E
0.14 ± 13% -0.1 0.06 ± 11% turbostat.C1E%
8736699 +36.3% 11905646 turbostat.C6
22.74 +6.3 29.02 turbostat.C6%
17.82 +25.5% 22.37 turbostat.CPU%c1
5.36 +28.2% 6.87 turbostat.CPU%c6
0.07 +42.9% 0.10 turbostat.IPC
77317703 +12.3% 86804635 ± 3% turbostat.IRQ
2.443e+08 ± 3% +18.9% 2.904e+08 ± 6% turbostat.POLL
4.80 +30.2% 6.24 turbostat.Pkg%pc2
266.73 -1.3% 263.33 turbostat.PkgWatt
0.00 -25.0% 0.00 perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
0.06 ± 11% -21.8% 0.04 ± 9% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
26.45 ± 9% -16.0% 22.21 ± 6% perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
0.00 -25.0% 0.00 perf-sched.total_sch_delay.average.ms
106.37 ±167% -79.1% 22.21 ± 6% perf-sched.total_sch_delay.max.ms
0.46 ± 2% -16.0% 0.39 ± 5% perf-sched.total_wait_and_delay.average.ms
2202457 ± 2% +26.1% 2776824 ± 3% perf-sched.total_wait_and_delay.count.ms
0.45 ± 2% -15.9% 0.38 ± 5% perf-sched.total_wait_time.average.ms
0.02 ± 2% -19.8% 0.01 ± 2% perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
494.65 ± 4% +10.6% 546.88 ± 3% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
2196122 ± 2% +26.1% 2770017 ± 3% perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
0.01 ± 3% -19.5% 0.01 perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
494.63 ± 4% +10.6% 546.87 ± 3% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.22 ± 42% -68.8% 0.07 ±125% perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
11445425 +82.1% 20837223 meminfo.Active
11444642 +82.1% 20836443 meminfo.Active(anon)
31218122 +51.0% 47138293 meminfo.Cached
30006048 +53.7% 46116816 meminfo.Committed_AS
17425032 +37.4% 23950392 meminfo.Inactive
17423257 +37.5% 23948613 meminfo.Inactive(anon)
164910 +21.8% 200913 meminfo.KReclaimable
26336530 +57.6% 41514589 meminfo.Mapped
94668993 -19.0% 76693589 meminfo.MemAvailable
95202238 -18.9% 77208832 meminfo.MemFree
36610737 +49.1% 54604143 meminfo.Memused
4072810 +50.1% 6114589 meminfo.PageTables
164910 +21.8% 200913 meminfo.SReclaimable
28535318 +55.8% 44455489 meminfo.Shmem
367289 +10.1% 404373 meminfo.Slab
37978157 +50.2% 57055526 meminfo.max_used_kB
2860756 +82.1% 5208445 proc-vmstat.nr_active_anon
2361286 -19.0% 1912151 proc-vmstat.nr_dirty_background_threshold
4728345 -19.0% 3828978 proc-vmstat.nr_dirty_threshold
7804148 +51.0% 11783823 proc-vmstat.nr_file_pages
23801109 -18.9% 19303173 proc-vmstat.nr_free_pages
4355690 +37.5% 5986921 proc-vmstat.nr_inactive_anon
6583645 +57.6% 10377790 proc-vmstat.nr_mapped
1018109 +50.1% 1528565 proc-vmstat.nr_page_table_pages
7133183 +55.8% 11112858 proc-vmstat.nr_shmem
41226 +21.8% 50226 proc-vmstat.nr_slab_reclaimable
2860756 +82.1% 5208445 proc-vmstat.nr_zone_active_anon
4355690 +37.5% 5986921 proc-vmstat.nr_zone_inactive_anon
112051 +3.8% 116273 proc-vmstat.numa_hint_faults
16618553 +47.6% 24525492 proc-vmstat.numa_hit
16518296 +47.9% 24425975 proc-vmstat.numa_local
11052273 +49.9% 16566743 proc-vmstat.pgactivate
16757533 +47.2% 24672644 proc-vmstat.pgalloc_normal
5.329e+08 +47.2% 7.844e+08 proc-vmstat.pgfault
16101786 +48.3% 23877738 proc-vmstat.pgfree
3302784 +6.0% 3500288 proc-vmstat.unevictable_pgs_scanned
6101287 ± 7% +81.3% 11062634 ± 3% numa-meminfo.node0.Active
6101026 ± 7% +81.3% 11062389 ± 3% numa-meminfo.node0.Active(anon)
17217355 ± 5% +46.3% 25196100 ± 3% numa-meminfo.node0.FilePages
9363213 ± 7% +31.9% 12347562 ± 2% numa-meminfo.node0.Inactive
9362621 ± 7% +31.9% 12347130 ± 2% numa-meminfo.node0.Inactive(anon)
14211196 ± 7% +51.2% 21487599 numa-meminfo.node0.Mapped
45879058 ± 2% -19.6% 36888633 ± 2% numa-meminfo.node0.MemFree
19925073 ± 5% +45.1% 28915498 ± 3% numa-meminfo.node0.MemUsed
2032891 +50.5% 3060344 numa-meminfo.node0.PageTables
15318197 ± 6% +52.0% 23276446 ± 2% numa-meminfo.node0.Shmem
5342463 ± 7% +82.9% 9769639 ± 4% numa-meminfo.node1.Active
5341941 ± 7% +82.9% 9769104 ± 4% numa-meminfo.node1.Active(anon)
13998966 ± 8% +56.6% 21919509 ± 3% numa-meminfo.node1.FilePages
8060699 ± 7% +43.7% 11584190 ± 2% numa-meminfo.node1.Inactive
8059515 ± 7% +43.7% 11582844 ± 2% numa-meminfo.node1.Inactive(anon)
12125745 ± 7% +65.0% 20005342 numa-meminfo.node1.Mapped
49326340 ± 2% -18.2% 40347902 ± 2% numa-meminfo.node1.MemFree
16682503 ± 7% +53.8% 25660941 ± 3% numa-meminfo.node1.MemUsed
2039529 +49.6% 3051247 numa-meminfo.node1.PageTables
13214266 ± 7% +60.1% 21155303 ± 2% numa-meminfo.node1.Shmem
156378 ± 13% +21.1% 189316 ± 9% numa-meminfo.node1.Slab
1525784 ± 7% +81.4% 2767183 ± 3% numa-vmstat.node0.nr_active_anon
4304756 ± 5% +46.4% 6302189 ± 3% numa-vmstat.node0.nr_file_pages
11469263 ± 2% -19.6% 9218468 ± 2% numa-vmstat.node0.nr_free_pages
2340569 ± 7% +32.0% 3088383 ± 2% numa-vmstat.node0.nr_inactive_anon
3553304 ± 7% +51.3% 5375214 numa-vmstat.node0.nr_mapped
508315 +50.6% 765564 numa-vmstat.node0.nr_page_table_pages
3829966 ± 6% +52.0% 5822276 ± 2% numa-vmstat.node0.nr_shmem
1525783 ± 7% +81.4% 2767184 ± 3% numa-vmstat.node0.nr_zone_active_anon
2340569 ± 7% +32.0% 3088382 ± 2% numa-vmstat.node0.nr_zone_inactive_anon
8773341 ± 5% +44.8% 12707017 ± 2% numa-vmstat.node0.numa_hit
8721702 ± 5% +44.8% 12630474 ± 2% numa-vmstat.node0.numa_local
1335910 ± 7% +82.9% 2443778 ± 4% numa-vmstat.node1.nr_active_anon
3500040 ± 8% +56.7% 5482887 ± 3% numa-vmstat.node1.nr_file_pages
12331163 ± 2% -18.2% 10083422 ± 2% numa-vmstat.node1.nr_free_pages
2014795 ± 7% +43.8% 2897243 ± 2% numa-vmstat.node1.nr_inactive_anon
3031806 ± 7% +65.1% 5004449 numa-vmstat.node1.nr_mapped
510000 +49.7% 763297 numa-vmstat.node1.nr_page_table_pages
3303865 ± 7% +60.2% 5291835 ± 2% numa-vmstat.node1.nr_shmem
1335910 ± 7% +82.9% 2443778 ± 4% numa-vmstat.node1.nr_zone_active_anon
2014795 ± 7% +43.8% 2897242 ± 2% numa-vmstat.node1.nr_zone_inactive_anon
7842425 ± 5% +50.7% 11816530 numa-vmstat.node1.numa_hit
7793808 ± 5% +51.3% 11793555 numa-vmstat.node1.numa_local
9505083 +21.3% 11532590 ± 3% sched_debug.cfs_rq:/.avg_vruntime.avg
9551715 +21.4% 11595502 ± 3% sched_debug.cfs_rq:/.avg_vruntime.max
9426050 +21.4% 11443528 ± 3% sched_debug.cfs_rq:/.avg_vruntime.min
19249 ± 4% +28.3% 24698 ± 10% sched_debug.cfs_rq:/.avg_vruntime.stddev
0.79 -30.7% 0.55 ± 8% sched_debug.cfs_rq:/.h_nr_running.avg
12458 ± 12% +70.8% 21277 ± 22% sched_debug.cfs_rq:/.load.avg
13767 ± 95% +311.7% 56677 ± 29% sched_debug.cfs_rq:/.load.stddev
9505083 +21.3% 11532590 ± 3% sched_debug.cfs_rq:/.min_vruntime.avg
9551715 +21.4% 11595502 ± 3% sched_debug.cfs_rq:/.min_vruntime.max
9426050 +21.4% 11443528 ± 3% sched_debug.cfs_rq:/.min_vruntime.min
19249 ± 4% +28.3% 24698 ± 10% sched_debug.cfs_rq:/.min_vruntime.stddev
0.78 -30.7% 0.54 ± 8% sched_debug.cfs_rq:/.nr_running.avg
170.67 -21.4% 134.10 ± 6% sched_debug.cfs_rq:/.removed.load_avg.max
708.55 -32.2% 480.43 ± 7% sched_debug.cfs_rq:/.runnable_avg.avg
1510 ± 3% -12.5% 1320 ± 4% sched_debug.cfs_rq:/.runnable_avg.max
219.68 ± 7% -12.7% 191.74 ± 5% sched_debug.cfs_rq:/.runnable_avg.stddev
707.51 -32.3% 479.05 ± 7% sched_debug.cfs_rq:/.util_avg.avg
1506 ± 3% -12.6% 1317 ± 4% sched_debug.cfs_rq:/.util_avg.max
219.64 ± 7% -13.0% 191.15 ± 5% sched_debug.cfs_rq:/.util_avg.stddev
564.18 ± 2% -32.4% 381.24 ± 8% sched_debug.cfs_rq:/.util_est_enqueued.avg
1168 ± 7% -14.8% 995.94 ± 7% sched_debug.cfs_rq:/.util_est_enqueued.max
235.45 ± 5% -21.4% 185.13 ± 7% sched_debug.cfs_rq:/.util_est_enqueued.stddev
149234 ± 5% +192.0% 435707 ± 10% sched_debug.cpu.avg_idle.avg
404765 ± 17% +47.3% 596259 ± 15% sched_debug.cpu.avg_idle.max
5455 ± 4% +3302.8% 185624 ± 34% sched_debug.cpu.avg_idle.min
201990 +24.9% 252309 ± 5% sched_debug.cpu.clock.avg
201997 +24.9% 252315 ± 5% sched_debug.cpu.clock.max
201983 +24.9% 252303 ± 5% sched_debug.cpu.clock.min
3.80 ± 2% -10.1% 3.42 ± 3% sched_debug.cpu.clock.stddev
200296 +24.8% 249952 ± 5% sched_debug.cpu.clock_task.avg
200541 +24.8% 250280 ± 5% sched_debug.cpu.clock_task.max
194086 +25.5% 243582 ± 5% sched_debug.cpu.clock_task.min
4069 -32.7% 2739 ± 8% sched_debug.cpu.curr->pid.avg
8703 +15.2% 10027 ± 3% sched_debug.cpu.curr->pid.max
0.00 ± 6% -27.2% 0.00 ± 5% sched_debug.cpu.next_balance.stddev
0.78 -32.7% 0.52 ± 8% sched_debug.cpu.nr_running.avg
0.33 ± 6% -13.9% 0.29 ± 5% sched_debug.cpu.nr_running.stddev
2372181 ± 2% +57.6% 3737590 ± 8% sched_debug.cpu.nr_switches.avg
2448893 ± 2% +58.5% 3880813 ± 8% sched_debug.cpu.nr_switches.max
2290032 ± 2% +55.9% 3570559 ± 8% sched_debug.cpu.nr_switches.min
36185 ± 10% +74.8% 63244 ± 8% sched_debug.cpu.nr_switches.stddev
0.10 ± 19% +138.0% 0.23 ± 19% sched_debug.cpu.nr_uninterruptible.avg
201984 +24.9% 252304 ± 5% sched_debug.cpu_clk
201415 +25.0% 251735 ± 5% sched_debug.ktime
202543 +24.8% 252867 ± 5% sched_debug.sched_clk
3.84 ± 2% -14.1% 3.30 ± 2% perf-stat.i.MPKI
1.679e+10 +30.1% 2.186e+10 perf-stat.i.branch-instructions
0.54 ± 2% -0.1 0.45 perf-stat.i.branch-miss-rate%
75872684 -2.6% 73927540 perf-stat.i.branch-misses
31.85 -1.1 30.75 perf-stat.i.cache-miss-rate%
1184992 ± 2% +19.1% 1411069 ± 3% perf-stat.i.context-switches
3.49 -29.3% 2.47 perf-stat.i.cpi
2.265e+11 -8.1% 2.081e+11 perf-stat.i.cpu-cycles
950.46 ± 3% -11.6% 840.03 ± 2% perf-stat.i.cycles-between-cache-misses
9514714 ± 12% +27.3% 12109471 ± 10% perf-stat.i.dTLB-load-misses
1.556e+10 +29.9% 2.022e+10 perf-stat.i.dTLB-loads
1575276 ± 5% +35.8% 2138868 ± 5% perf-stat.i.dTLB-store-misses
3.396e+09 +21.6% 4.129e+09 perf-stat.i.dTLB-stores
79.97 +2.8 82.74 perf-stat.i.iTLB-load-miss-rate%
4265612 +8.4% 4624960 ± 2% perf-stat.i.iTLB-load-misses
712599 ± 8% -38.4% 438645 ± 7% perf-stat.i.iTLB-loads
5.59e+10 +27.7% 7.137e+10 perf-stat.i.instructions
12120 +11.6% 13525 ± 2% perf-stat.i.instructions-per-iTLB-miss
0.35 +32.7% 0.46 perf-stat.i.ipc
0.04 ± 38% +119.0% 0.08 ± 33% perf-stat.i.major-faults
2.36 -8.1% 2.17 perf-stat.i.metric.GHz
863.69 +7.5% 928.37 perf-stat.i.metric.K/sec
378.76 +28.8% 487.87 perf-stat.i.metric.M/sec
1359089 +37.9% 1874285 perf-stat.i.minor-faults
84.30 -2.8 81.50 perf-stat.i.node-load-miss-rate%
89.54 -2.5 87.09 perf-stat.i.node-store-miss-rate%
1359089 +37.9% 1874285 perf-stat.i.page-faults
3.65 ± 3% -22.5% 2.82 ± 4% perf-stat.overall.MPKI
0.45 -0.1 0.34 perf-stat.overall.branch-miss-rate%
32.64 -1.7 30.98 perf-stat.overall.cache-miss-rate%
4.05 -28.0% 2.92 perf-stat.overall.cpi
1113 ± 3% -7.1% 1034 ± 3% perf-stat.overall.cycles-between-cache-misses
0.05 ± 5% +0.0 0.05 ± 5% perf-stat.overall.dTLB-store-miss-rate%
85.73 +5.6 91.37 perf-stat.overall.iTLB-load-miss-rate%
13110 ± 2% +17.8% 15440 ± 2% perf-stat.overall.instructions-per-iTLB-miss
0.25 +39.0% 0.34 perf-stat.overall.ipc
4378 -4.2% 4195 perf-stat.overall.path-length
1.679e+10 +30.2% 2.186e+10 perf-stat.ps.branch-instructions
75862675 -2.6% 73920168 perf-stat.ps.branch-misses
1184994 ± 2% +19.1% 1411192 ± 3% perf-stat.ps.context-switches
2.265e+11 -8.1% 2.082e+11 perf-stat.ps.cpu-cycles
9518014 ± 12% +27.3% 12118863 ± 10% perf-stat.ps.dTLB-load-misses
1.556e+10 +29.9% 2.022e+10 perf-stat.ps.dTLB-loads
1575414 ± 5% +35.8% 2139373 ± 5% perf-stat.ps.dTLB-store-misses
3.396e+09 +21.6% 4.129e+09 perf-stat.ps.dTLB-stores
4265139 +8.4% 4625090 ± 2% perf-stat.ps.iTLB-load-misses
711002 ± 8% -38.5% 437258 ± 7% perf-stat.ps.iTLB-loads
5.59e+10 +27.7% 7.137e+10 perf-stat.ps.instructions
0.04 ± 37% +118.9% 0.08 ± 33% perf-stat.ps.major-faults
1359186 +37.9% 1874615 perf-stat.ps.minor-faults
1359186 +37.9% 1874615 perf-stat.ps.page-faults
2.191e+13 +36.3% 2.986e+13 perf-stat.total.instructions
74.66 -6.7 67.93 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
74.61 -6.7 67.89 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
53.18 -6.3 46.88 perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
35.54 -6.1 29.43 perf-profile.calltrace.cycles-pp.next_uptodate_folio.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault
76.49 -5.4 71.07 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
79.82 -3.9 75.89 perf-profile.calltrace.cycles-pp.do_access
70.02 -3.8 66.23 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
70.39 -3.7 66.70 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
68.31 -2.8 65.51 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
68.29 -2.8 65.50 perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.65 ± 7% -0.3 0.37 ± 71% perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule.io_schedule.folio_wait_bit_common
1.94 ± 6% -0.2 1.71 ± 6% perf-profile.calltrace.cycles-pp.__schedule.schedule.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp
1.96 ± 6% -0.2 1.74 ± 6% perf-profile.calltrace.cycles-pp.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault.__do_fault
1.95 ± 6% -0.2 1.74 ± 6% perf-profile.calltrace.cycles-pp.schedule.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
0.86 +0.1 1.00 ± 2% perf-profile.calltrace.cycles-pp.folio_add_file_rmap_range.set_pte_range.filemap_map_pages.do_read_fault.do_fault
0.56 +0.2 0.72 ± 4% perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry
1.16 ± 3% +0.2 1.33 ± 2% perf-profile.calltrace.cycles-pp.set_pte_range.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault
0.71 ± 2% +0.2 0.92 ± 3% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary
0.78 +0.2 1.02 ± 4% perf-profile.calltrace.cycles-pp.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
0.44 ± 44% +0.3 0.73 ± 3% perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_read_fault.do_fault.__handle_mm_fault
0.89 ± 9% +0.3 1.24 ± 8% perf-profile.calltrace.cycles-pp.finish_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
1.23 +0.4 1.59 perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_exc_page_fault.do_access
0.18 ±141% +0.4 0.57 ± 5% perf-profile.calltrace.cycles-pp.try_to_wake_up.wake_page_function.__wake_up_common.folio_wake_bit.filemap_map_pages
1.50 +0.6 2.05 perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
0.00 +0.6 0.56 ± 4% perf-profile.calltrace.cycles-pp.wake_page_function.__wake_up_common.folio_wake_bit.do_read_fault.do_fault
0.09 ±223% +0.6 0.69 ± 4% perf-profile.calltrace.cycles-pp.__wake_up_common.folio_wake_bit.do_read_fault.do_fault.__handle_mm_fault
0.00 +0.6 0.60 perf-profile.calltrace.cycles-pp.folio_add_file_rmap_range.set_pte_range.finish_fault.do_read_fault.do_fault
2.98 ± 3% +0.7 3.66 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
3.39 ± 3% +0.8 4.21 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault.__do_fault
7.48 +0.9 8.41 perf-profile.calltrace.cycles-pp.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
2.25 ± 6% +1.0 3.30 ± 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_wake_bit.do_read_fault.do_fault
2.44 ± 5% +1.1 3.56 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_wake_bit.do_read_fault.do_fault.__handle_mm_fault
3.11 ± 4% +1.4 4.52 perf-profile.calltrace.cycles-pp.folio_wake_bit.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
10.14 +1.9 12.06 perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault.do_fault
10.26 +2.0 12.25 perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_read_fault.do_fault.__handle_mm_fault
10.29 +2.0 12.29 perf-profile.calltrace.cycles-pp.__do_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
9.69 +5.5 15.21 ± 2% perf-profile.calltrace.cycles-pp.do_rw_once
74.66 -6.7 67.94 perf-profile.children.cycles-pp.exc_page_fault
74.62 -6.7 67.90 perf-profile.children.cycles-pp.do_user_addr_fault
53.19 -6.3 46.89 perf-profile.children.cycles-pp.filemap_map_pages
35.56 -6.1 29.44 perf-profile.children.cycles-pp.next_uptodate_folio
76.51 -6.0 70.48 perf-profile.children.cycles-pp.asm_exc_page_fault
70.02 -3.8 66.24 perf-profile.children.cycles-pp.__handle_mm_fault
70.40 -3.7 66.71 perf-profile.children.cycles-pp.handle_mm_fault
81.33 -3.5 77.78 perf-profile.children.cycles-pp.do_access
68.32 -2.8 65.52 perf-profile.children.cycles-pp.do_fault
68.30 -2.8 65.50 perf-profile.children.cycles-pp.do_read_fault
2.07 ± 7% -2.0 0.12 ± 6% perf-profile.children.cycles-pp.down_read_trylock
1.28 ± 4% -1.1 0.16 ± 4% perf-profile.children.cycles-pp.up_read
0.65 ± 12% -0.4 0.28 ± 15% perf-profile.children.cycles-pp.intel_idle_irq
1.96 ± 6% -0.2 1.74 ± 6% perf-profile.children.cycles-pp.schedule
1.96 ± 6% -0.2 1.74 ± 6% perf-profile.children.cycles-pp.io_schedule
0.36 ± 7% -0.2 0.15 ± 3% perf-profile.children.cycles-pp.mtree_range_walk
0.30 ± 8% -0.2 0.13 ± 14% perf-profile.children.cycles-pp.mm_cid_get
0.12 ± 12% -0.1 0.03 ±100% perf-profile.children.cycles-pp.update_sg_lb_stats
0.16 ± 9% -0.1 0.07 ± 15% perf-profile.children.cycles-pp.load_balance
0.14 ± 10% -0.1 0.05 ± 46% perf-profile.children.cycles-pp.update_sd_lb_stats
0.20 ± 10% -0.1 0.11 ± 8% perf-profile.children.cycles-pp.newidle_balance
0.14 ± 10% -0.1 0.06 ± 17% perf-profile.children.cycles-pp.find_busiest_group
0.33 ± 6% -0.0 0.28 ± 5% perf-profile.children.cycles-pp.pick_next_task_fair
0.05 +0.0 0.06 perf-profile.children.cycles-pp.nohz_run_idle_balance
0.06 +0.0 0.08 ± 6% perf-profile.children.cycles-pp.__update_load_avg_se
0.04 ± 44% +0.0 0.06 perf-profile.children.cycles-pp.reweight_entity
0.09 ± 7% +0.0 0.11 ± 4% perf-profile.children.cycles-pp.xas_descend
0.08 ± 5% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.update_curr
0.09 ± 7% +0.0 0.11 ± 3% perf-profile.children.cycles-pp.prepare_task_switch
0.10 ± 4% +0.0 0.12 ± 3% perf-profile.children.cycles-pp.call_function_single_prep_ipi
0.08 ± 4% +0.0 0.10 ± 5% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.04 ± 44% +0.0 0.06 ± 7% perf-profile.children.cycles-pp.sched_clock
0.13 ± 7% +0.0 0.16 ± 4% perf-profile.children.cycles-pp.__sysvec_call_function_single
0.08 ± 6% +0.0 0.10 ± 3% perf-profile.children.cycles-pp.set_next_entity
0.16 ± 4% +0.0 0.19 ± 3% perf-profile.children.cycles-pp.__switch_to
0.09 ± 4% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.llist_reverse_order
0.04 ± 44% +0.0 0.07 ± 5% perf-profile.children.cycles-pp.place_entity
0.14 ± 3% +0.0 0.16 ± 3% perf-profile.children.cycles-pp.llist_add_batch
0.09 ± 5% +0.0 0.12 ± 6% perf-profile.children.cycles-pp.available_idle_cpu
0.15 ± 4% +0.0 0.18 ± 4% perf-profile.children.cycles-pp.sysvec_call_function_single
0.08 ± 5% +0.0 0.12 ± 6% perf-profile.children.cycles-pp.wake_affine
0.08 +0.0 0.11 perf-profile.children.cycles-pp.__list_del_entry_valid_or_report
0.11 ± 4% +0.0 0.14 ± 3% perf-profile.children.cycles-pp.update_rq_clock_task
0.11 ± 4% +0.0 0.14 ± 4% perf-profile.children.cycles-pp.__switch_to_asm
0.04 ± 44% +0.0 0.07 ± 6% perf-profile.children.cycles-pp.folio_add_lru
0.06 ± 7% +0.0 0.10 ± 6% perf-profile.children.cycles-pp.shmem_add_to_page_cache
0.18 ± 5% +0.0 0.22 ± 4% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.02 ±141% +0.0 0.06 ± 6% perf-profile.children.cycles-pp.tick_nohz_idle_exit
0.12 ± 3% +0.0 0.17 ± 5% perf-profile.children.cycles-pp.select_task_rq_fair
0.13 ± 3% +0.0 0.18 ± 6% perf-profile.children.cycles-pp.select_task_rq
0.23 ± 3% +0.1 0.29 ± 3% perf-profile.children.cycles-pp.__smp_call_single_queue
0.20 ± 3% +0.1 0.26 ± 3% perf-profile.children.cycles-pp.update_load_avg
0.01 ±223% +0.1 0.07 ± 18% perf-profile.children.cycles-pp.shmem_alloc_and_acct_folio
0.26 ± 2% +0.1 0.34 ± 3% perf-profile.children.cycles-pp.dequeue_entity
0.29 ± 3% +0.1 0.37 ± 4% perf-profile.children.cycles-pp.dequeue_task_fair
0.17 ± 3% +0.1 0.26 ± 2% perf-profile.children.cycles-pp.sync_regs
0.34 ± 2% +0.1 0.42 ± 4% perf-profile.children.cycles-pp.ttwu_queue_wakelist
0.28 ± 3% +0.1 0.37 ± 4% perf-profile.children.cycles-pp.enqueue_entity
0.28 ± 3% +0.1 0.38 ± 6% perf-profile.children.cycles-pp.__perf_sw_event
0.32 ± 2% +0.1 0.42 ± 5% perf-profile.children.cycles-pp.___perf_sw_event
0.34 ± 3% +0.1 0.44 ± 4% perf-profile.children.cycles-pp.enqueue_task_fair
0.36 ± 2% +0.1 0.46 ± 3% perf-profile.children.cycles-pp.activate_task
0.24 ± 2% +0.1 0.35 perf-profile.children.cycles-pp.native_irq_return_iret
0.30 ± 6% +0.1 0.42 ± 10% perf-profile.children.cycles-pp.xas_load
0.31 +0.1 0.43 ± 3% perf-profile.children.cycles-pp.folio_unlock
0.44 ± 2% +0.1 0.56 ± 4% perf-profile.children.cycles-pp.ttwu_do_activate
0.40 ± 6% +0.2 0.56 ± 5% perf-profile.children.cycles-pp._compound_head
1.52 +0.2 1.68 ± 4% perf-profile.children.cycles-pp.wake_page_function
0.68 ± 3% +0.2 0.86 ± 4% perf-profile.children.cycles-pp.try_to_wake_up
0.66 ± 2% +0.2 0.84 ± 3% perf-profile.children.cycles-pp.sched_ttwu_pending
0.85 ± 2% +0.2 1.09 ± 3% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.79 ± 2% +0.2 1.03 ± 4% perf-profile.children.cycles-pp.flush_smp_call_function_queue
1.83 +0.3 2.08 ± 4% perf-profile.children.cycles-pp.__wake_up_common
1.29 +0.3 1.60 perf-profile.children.cycles-pp.folio_add_file_rmap_range
0.89 ± 9% +0.4 1.24 ± 8% perf-profile.children.cycles-pp.finish_fault
1.24 +0.4 1.60 perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
1.68 ± 3% +0.4 2.06 ± 2% perf-profile.children.cycles-pp.set_pte_range
1.50 +0.6 2.06 perf-profile.children.cycles-pp.filemap_get_entry
3.42 ± 3% +0.8 4.24 perf-profile.children.cycles-pp._raw_spin_lock_irq
7.48 +0.9 8.41 perf-profile.children.cycles-pp.folio_wait_bit_common
9.67 ± 4% +1.4 11.07 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
12.08 ± 3% +1.8 13.84 perf-profile.children.cycles-pp.folio_wake_bit
10.15 +1.9 12.07 perf-profile.children.cycles-pp.shmem_get_folio_gfp
11.80 ± 4% +1.9 13.74 ± 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
10.26 +2.0 12.25 perf-profile.children.cycles-pp.shmem_fault
10.29 +2.0 12.29 perf-profile.children.cycles-pp.__do_fault
8.59 +5.3 13.94 ± 2% perf-profile.children.cycles-pp.do_rw_once
35.10 -6.1 28.98 ± 2% perf-profile.self.cycles-pp.next_uptodate_folio
2.06 ± 7% -1.9 0.11 ± 4% perf-profile.self.cycles-pp.down_read_trylock
1.28 ± 4% -1.1 0.16 ± 3% perf-profile.self.cycles-pp.up_read
1.66 ± 6% -1.0 0.68 ± 3% perf-profile.self.cycles-pp.__handle_mm_fault
7.20 -0.7 6.55 perf-profile.self.cycles-pp.filemap_map_pages
0.64 ± 12% -0.4 0.28 ± 15% perf-profile.self.cycles-pp.intel_idle_irq
0.36 ± 7% -0.2 0.15 perf-profile.self.cycles-pp.mtree_range_walk
0.30 ± 8% -0.2 0.13 ± 14% perf-profile.self.cycles-pp.mm_cid_get
0.71 ± 8% -0.1 0.59 ± 7% perf-profile.self.cycles-pp.__schedule
0.05 ± 8% +0.0 0.06 ± 7% perf-profile.self.cycles-pp.ttwu_do_activate
0.08 ± 5% +0.0 0.10 ± 4% perf-profile.self.cycles-pp.do_idle
0.06 ± 6% +0.0 0.08 ± 6% perf-profile.self.cycles-pp.enqueue_task_fair
0.05 ± 8% +0.0 0.07 ± 8% perf-profile.self.cycles-pp.__update_load_avg_se
0.09 ± 5% +0.0 0.10 ± 4% perf-profile.self.cycles-pp.xas_descend
0.04 ± 44% +0.0 0.06 perf-profile.self.cycles-pp.reweight_entity
0.05 ± 7% +0.0 0.07 ± 9% perf-profile.self.cycles-pp.set_pte_range
0.08 ± 6% +0.0 0.10 ± 5% perf-profile.self.cycles-pp.update_load_avg
0.10 ± 4% +0.0 0.12 ± 3% perf-profile.self.cycles-pp.call_function_single_prep_ipi
0.07 ± 5% +0.0 0.09 ± 5% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.08 ± 6% +0.0 0.10 ± 6% perf-profile.self.cycles-pp.flush_smp_call_function_queue
0.10 ± 4% +0.0 0.13 ± 2% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
0.16 ± 4% +0.0 0.19 ± 3% perf-profile.self.cycles-pp.__switch_to
0.14 ± 3% +0.0 0.16 ± 3% perf-profile.self.cycles-pp.llist_add_batch
0.09 ± 5% +0.0 0.12 ± 6% perf-profile.self.cycles-pp.available_idle_cpu
0.08 ± 5% +0.0 0.12 ± 6% perf-profile.self.cycles-pp.enqueue_entity
0.08 ± 5% +0.0 0.12 ± 4% perf-profile.self.cycles-pp.llist_reverse_order
0.10 ± 4% +0.0 0.13 ± 3% perf-profile.self.cycles-pp.update_rq_clock_task
0.08 +0.0 0.11 perf-profile.self.cycles-pp.__list_del_entry_valid_or_report
0.11 ± 4% +0.0 0.14 ± 4% perf-profile.self.cycles-pp.__switch_to_asm
0.09 ± 5% +0.0 0.12 ± 8% perf-profile.self.cycles-pp.ttwu_queue_wakelist
0.12 ± 4% +0.0 0.16 ± 6% perf-profile.self.cycles-pp.xas_load
0.00 +0.1 0.05 perf-profile.self.cycles-pp.sched_ttwu_pending
0.00 +0.1 0.06 perf-profile.self.cycles-pp.asm_exc_page_fault
0.11 ± 4% +0.1 0.18 ± 4% perf-profile.self.cycles-pp.shmem_fault
0.17 ± 3% +0.1 0.26 ± 2% perf-profile.self.cycles-pp.sync_regs
0.31 ± 2% +0.1 0.40 ± 5% perf-profile.self.cycles-pp.___perf_sw_event
0.31 ± 2% +0.1 0.40 ± 3% perf-profile.self.cycles-pp.__wake_up_common
0.24 ± 2% +0.1 0.35 perf-profile.self.cycles-pp.native_irq_return_iret
0.31 +0.1 0.43 ± 3% perf-profile.self.cycles-pp.folio_unlock
0.44 ± 3% +0.1 0.57 ± 2% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.68 ± 3% +0.1 0.83 ± 2% perf-profile.self.cycles-pp.folio_wake_bit
0.85 +0.2 1.00 ± 3% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.40 ± 5% +0.2 0.56 ± 5% perf-profile.self.cycles-pp._compound_head
1.29 +0.3 1.59 perf-profile.self.cycles-pp.folio_add_file_rmap_range
0.99 +0.3 1.30 ± 2% perf-profile.self.cycles-pp.shmem_get_folio_gfp
2.08 +0.3 2.39 ± 2% perf-profile.self.cycles-pp.folio_wait_bit_common
1.18 +0.4 1.55 perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
1.43 +0.5 1.90 perf-profile.self.cycles-pp.filemap_get_entry
3.93 +1.9 5.85 perf-profile.self.cycles-pp.do_access
11.80 ± 4% +1.9 13.74 ± 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
6.55 +4.5 11.08 ± 2% perf-profile.self.cycles-pp.do_rw_once
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 4/6] mm: Handle COW faults under the VMA lock
2023-10-06 19:53 ` [PATCH v2 4/6] mm: Handle COW " Matthew Wilcox (Oracle)
2023-10-08 22:05 ` Suren Baghdasaryan
@ 2023-10-20 13:18 ` kernel test robot
1 sibling, 0 replies; 16+ messages in thread
From: kernel test robot @ 2023-10-20 13:18 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: oe-lkp, lkp, linux-mm, ying.huang, feng.tang, fengwei.yin,
Andrew Morton, Matthew Wilcox (Oracle),
Suren Baghdasaryan, oliver.sang
Hello,
kernel test robot noticed a 38.7% improvement of will-it-scale.per_thread_ops on:
commit: 90e99527c746cd9ef7ebf0333c9611e45c6e5e1d ("[PATCH v2 4/6] mm: Handle COW faults under the VMA lock")
url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/mm-Make-lock_folio_maybe_drop_mmap-VMA-lock-aware/20231007-035513
base: v6.6-rc4
patch link: https://lore.kernel.org/all/20231006195318.4087158-5-willy@infradead.org/
patch subject: [PATCH v2 4/6] mm: Handle COW faults under the VMA lock
testcase: will-it-scale
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:
nr_task: 16
mode: thread
test: page_fault2
cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231020/202310201702.62f04f91-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/thread/16/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/page_fault2/will-it-scale
commit:
c8b329d48e ("mm: Handle shared faults under the VMA lock")
90e99527c7 ("mm: Handle COW faults under the VMA lock")
c8b329d48e0dac74 90e99527c746cd9ef7ebf0333c9
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.11 ± 2% +0.4 1.50 mpstat.cpu.all.usr%
690.67 ± 20% -35.3% 447.00 ± 6% perf-c2c.HITM.local
71432 ± 3% -10.5% 63958 meminfo.Active
70468 ± 3% -10.4% 63142 meminfo.Active(anon)
5.722e+08 ± 2% +38.8% 7.942e+08 numa-numastat.node0.local_node
5.723e+08 ± 2% +38.8% 7.944e+08 numa-numastat.node0.numa_hit
4746 -54.0% 2183 vmstat.system.cs
106237 +1.7% 108086 vmstat.system.in
69143 ± 4% -10.2% 62107 ± 2% numa-meminfo.node1.Active
68750 ± 3% -10.1% 61835 numa-meminfo.node1.Active(anon)
70251 ± 4% -9.8% 63348 numa-meminfo.node1.Shmem
1889742 ± 2% +38.7% 2621754 will-it-scale.16.threads
118108 ± 2% +38.7% 163859 will-it-scale.per_thread_ops
1889742 ± 2% +38.7% 2621754 will-it-scale.workload
5.723e+08 ± 2% +38.8% 7.944e+08 numa-vmstat.node0.numa_hit
5.722e+08 ± 2% +38.8% 7.942e+08 numa-vmstat.node0.numa_local
17189 ± 3% -10.1% 15458 numa-vmstat.node1.nr_active_anon
17563 ± 4% -9.8% 15837 numa-vmstat.node1.nr_shmem
17189 ± 3% -10.1% 15458 numa-vmstat.node1.nr_zone_active_anon
66914 ± 10% -54.3% 30547 ± 4% turbostat.C1
0.07 ± 18% -0.1 0.02 ± 33% turbostat.C1%
513918 ± 3% -74.2% 132621 ± 2% turbostat.C1E
0.54 ± 4% -0.4 0.16 ± 4% turbostat.C1E%
0.11 +18.2% 0.13 turbostat.IPC
218.42 +2.0% 222.83 turbostat.PkgWatt
30.47 +13.3% 34.53 turbostat.RAMWatt
720.36 +24.0% 893.56 ± 4% sched_debug.cfs_rq:/.runnable_avg.max
225.47 ± 7% +16.4% 262.37 sched_debug.cfs_rq:/.runnable_avg.stddev
713.28 +25.3% 893.53 ± 4% sched_debug.cfs_rq:/.util_avg.max
224.87 ± 7% +16.6% 262.19 sched_debug.cfs_rq:/.util_avg.stddev
72.59 ± 49% +63.1% 118.38 ± 11% sched_debug.cfs_rq:/.util_est_enqueued.avg
605.14 ± 4% +40.7% 851.22 sched_debug.cfs_rq:/.util_est_enqueued.max
151.28 ± 22% +64.0% 248.15 ± 5% sched_debug.cfs_rq:/.util_est_enqueued.stddev
8811 -42.4% 5078 sched_debug.cpu.nr_switches.avg
17617 ± 3% -10.4% 15785 proc-vmstat.nr_active_anon
332941 +4.6% 348206 proc-vmstat.nr_anon_pages
855626 +1.7% 870502 proc-vmstat.nr_inactive_anon
17617 ± 3% -10.4% 15785 proc-vmstat.nr_zone_active_anon
855626 +1.7% 870502 proc-vmstat.nr_zone_inactive_anon
5.729e+08 ± 2% +38.8% 7.95e+08 proc-vmstat.numa_hit
5.727e+08 ± 2% +38.8% 7.948e+08 proc-vmstat.numa_local
16509 ± 4% -13.0% 14365 proc-vmstat.pgactivate
5.724e+08 ± 2% +38.7% 7.94e+08 proc-vmstat.pgalloc_normal
5.704e+08 ± 2% +38.8% 7.914e+08 proc-vmstat.pgfault
5.723e+08 ± 2% +38.7% 7.94e+08 proc-vmstat.pgfree
0.00 ± 37% +164.7% 0.01 ± 6% perf-sched.sch_delay.avg.ms.__cond_resched.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.isra
0.02 ± 12% +26.4% 0.02 ± 10% perf-sched.sch_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
0.00 ±223% +9466.7% 0.05 ±181% perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
0.02 ± 94% -61.2% 0.01 ± 11% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
0.00 ± 8% +1068.0% 0.05 ±189% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
0.01 ± 14% +52.6% 0.01 ± 34% perf-sched.sch_delay.max.ms.__cond_resched.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.isra
0.01 ± 9% +10802.8% 0.65 ±212% perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
0.00 ±223% +10533.3% 0.05 ±162% perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
62.95 ± 2% +113.8% 134.58 ± 2% perf-sched.total_wait_and_delay.average.ms
13913 -52.2% 6654 perf-sched.total_wait_and_delay.count.ms
62.87 ± 2% +113.8% 134.44 ± 2% perf-sched.total_wait_time.average.ms
2.95 ± 3% +1477.8% 46.48 ± 2% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
1.18 ± 7% +2017.8% 24.99 ± 2% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
2.76 ± 3% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
6894 ± 2% -94.4% 384.67 perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
1070 ± 11% -60.9% 418.33 perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
112.33 ± 13% -100.0% 0.00 perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
15.07 ± 30% +469.9% 85.90 ± 4% perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
11.68 ± 17% +558.0% 76.85 ± 11% perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
14.21 ± 27% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
17.20 ± 29% -69.9% 5.17 ± 7% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
3893 ± 8% -19.2% 3144 ± 19% perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
2.99 ± 28% +906.8% 30.07 ± 12% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages.__folio_alloc.vma_alloc_folio.do_cow_fault
3.59 ± 49% +796.7% 32.22 ± 11% perf-sched.wait_time.avg.ms.__cond_resched.shmem_get_folio_gfp.shmem_fault.__do_fault.do_cow_fault
1.81 ± 75% +2169.9% 41.07 ± 29% perf-sched.wait_time.avg.ms.__cond_resched.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.isra
3.46 ±101% +1224.0% 45.81 ± 30% perf-sched.wait_time.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
3.15 ± 29% +943.4% 32.88 ± 7% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
2.88 ± 50% +922.9% 29.44 ± 11% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
2.94 ± 3% +1481.0% 46.47 ± 2% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
1.18 ± 7% +2023.3% 24.96 ± 3% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
2.76 ± 3% +1449.8% 42.73 ± 9% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
10.38 ± 3% +533.8% 65.76 ± 7% perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.__folio_alloc.vma_alloc_folio.do_cow_fault
9.13 ± 26% +596.6% 63.59 ± 11% perf-sched.wait_time.max.ms.__cond_resched.shmem_get_folio_gfp.shmem_fault.__do_fault.do_cow_fault
6.77 ± 70% +843.3% 63.87 ± 30% perf-sched.wait_time.max.ms.__cond_resched.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.isra
5.71 ± 64% +1111.5% 69.19 ± 15% perf-sched.wait_time.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
10.23 ± 4% +560.7% 67.56 ± 6% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
8.83 ± 30% +582.4% 60.23 ± 7% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
15.06 ± 30% +470.1% 85.89 ± 4% perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
11.67 ± 17% +558.2% 76.84 ± 11% perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
14.21 ± 27% +429.5% 75.22 ± 9% perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
17.16 ± 28% -69.9% 5.16 ± 7% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
3893 ± 8% -19.2% 3144 ± 19% perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
14.12 +16.6% 16.46 perf-stat.i.MPKI
2.231e+09 ± 2% +16.6% 2.601e+09 perf-stat.i.branch-instructions
19953628 +8.8% 21705347 perf-stat.i.branch-misses
51.96 ± 2% +13.0 65.01 perf-stat.i.cache-miss-rate%
1.566e+08 ± 2% +36.8% 2.142e+08 perf-stat.i.cache-misses
3.015e+08 ± 3% +9.2% 3.294e+08 perf-stat.i.cache-references
4702 -55.0% 2116 perf-stat.i.context-switches
2.58 -14.2% 2.22 perf-stat.i.cpi
114.64 -2.2% 112.13 perf-stat.i.cpu-migrations
183.46 -26.2% 135.46 perf-stat.i.cycles-between-cache-misses
4280505 ± 3% +22.7% 5251081 ± 6% perf-stat.i.dTLB-load-misses
2.774e+09 ± 2% +19.1% 3.303e+09 perf-stat.i.dTLB-loads
0.98 ± 2% +0.2 1.14 perf-stat.i.dTLB-store-miss-rate%
15927669 ± 4% +38.8% 22110291 perf-stat.i.dTLB-store-misses
1.604e+09 ± 2% +19.9% 1.923e+09 perf-stat.i.dTLB-stores
79.86 +3.1 82.95 perf-stat.i.iTLB-load-miss-rate%
2701759 ± 2% +19.0% 3214102 perf-stat.i.iTLB-load-misses
679352 -2.8% 660048 perf-stat.i.iTLB-loads
1.115e+10 ± 2% +17.1% 1.305e+10 perf-stat.i.instructions
0.39 +16.8% 0.45 perf-stat.i.ipc
0.29 ± 26% -31.6% 0.20 ± 17% perf-stat.i.major-faults
762.98 ± 2% +39.2% 1062 perf-stat.i.metric.K/sec
66.44 ± 2% +18.0% 78.42 perf-stat.i.metric.M/sec
1890049 ± 2% +38.5% 2616916 perf-stat.i.minor-faults
47044113 ± 2% +41.1% 66393293 perf-stat.i.node-loads
11825548 ± 2% +34.0% 15841684 perf-stat.i.node-stores
1890049 ± 2% +38.5% 2616917 perf-stat.i.page-faults
14.05 +16.9% 16.42 perf-stat.overall.MPKI
0.89 -0.1 0.83 perf-stat.overall.branch-miss-rate%
51.96 ± 2% +13.1 65.04 perf-stat.overall.cache-miss-rate%
2.57 -14.4% 2.20 perf-stat.overall.cpi
183.08 -26.7% 134.14 perf-stat.overall.cycles-between-cache-misses
0.98 ± 2% +0.2 1.14 perf-stat.overall.dTLB-store-miss-rate%
79.90 +3.1 82.97 perf-stat.overall.iTLB-load-miss-rate%
0.39 +16.7% 0.45 perf-stat.overall.ipc
0.22 ± 2% -0.1 0.15 ± 3% perf-stat.overall.node-load-miss-rate%
0.19 ± 8% -0.1 0.13 ± 16% perf-stat.overall.node-store-miss-rate%
1779185 -15.5% 1503815 perf-stat.overall.path-length
2.224e+09 ± 2% +16.6% 2.593e+09 perf-stat.ps.branch-instructions
19885795 +8.8% 21625880 perf-stat.ps.branch-misses
1.56e+08 ± 2% +36.8% 2.135e+08 perf-stat.ps.cache-misses
3.005e+08 ± 3% +9.2% 3.283e+08 perf-stat.ps.cache-references
4686 -55.0% 2109 perf-stat.ps.context-switches
114.35 -2.3% 111.73 perf-stat.ps.cpu-migrations
4265367 ± 3% +22.7% 5233761 ± 6% perf-stat.ps.dTLB-load-misses
2.765e+09 ± 2% +19.1% 3.292e+09 perf-stat.ps.dTLB-loads
15874379 ± 4% +38.8% 22037238 perf-stat.ps.dTLB-store-misses
1.598e+09 ± 2% +19.9% 1.917e+09 perf-stat.ps.dTLB-stores
2692499 ± 2% +19.0% 3203465 perf-stat.ps.iTLB-load-misses
677243 -2.9% 657791 perf-stat.ps.iTLB-loads
1.111e+10 ± 2% +17.1% 1.3e+10 perf-stat.ps.instructions
0.29 ± 26% -31.6% 0.20 ± 17% perf-stat.ps.major-faults
1883712 ± 2% +38.5% 2608263 perf-stat.ps.minor-faults
46887454 ± 2% +41.1% 66175688 perf-stat.ps.node-loads
11785781 ± 2% +34.0% 15789100 perf-stat.ps.node-stores
1883712 ± 2% +38.5% 2608264 perf-stat.ps.page-faults
3.362e+12 ± 2% +17.3% 3.943e+12 perf-stat.total.instructions
47.03 ± 2% -8.6 38.45 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
47.22 ± 2% -8.6 38.67 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
8.30 ± 6% -8.3 0.00 perf-profile.calltrace.cycles-pp.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
7.19 ± 4% -7.2 0.00 perf-profile.calltrace.cycles-pp.down_read_trylock.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
57.96 ± 3% -4.7 53.23 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
61.72 ± 3% -3.3 58.42 perf-profile.calltrace.cycles-pp.testcase
2.19 ± 13% -0.6 1.59 ± 6% perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
0.91 ± 8% +0.2 1.09 ± 7% perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.do_cow_fault.do_fault.__handle_mm_fault.handle_mm_fault
0.56 ± 2% +0.2 0.78 ± 5% perf-profile.calltrace.cycles-pp.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
1.11 ± 4% +0.2 1.34 ± 4% perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_cow_fault
0.86 ± 6% +0.3 1.13 ± 4% perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
1.42 ± 3% +0.3 1.77 ± 2% perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_cow_fault.do_fault
0.87 ± 6% +0.4 1.27 ± 3% perf-profile.calltrace.cycles-pp.__free_one_page.free_pcppages_bulk.free_unref_page_list.release_pages.tlb_batch_pages_flush
1.66 ± 3% +0.4 2.10 perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_cow_fault.do_fault.__handle_mm_fault
0.54 ± 45% +0.4 0.98 ± 4% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma
0.96 ± 6% +0.4 1.40 ± 2% perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page_list.release_pages.tlb_batch_pages_flush.zap_pte_range
1.23 ± 4% +0.4 1.68 perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase
0.26 ±100% +0.5 0.72 ± 8% perf-profile.calltrace.cycles-pp.folio_add_new_anon_rmap.set_pte_range.finish_fault.do_cow_fault.do_fault
1.74 ± 3% +0.5 2.22 perf-profile.calltrace.cycles-pp.__do_fault.do_cow_fault.do_fault.__handle_mm_fault.handle_mm_fault
0.59 ± 45% +0.5 1.06 ± 3% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range
0.89 ± 5% +0.5 1.36 perf-profile.calltrace.cycles-pp._compound_head.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
0.60 ± 45% +0.5 1.08 ± 3% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
0.00 +0.5 0.52 ± 2% perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
0.00 +0.5 0.52 ± 2% perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
0.08 ±223% +0.6 0.67 perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
1.56 ± 4% +0.6 2.18 ± 2% perf-profile.calltrace.cycles-pp.free_unref_page_list.release_pages.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range
0.00 +0.6 0.63 ± 5% perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.00 +0.7 0.66 ± 2% perf-profile.calltrace.cycles-pp.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
2.04 ± 8% +0.9 2.91 perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault.do_cow_fault
2.16 ± 7% +0.9 3.10 ± 2% perf-profile.calltrace.cycles-pp.folio_add_lru_vma.set_pte_range.finish_fault.do_cow_fault.do_fault
2.80 ± 4% +1.0 3.76 perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_exc_page_fault.testcase
2.93 ± 5% +1.1 4.06 perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.finish_fault.do_cow_fault.do_fault
3.11 ± 7% +1.1 4.24 ± 2% perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_cow_fault.do_fault.__handle_mm_fault
3.15 ± 4% +1.2 4.31 perf-profile.calltrace.cycles-pp.error_entry.testcase
3.05 ± 5% +1.2 4.23 perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_cow_fault.do_fault.__handle_mm_fault
3.21 ± 3% +1.2 4.41 perf-profile.calltrace.cycles-pp.__irqentry_text_end.testcase
2.62 ± 6% +1.4 3.98 perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.unmap_page_range
2.78 ± 6% +1.4 4.20 perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
0.70 ± 48% +1.7 2.38 ± 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.rmqueue_bulk.rmqueue.get_page_from_freelist
0.71 ± 48% +1.7 2.39 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages
1.98 ± 10% +1.7 3.66 perf-profile.calltrace.cycles-pp.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages.__folio_alloc
2.43 ± 9% +1.8 4.25 perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages.__folio_alloc.vma_alloc_folio
2.64 ± 8% +1.9 4.55 perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.__folio_alloc.vma_alloc_folio.do_cow_fault
3.07 ± 8% +2.1 5.13 perf-profile.calltrace.cycles-pp.__alloc_pages.__folio_alloc.vma_alloc_folio.do_cow_fault.do_fault
3.15 ± 8% +2.1 5.25 perf-profile.calltrace.cycles-pp.__folio_alloc.vma_alloc_folio.do_cow_fault.do_fault.__handle_mm_fault
4.46 ± 5% +2.3 6.72 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
4.47 ± 5% +2.3 6.74 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
4.47 ± 5% +2.3 6.74 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
4.47 ± 5% +2.3 6.74 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
6.38 ± 6% +2.3 8.65 perf-profile.calltrace.cycles-pp.finish_fault.do_cow_fault.do_fault.__handle_mm_fault.handle_mm_fault
3.64 ± 7% +2.3 5.97 perf-profile.calltrace.cycles-pp.vma_alloc_folio.do_cow_fault.do_fault.__handle_mm_fault.handle_mm_fault
4.81 ± 6% +2.5 7.28 perf-profile.calltrace.cycles-pp.__munmap
4.81 ± 6% +2.5 7.28 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
4.81 ± 6% +2.5 7.28 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
4.81 ± 6% +2.5 7.28 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
4.81 ± 6% +2.5 7.28 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
4.79 ± 6% +2.5 7.27 perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
4.80 ± 6% +2.5 7.28 perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.80 ± 6% +2.5 7.28 perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
31.04 ± 3% +3.1 34.10 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
32.18 ± 3% +3.2 35.42 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
10.32 ± 4% +3.6 13.90 perf-profile.calltrace.cycles-pp.copy_page.do_cow_fault.do_fault.__handle_mm_fault.handle_mm_fault
23.83 ± 5% +9.0 32.85 perf-profile.calltrace.cycles-pp.do_cow_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
23.96 ± 5% +9.0 33.00 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
47.11 ± 2% -8.6 38.50 perf-profile.children.cycles-pp.do_user_addr_fault
47.25 ± 2% -8.5 38.70 perf-profile.children.cycles-pp.exc_page_fault
8.31 ± 6% -8.3 0.00 perf-profile.children.cycles-pp.lock_mm_and_find_vma
7.32 ± 4% -7.1 0.18 ± 9% perf-profile.children.cycles-pp.down_read_trylock
54.76 ± 3% -5.9 48.89 perf-profile.children.cycles-pp.asm_exc_page_fault
3.55 ± 3% -3.4 0.18 ± 8% perf-profile.children.cycles-pp.up_read
63.31 ± 3% -2.7 60.56 perf-profile.children.cycles-pp.testcase
2.19 ± 13% -0.6 1.59 ± 6% perf-profile.children.cycles-pp.lock_vma_under_rcu
0.55 ± 10% -0.2 0.37 ± 6% perf-profile.children.cycles-pp.mtree_range_walk
0.30 ± 11% -0.1 0.18 ± 10% perf-profile.children.cycles-pp.handle_pte_fault
0.20 ± 13% -0.1 0.12 ± 9% perf-profile.children.cycles-pp.pte_offset_map_nolock
0.14 ± 10% -0.1 0.07 ± 10% perf-profile.children.cycles-pp.access_error
0.08 ± 14% -0.0 0.04 ± 45% perf-profile.children.cycles-pp.intel_idle
0.07 ± 11% +0.0 0.10 ± 12% perf-profile.children.cycles-pp.xas_start
0.05 ± 46% +0.0 0.08 ± 7% perf-profile.children.cycles-pp.policy_node
0.11 ± 7% +0.0 0.14 ± 12% perf-profile.children.cycles-pp.folio_unlock
0.15 ± 6% +0.0 0.20 ± 10% perf-profile.children.cycles-pp._raw_spin_trylock
0.11 ± 10% +0.0 0.15 ± 7% perf-profile.children.cycles-pp.get_pfnblock_flags_mask
0.12 ± 12% +0.0 0.17 ± 6% perf-profile.children.cycles-pp.mem_cgroup_update_lru_size
0.13 ± 10% +0.0 0.18 ± 5% perf-profile.children.cycles-pp.uncharge_folio
0.15 ± 8% +0.0 0.20 ± 7% perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
0.11 ± 10% +0.0 0.16 ± 6% perf-profile.children.cycles-pp.shmem_get_policy
0.15 ± 7% +0.0 0.20 ± 4% perf-profile.children.cycles-pp.try_charge_memcg
0.13 ± 9% +0.0 0.18 ± 9% perf-profile.children.cycles-pp.cgroup_rstat_updated
0.01 ±223% +0.1 0.06 ± 23% perf-profile.children.cycles-pp.perf_swevent_event
0.20 ± 10% +0.1 0.26 ± 4% perf-profile.children.cycles-pp.__mod_zone_page_state
0.17 ± 9% +0.1 0.23 ± 8% perf-profile.children.cycles-pp.__count_memcg_events
0.14 ± 11% +0.1 0.20 ± 2% perf-profile.children.cycles-pp.free_swap_cache
0.20 ± 6% +0.1 0.25 ± 3% perf-profile.children.cycles-pp.free_unref_page_prepare
0.04 ± 45% +0.1 0.10 ± 19% perf-profile.children.cycles-pp.kthread_blkcg
0.14 ± 8% +0.1 0.20 ± 3% perf-profile.children.cycles-pp.free_pages_and_swap_cache
0.24 ± 8% +0.1 0.30 ± 6% perf-profile.children.cycles-pp.__list_add_valid_or_report
0.23 ± 9% +0.1 0.30 ± 4% perf-profile.children.cycles-pp.free_unref_page_commit
0.46 ± 4% +0.1 0.55 ± 2% perf-profile.children.cycles-pp.xas_load
0.00 +0.1 0.11 ± 9% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.34 ± 3% +0.1 0.47 ± 6% perf-profile.children.cycles-pp.charge_memcg
0.32 ± 8% +0.1 0.47 ± 6% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.00 +0.2 0.15 ± 16% perf-profile.children.cycles-pp.put_page
0.25 ± 7% +0.2 0.41 ± 5% perf-profile.children.cycles-pp.__mod_node_page_state
0.20 ± 15% +0.2 0.36 ± 12% perf-profile.children.cycles-pp.blk_cgroup_congested
1.42 ± 4% +0.2 1.58 perf-profile.children.cycles-pp.__list_del_entry_valid_or_report
0.23 ± 16% +0.2 0.42 ± 10% perf-profile.children.cycles-pp.__folio_throttle_swaprate
0.36 ± 4% +0.2 0.56 ± 5% perf-profile.children.cycles-pp.__mod_lruvec_state
0.91 ± 8% +0.2 1.11 ± 7% perf-profile.children.cycles-pp.__mem_cgroup_charge
0.32 ± 9% +0.2 0.53 ± 2% perf-profile.children.cycles-pp.tlb_finish_mmu
0.45 ± 6% +0.2 0.68 perf-profile.children.cycles-pp.page_remove_rmap
1.11 ± 4% +0.2 1.34 ± 4% perf-profile.children.cycles-pp.filemap_get_entry
0.47 ± 12% +0.2 0.72 ± 8% perf-profile.children.cycles-pp.folio_add_new_anon_rmap
0.47 ± 11% +0.3 0.74 ± 6% perf-profile.children.cycles-pp.__mod_lruvec_page_state
0.88 ± 6% +0.3 1.17 ± 4% perf-profile.children.cycles-pp.lru_add_fn
0.85 ± 2% +0.3 1.16 ± 3% perf-profile.children.cycles-pp.___perf_sw_event
1.43 ± 4% +0.3 1.78 ± 2% perf-profile.children.cycles-pp.shmem_get_folio_gfp
1.06 ± 2% +0.4 1.47 ± 2% perf-profile.children.cycles-pp.__perf_sw_event
1.66 ± 3% +0.4 2.10 perf-profile.children.cycles-pp.shmem_fault
0.97 ± 6% +0.5 1.44 ± 3% perf-profile.children.cycles-pp.__free_one_page
1.27 ± 4% +0.5 1.74 perf-profile.children.cycles-pp.sync_regs
1.75 ± 4% +0.5 2.22 perf-profile.children.cycles-pp.__do_fault
1.06 ± 6% +0.5 1.58 ± 2% perf-profile.children.cycles-pp.free_pcppages_bulk
0.92 ± 5% +0.5 1.45 perf-profile.children.cycles-pp._compound_head
1.75 ± 5% +0.6 2.36 perf-profile.children.cycles-pp.native_irq_return_iret
1.74 ± 4% +0.7 2.47 ± 2% perf-profile.children.cycles-pp.free_unref_page_list
0.83 ± 18% +0.8 1.65 ± 2% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
2.04 ± 8% +0.9 2.92 perf-profile.children.cycles-pp.folio_batch_move_lru
2.17 ± 7% +0.9 3.11 ± 2% perf-profile.children.cycles-pp.folio_add_lru_vma
2.85 ± 4% +1.0 3.82 perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
3.01 ± 5% +1.1 4.14 perf-profile.children.cycles-pp._raw_spin_lock
3.12 ± 7% +1.1 4.26 ± 3% perf-profile.children.cycles-pp.set_pte_range
3.20 ± 3% +1.2 4.37 perf-profile.children.cycles-pp.error_entry
3.06 ± 5% +1.2 4.24 perf-profile.children.cycles-pp.__pte_offset_map_lock
3.22 ± 3% +1.2 4.41 perf-profile.children.cycles-pp.__irqentry_text_end
3.09 ± 6% +1.6 4.70 perf-profile.children.cycles-pp.release_pages
3.09 ± 6% +1.6 4.72 perf-profile.children.cycles-pp.tlb_batch_pages_flush
1.98 ± 10% +1.7 3.67 perf-profile.children.cycles-pp.rmqueue_bulk
2.44 ± 9% +1.8 4.27 perf-profile.children.cycles-pp.rmqueue
2.66 ± 8% +1.9 4.57 perf-profile.children.cycles-pp.get_page_from_freelist
3.14 ± 7% +2.1 5.23 perf-profile.children.cycles-pp.__alloc_pages
3.17 ± 8% +2.1 5.28 perf-profile.children.cycles-pp.__folio_alloc
4.48 ± 5% +2.3 6.75 perf-profile.children.cycles-pp.unmap_vmas
4.48 ± 5% +2.3 6.75 perf-profile.children.cycles-pp.unmap_page_range
4.48 ± 5% +2.3 6.75 perf-profile.children.cycles-pp.zap_pmd_range
4.48 ± 5% +2.3 6.75 perf-profile.children.cycles-pp.zap_pte_range
6.39 ± 6% +2.3 8.68 perf-profile.children.cycles-pp.finish_fault
3.68 ± 7% +2.4 6.03 perf-profile.children.cycles-pp.vma_alloc_folio
1.56 ± 21% +2.4 3.92 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
1.66 ± 19% +2.4 4.08 perf-profile.children.cycles-pp._raw_spin_lock_irqsave
4.97 ± 5% +2.4 7.42 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
4.96 ± 5% +2.4 7.42 perf-profile.children.cycles-pp.do_syscall_64
4.81 ± 6% +2.5 7.28 perf-profile.children.cycles-pp.__munmap
4.81 ± 6% +2.5 7.28 perf-profile.children.cycles-pp.__x64_sys_munmap
4.81 ± 6% +2.5 7.28 perf-profile.children.cycles-pp.__vm_munmap
4.80 ± 6% +2.5 7.28 perf-profile.children.cycles-pp.do_vmi_munmap
4.80 ± 6% +2.5 7.28 perf-profile.children.cycles-pp.do_vmi_align_munmap
4.80 ± 6% +2.5 7.27 perf-profile.children.cycles-pp.unmap_region
31.08 ± 3% +3.1 34.13 perf-profile.children.cycles-pp.__handle_mm_fault
32.25 ± 3% +3.2 35.50 perf-profile.children.cycles-pp.handle_mm_fault
10.33 ± 4% +3.6 13.92 perf-profile.children.cycles-pp.copy_page
23.97 ± 5% +9.0 33.01 perf-profile.children.cycles-pp.do_fault
23.88 ± 5% +9.1 32.95 perf-profile.children.cycles-pp.do_cow_fault
7.29 ± 4% -7.1 0.18 ± 10% perf-profile.self.cycles-pp.down_read_trylock
6.77 ± 4% -5.8 0.93 ± 8% perf-profile.self.cycles-pp.__handle_mm_fault
3.51 ± 3% -3.3 0.18 ± 10% perf-profile.self.cycles-pp.up_read
0.54 ± 10% -0.2 0.36 ± 6% perf-profile.self.cycles-pp.mtree_range_walk
0.10 ± 18% -0.1 0.04 ± 72% perf-profile.self.cycles-pp.handle_pte_fault
0.12 ± 7% -0.1 0.07 ± 10% perf-profile.self.cycles-pp.access_error
0.10 ± 18% -0.1 0.05 ± 47% perf-profile.self.cycles-pp.pte_offset_map_nolock
0.08 ± 11% -0.0 0.04 ± 44% perf-profile.self.cycles-pp.do_fault
0.08 ± 14% -0.0 0.04 ± 45% perf-profile.self.cycles-pp.intel_idle
0.09 ± 6% +0.0 0.11 ± 5% perf-profile.self.cycles-pp.free_unref_page_prepare
0.06 ± 7% +0.0 0.09 ± 4% perf-profile.self.cycles-pp.free_pcppages_bulk
0.09 ± 6% +0.0 0.12 ± 6% perf-profile.self.cycles-pp.rmqueue_bulk
0.10 ± 6% +0.0 0.13 ± 10% perf-profile.self.cycles-pp.charge_memcg
0.11 ± 8% +0.0 0.14 ± 4% perf-profile.self.cycles-pp.__mod_lruvec_state
0.08 ± 12% +0.0 0.11 ± 9% perf-profile.self.cycles-pp.__pte_offset_map_lock
0.10 ± 10% +0.0 0.14 ± 11% perf-profile.self.cycles-pp.folio_unlock
0.12 ± 11% +0.0 0.16 ± 8% perf-profile.self.cycles-pp.uncharge_folio
0.12 ± 15% +0.0 0.16 ± 4% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.10 ± 9% +0.0 0.14 ± 5% perf-profile.self.cycles-pp.get_pfnblock_flags_mask
0.10 ± 7% +0.0 0.15 ± 3% perf-profile.self.cycles-pp.try_charge_memcg
0.04 ± 71% +0.0 0.08 ± 16% perf-profile.self.cycles-pp.__do_fault
0.15 ± 6% +0.0 0.20 ± 10% perf-profile.self.cycles-pp._raw_spin_trylock
0.11 ± 9% +0.0 0.16 ± 6% perf-profile.self.cycles-pp.mem_cgroup_update_lru_size
0.13 ± 9% +0.0 0.17 ± 8% perf-profile.self.cycles-pp.set_pte_range
0.10 ± 9% +0.0 0.15 ± 5% perf-profile.self.cycles-pp.shmem_get_policy
0.18 ± 9% +0.0 0.24 ± 4% perf-profile.self.cycles-pp.__mod_zone_page_state
0.14 ± 10% +0.0 0.18 ± 2% perf-profile.self.cycles-pp.free_swap_cache
0.10 ± 11% +0.1 0.16 ± 10% perf-profile.self.cycles-pp.exc_page_fault
0.19 ± 11% +0.1 0.24 ± 5% perf-profile.self.cycles-pp.free_unref_page_commit
0.12 ± 12% +0.1 0.17 ± 10% perf-profile.self.cycles-pp.cgroup_rstat_updated
0.14 ± 8% +0.1 0.20 ± 5% perf-profile.self.cycles-pp.asm_exc_page_fault
0.01 ±223% +0.1 0.06 ± 23% perf-profile.self.cycles-pp.perf_swevent_event
0.17 ± 7% +0.1 0.22 ± 7% perf-profile.self.cycles-pp.xas_load
0.16 ± 9% +0.1 0.22 ± 5% perf-profile.self.cycles-pp.folio_add_new_anon_rmap
0.20 ± 8% +0.1 0.26 ± 4% perf-profile.self.cycles-pp.free_unref_page_list
0.06 ± 14% +0.1 0.13 ± 21% perf-profile.self.cycles-pp.__mem_cgroup_charge
0.22 ± 9% +0.1 0.28 ± 7% perf-profile.self.cycles-pp.__list_add_valid_or_report
0.13 ± 6% +0.1 0.19 ± 7% perf-profile.self.cycles-pp.folio_add_lru_vma
0.22 ± 7% +0.1 0.29 ± 5% perf-profile.self.cycles-pp.rmqueue
0.24 ± 5% +0.1 0.31 ± 6% perf-profile.self.cycles-pp.shmem_fault
0.21 ± 6% +0.1 0.29 ± 4% perf-profile.self.cycles-pp.get_page_from_freelist
0.22 ± 7% +0.1 0.30 ± 4% perf-profile.self.cycles-pp.__perf_sw_event
0.00 +0.1 0.10 ± 9% perf-profile.self.cycles-pp.exit_to_user_mode_prepare
0.29 ± 4% +0.1 0.39 ± 6% perf-profile.self.cycles-pp.shmem_get_folio_gfp
0.32 ± 7% +0.1 0.44 ± 4% perf-profile.self.cycles-pp.zap_pte_range
0.24 ± 9% +0.1 0.36 ± 6% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.16 ± 15% +0.1 0.29 ± 11% perf-profile.self.cycles-pp.blk_cgroup_congested
0.35 ± 7% +0.1 0.48 ± 4% perf-profile.self.cycles-pp.folio_batch_move_lru
0.39 ± 5% +0.1 0.53 ± 5% perf-profile.self.cycles-pp.__alloc_pages
0.31 ± 8% +0.1 0.45 ± 2% perf-profile.self.cycles-pp.vma_alloc_folio
0.29 ± 9% +0.1 0.44 ± 4% perf-profile.self.cycles-pp.page_remove_rmap
0.65 ± 7% +0.1 0.80 ± 7% perf-profile.self.cycles-pp.filemap_get_entry
0.44 ± 7% +0.2 0.59 ± 2% perf-profile.self.cycles-pp.lru_add_fn
0.00 +0.2 0.15 ± 16% perf-profile.self.cycles-pp.put_page
0.24 ± 7% +0.2 0.39 ± 6% perf-profile.self.cycles-pp.__mod_node_page_state
1.41 ± 4% +0.2 1.57 perf-profile.self.cycles-pp.__list_del_entry_valid_or_report
0.57 ± 8% +0.2 0.81 ± 5% perf-profile.self.cycles-pp.release_pages
0.75 ± 2% +0.3 1.03 ± 2% perf-profile.self.cycles-pp.___perf_sw_event
0.91 ± 6% +0.5 1.37 ± 3% perf-profile.self.cycles-pp.__free_one_page
1.27 ± 4% +0.5 1.74 perf-profile.self.cycles-pp.sync_regs
0.90 ± 5% +0.5 1.42 perf-profile.self.cycles-pp._compound_head
1.74 ± 5% +0.6 2.36 perf-profile.self.cycles-pp.native_irq_return_iret
2.82 ± 5% +0.9 3.72 perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
2.99 ± 5% +1.1 4.11 perf-profile.self.cycles-pp._raw_spin_lock
3.18 ± 4% +1.2 4.34 perf-profile.self.cycles-pp.error_entry
3.22 ± 3% +1.2 4.41 perf-profile.self.cycles-pp.__irqentry_text_end
3.70 ± 4% +1.3 5.00 perf-profile.self.cycles-pp.testcase
1.56 ± 21% +2.4 3.92 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
10.29 ± 4% +3.6 13.86 perf-profile.self.cycles-pp.copy_page
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 3/6] mm: Handle shared faults under the VMA lock
2023-10-06 19:53 ` [PATCH v2 3/6] mm: Handle shared faults " Matthew Wilcox (Oracle)
2023-10-08 22:01 ` Suren Baghdasaryan
@ 2023-10-20 13:23 ` kernel test robot
1 sibling, 0 replies; 16+ messages in thread
From: kernel test robot @ 2023-10-20 13:23 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: oe-lkp, lkp, linux-mm, ying.huang, feng.tang, fengwei.yin,
Andrew Morton, Matthew Wilcox (Oracle),
Suren Baghdasaryan, oliver.sang
Hello,
kernel test robot noticed a 67.5% improvement in stress-ng.fault.minor_page_faults_per_sec on:
commit: c8b329d48e0dac7438168a1857c3f67d4e23fed0 ("[PATCH v2 3/6] mm: Handle shared faults under the VMA lock")
url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/mm-Make-lock_folio_maybe_drop_mmap-VMA-lock-aware/20231007-035513
base: v6.6-rc4
patch link: https://lore.kernel.org/all/20231006195318.4087158-4-willy@infradead.org/
patch subject: [PATCH v2 3/6] mm: Handle shared faults under the VMA lock
testcase: stress-ng
test machine: 36 threads 1 socket Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
parameters:
nr_threads: 1
disk: 1HDD
testtime: 60s
fs: ext4
class: os
test: fault
cpufreq_governor: performance
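The stress-ng leg of this run can be approximated outside the lkp harness. The stressor name (`fault`), worker count (1) and duration (60s) come from the parameters above; the exact option spelling is an assumption about a typical stress-ng invocation, not a copy of what the harness executes:

```shell
# Hypothetical reproduction sketch, not the lkp harness command line.
# --fault starts page-fault stressor workers; --metrics-brief prints the
# per-stressor throughput figures this report is derived from.
stress-ng --fault 1 --timeout 60s --metrics-brief
```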
In addition, this commit also has a significant impact on the following test:
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_thread_ops 274.8% improvement |
| test machine | 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory |
| test parameters | cpufreq_governor=performance |
| | mode=thread |
| | nr_task=50% |
| | test=page_fault3 |
+------------------+----------------------------------------------------------------------------------------------------+
Details are below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231020/202310201857.d7db939a-oliver.sang@intel.com
=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/fault/stress-ng/60s
commit:
34611600bf ("mm: Call wp_page_copy() under the VMA lock")
c8b329d48e ("mm: Handle shared faults under the VMA lock")
34611600bfd1bf9f c8b329d48e0dac7438168a1857c
---------------- ---------------------------
%stddev %change %stddev
\ | \
157941 ± 6% +20.3% 190026 ± 11% meminfo.DirectMap4k
0.05 +0.0 0.05 perf-stat.i.dTLB-store-miss-rate%
51205 -100.0% 0.03 ± 81% perf-stat.i.major-faults
79003 +65.6% 130837 perf-stat.i.minor-faults
50394 -100.0% 0.03 ± 81% perf-stat.ps.major-faults
77754 +65.6% 128767 perf-stat.ps.minor-faults
53411 -100.0% 0.00 ±223% stress-ng.fault.major_page_faults_per_sec
80118 +67.5% 134204 stress-ng.fault.minor_page_faults_per_sec
1417 -4.7% 1350 stress-ng.fault.nanosecs_per_page_fault
3204300 -100.0% 0.33 ±141% stress-ng.time.major_page_faults
4815857 +67.3% 8059294 stress-ng.time.minor_page_faults
0.01 ± 68% +224.2% 0.03 ± 51% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.55 ± 95% +368.6% 2.56 ± 35% perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.05 ± 70% +168.2% 0.12 ± 32% perf-sched.wait_time.avg.ms.__cond_resched.__ext4_handle_dirty_metadata.ext4_mb_clear_bb.ext4_remove_blocks.ext4_ext_rm_leaf
0.05 ± 73% +114.3% 0.10 ± 13% perf-sched.wait_time.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
0.09 ± 78% +79.2% 0.17 ± 8% perf-sched.wait_time.max.ms.__cond_resched.__ext4_handle_dirty_metadata.ext4_delete_entry.__ext4_unlink.ext4_unlink
0.05 ± 70% +229.6% 0.15 ± 21% perf-sched.wait_time.max.ms.__cond_resched.__ext4_handle_dirty_metadata.ext4_mb_clear_bb.ext4_remove_blocks.ext4_ext_rm_leaf
0.03 ±151% +260.5% 0.12 ± 35% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.03 ±100% +183.8% 0.10 ± 35% perf-sched.wait_time.max.ms.__cond_resched.ext4_journal_check_start.__ext4_journal_start_sb.ext4_alloc_file_blocks.isra
0.08 ± 79% +134.1% 0.18 ± 36% perf-sched.wait_time.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
11.65 -0.8 10.82 ± 2% perf-profile.calltrace.cycles-pp.stress_fault
9.42 -0.8 8.61 ± 2% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.stress_fault
8.84 -0.8 8.07 ± 3% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.stress_fault
8.74 -0.7 8.00 ± 3% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_fault
7.56 ± 2% -0.5 7.04 ± 3% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_fault
6.99 ± 2% -0.5 6.51 ± 3% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
11.10 -0.9 10.24 ± 2% perf-profile.children.cycles-pp.asm_exc_page_fault
12.38 -0.8 11.54 perf-profile.children.cycles-pp.stress_fault
8.92 -0.8 8.14 ± 3% perf-profile.children.cycles-pp.exc_page_fault
8.84 -0.8 8.07 ± 3% perf-profile.children.cycles-pp.do_user_addr_fault
7.63 ± 2% -0.5 7.09 ± 2% perf-profile.children.cycles-pp.handle_mm_fault
7.06 ± 2% -0.5 6.56 ± 3% perf-profile.children.cycles-pp.__handle_mm_fault
0.36 ± 8% -0.2 0.19 ± 8% perf-profile.children.cycles-pp.lock_mm_and_find_vma
1.46 ± 4% -0.1 1.33 ± 5% perf-profile.children.cycles-pp.page_cache_ra_unbounded
0.40 ± 5% -0.1 0.34 ± 8% perf-profile.children.cycles-pp.mas_next_slot
0.22 ± 13% -0.1 0.17 ± 14% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.03 ±100% +0.0 0.07 ± 5% perf-profile.children.cycles-pp.housekeeping_test_cpu
0.44 ± 4% -0.1 0.32 ± 13% perf-profile.self.cycles-pp.__handle_mm_fault
0.67 ± 9% -0.1 0.54 ± 10% perf-profile.self.cycles-pp.mtree_range_walk
0.58 ± 7% -0.1 0.49 ± 6% perf-profile.self.cycles-pp.percpu_counter_add_batch
0.16 ± 7% -0.1 0.10 ± 20% perf-profile.self.cycles-pp.madvise_cold_or_pageout_pte_range
0.39 ± 5% -0.1 0.33 ± 8% perf-profile.self.cycles-pp.mas_next_slot
0.26 ± 6% +0.0 0.29 ± 10% perf-profile.self.cycles-pp.filemap_fault
***************************************************************************************************
lkp-cpl-4sp2: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/thread/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/page_fault3/will-it-scale
commit:
34611600bf ("mm: Call wp_page_copy() under the VMA lock")
c8b329d48e ("mm: Handle shared faults under the VMA lock")
34611600bfd1bf9f c8b329d48e0dac7438168a1857c
---------------- ---------------------------
%stddev %change %stddev
\ | \
46289 +39.6% 64618 ± 2% uptime.idle
3.839e+10 +47.5% 5.663e+10 ± 2% cpuidle..time
44548500 +28.6% 57277226 ± 2% cpuidle..usage
244.33 ± 7% +92.0% 469.17 ± 14% perf-c2c.DRAM.local
563.00 ± 3% -60.7% 221.00 ± 15% perf-c2c.HITM.remote
554762 -20.3% 441916 meminfo.Inactive
554566 -20.3% 441725 meminfo.Inactive(anon)
7360875 +46.0% 10746773 ± 2% meminfo.Mapped
20123 +28.9% 25930 meminfo.PageTables
56.22 +46.8% 82.52 ± 2% vmstat.cpu.id
63.54 ± 8% -39.5% 38.45 ± 14% vmstat.procs.r
23694 -84.7% 3627 vmstat.system.cs
455148 ± 2% +123.1% 1015448 ± 7% vmstat.system.in
2882478 ± 2% +274.8% 10804264 ± 5% will-it-scale.112.threads
55.72 +47.6% 82.27 ± 2% will-it-scale.112.threads_idle
25736 ± 2% +274.8% 96466 ± 5% will-it-scale.per_thread_ops
2882478 ± 2% +274.8% 10804264 ± 5% will-it-scale.workload
55.97 +26.4 82.36 ± 2% mpstat.cpu.all.idle%
0.82 -0.1 0.70 ± 4% mpstat.cpu.all.irq%
0.11 ± 4% -0.1 0.05 ± 5% mpstat.cpu.all.soft%
42.51 -26.9 15.64 ± 11% mpstat.cpu.all.sys%
0.59 ± 17% +0.7 1.25 ± 41% mpstat.cpu.all.usr%
1841712 +44.8% 2666713 ± 3% numa-meminfo.node0.Mapped
5224 ± 3% +31.6% 6877 ± 4% numa-meminfo.node0.PageTables
1845678 +46.3% 2699787 ± 2% numa-meminfo.node1.Mapped
5064 ± 5% +23.0% 6231 ± 2% numa-meminfo.node1.PageTables
1826141 +47.6% 2694729 ± 2% numa-meminfo.node2.Mapped
4794 ± 2% +30.8% 6269 ± 3% numa-meminfo.node2.PageTables
1868742 ± 2% +44.9% 2708096 ± 3% numa-meminfo.node3.Mapped
5026 ± 5% +28.8% 6474 ± 4% numa-meminfo.node3.PageTables
1591430 ± 4% +70.8% 2718150 ± 3% numa-numastat.node0.local_node
1673574 ± 3% +68.6% 2821949 ± 3% numa-numastat.node0.numa_hit
1577936 ± 6% +74.8% 2757801 ± 2% numa-numastat.node1.local_node
1645522 ± 5% +73.0% 2847142 ± 3% numa-numastat.node1.numa_hit
1537208 ± 3% +77.3% 2725353 ± 2% numa-numastat.node2.local_node
1639749 ± 3% +71.4% 2811161 ± 2% numa-numastat.node2.numa_hit
1637504 ± 5% +72.8% 2829154 ± 5% numa-numastat.node3.local_node
1732850 ± 4% +67.2% 2898001 ± 3% numa-numastat.node3.numa_hit
1684 -59.8% 677.17 ± 13% turbostat.Avg_MHz
44.43 -26.6 17.86 ± 13% turbostat.Busy%
44289096 +28.7% 57018721 ± 2% turbostat.C1
56.21 +26.6 82.76 ± 2% turbostat.C1%
55.57 +47.8% 82.14 ± 2% turbostat.CPU%c1
0.01 +533.3% 0.06 ± 17% turbostat.IPC
2.014e+08 ± 3% +174.5% 5.527e+08 ± 7% turbostat.IRQ
43515 ± 3% +37.9% 59997 ± 3% turbostat.POLL
685.24 -22.4% 532.03 ± 3% turbostat.PkgWatt
17.33 +5.6% 18.30 turbostat.RAMWatt
458598 +45.2% 666035 ± 3% numa-vmstat.node0.nr_mapped
1305 ± 3% +32.0% 1723 ± 4% numa-vmstat.node0.nr_page_table_pages
1673564 ± 3% +68.6% 2822055 ± 3% numa-vmstat.node0.numa_hit
1591420 ± 4% +70.8% 2718256 ± 3% numa-vmstat.node0.numa_local
461362 +46.3% 674878 ± 2% numa-vmstat.node1.nr_mapped
1266 ± 4% +23.2% 1559 ± 2% numa-vmstat.node1.nr_page_table_pages
1645442 ± 5% +73.0% 2847103 ± 3% numa-vmstat.node1.numa_hit
1577856 ± 6% +74.8% 2757762 ± 2% numa-vmstat.node1.numa_local
456314 +47.5% 672973 ± 2% numa-vmstat.node2.nr_mapped
1198 ± 2% +31.0% 1569 ± 3% numa-vmstat.node2.nr_page_table_pages
1639701 ± 3% +71.4% 2811174 ± 2% numa-vmstat.node2.numa_hit
1537161 ± 3% +77.3% 2725366 ± 2% numa-vmstat.node2.numa_local
464153 +46.1% 677988 ± 3% numa-vmstat.node3.nr_mapped
1255 ± 5% +29.1% 1621 ± 4% numa-vmstat.node3.nr_page_table_pages
1732732 ± 4% +67.3% 2898025 ± 3% numa-vmstat.node3.numa_hit
1637386 ± 5% +72.8% 2829178 ± 5% numa-vmstat.node3.numa_local
104802 -2.5% 102214 proc-vmstat.nr_anon_pages
4433098 -1.0% 4389891 proc-vmstat.nr_file_pages
138599 -20.3% 110426 proc-vmstat.nr_inactive_anon
1842991 +45.8% 2687030 ± 2% proc-vmstat.nr_mapped
5030 +28.9% 6483 proc-vmstat.nr_page_table_pages
3710638 -1.2% 3667429 proc-vmstat.nr_shmem
138599 -20.3% 110426 proc-vmstat.nr_zone_inactive_anon
43540 ± 7% -82.6% 7576 ± 47% proc-vmstat.numa_hint_faults
26753 ± 10% -77.6% 5982 ± 56% proc-vmstat.numa_hint_faults_local
6693986 +70.0% 11381806 ± 3% proc-vmstat.numa_hit
6346365 +73.9% 11034009 ± 3% proc-vmstat.numa_local
21587 ± 31% -92.2% 1683 ± 58% proc-vmstat.numa_pages_migrated
197966 -81.7% 36131 ± 25% proc-vmstat.numa_pte_updates
3749632 -1.1% 3708618 proc-vmstat.pgactivate
6848722 +68.4% 11532638 ± 3% proc-vmstat.pgalloc_normal
8.677e+08 ± 2% +276.2% 3.265e+09 ± 5% proc-vmstat.pgfault
6646708 +72.1% 11436096 ± 3% proc-vmstat.pgfree
21587 ± 31% -92.2% 1683 ± 58% proc-vmstat.pgmigrate_success
54536 ± 8% -24.2% 41332 ± 3% proc-vmstat.pgreuse
6305732 -84.8% 961479 ± 36% sched_debug.cfs_rq:/.avg_vruntime.avg
10700237 -83.0% 1820191 ± 34% sched_debug.cfs_rq:/.avg_vruntime.max
1797215 ± 18% -93.8% 112003 ± 80% sched_debug.cfs_rq:/.avg_vruntime.min
1512854 ± 2% -75.4% 372673 ± 31% sched_debug.cfs_rq:/.avg_vruntime.stddev
0.13 ± 20% +62.6% 0.21 ± 26% sched_debug.cfs_rq:/.h_nr_running.avg
0.33 ± 8% +17.2% 0.39 ± 9% sched_debug.cfs_rq:/.h_nr_running.stddev
4781 ± 82% -100.0% 0.12 ±223% sched_debug.cfs_rq:/.left_vruntime.avg
804679 ± 78% -100.0% 27.67 ±223% sched_debug.cfs_rq:/.left_vruntime.max
61817 ± 80% -100.0% 1.84 ±223% sched_debug.cfs_rq:/.left_vruntime.stddev
2654 ± 21% +156.8% 6815 ± 18% sched_debug.cfs_rq:/.load.avg
6305732 -84.8% 961479 ± 36% sched_debug.cfs_rq:/.min_vruntime.avg
10700237 -83.0% 1820191 ± 34% sched_debug.cfs_rq:/.min_vruntime.max
1797215 ± 18% -93.8% 112003 ± 80% sched_debug.cfs_rq:/.min_vruntime.min
1512854 ± 2% -75.4% 372673 ± 31% sched_debug.cfs_rq:/.min_vruntime.stddev
0.13 ± 20% +63.4% 0.21 ± 26% sched_debug.cfs_rq:/.nr_running.avg
0.33 ± 7% +18.1% 0.39 ± 9% sched_debug.cfs_rq:/.nr_running.stddev
4781 ± 82% -100.0% 0.12 ±223% sched_debug.cfs_rq:/.right_vruntime.avg
804679 ± 78% -100.0% 27.67 ±223% sched_debug.cfs_rq:/.right_vruntime.max
61817 ± 80% -100.0% 1.84 ±223% sched_debug.cfs_rq:/.right_vruntime.stddev
495.58 ± 3% -56.6% 214.98 ± 24% sched_debug.cfs_rq:/.runnable_avg.avg
1096 ± 7% -13.4% 949.07 ± 3% sched_debug.cfs_rq:/.runnable_avg.max
359.89 -23.0% 277.09 ± 11% sched_debug.cfs_rq:/.runnable_avg.stddev
493.94 ± 3% -56.5% 214.69 ± 24% sched_debug.cfs_rq:/.util_avg.avg
359.20 -22.9% 276.81 ± 11% sched_debug.cfs_rq:/.util_avg.stddev
97.00 ± 24% +76.3% 171.06 ± 31% sched_debug.cfs_rq:/.util_est_enqueued.avg
1512762 ± 4% -35.1% 981444 sched_debug.cpu.avg_idle.avg
5146368 ± 10% -71.3% 1476288 ± 33% sched_debug.cpu.avg_idle.max
578157 ± 8% -68.8% 180178 ± 10% sched_debug.cpu.avg_idle.min
670957 ± 5% -83.7% 109591 ± 26% sched_debug.cpu.avg_idle.stddev
73.60 ± 11% -81.3% 13.79 ± 9% sched_debug.cpu.clock.stddev
650.52 ± 18% +58.1% 1028 ± 14% sched_debug.cpu.curr->pid.avg
1959 ± 7% +19.6% 2342 ± 6% sched_debug.cpu.curr->pid.stddev
924262 ± 3% -45.6% 502853 sched_debug.cpu.max_idle_balance_cost.avg
2799134 ± 10% -70.8% 817753 ± 35% sched_debug.cpu.max_idle_balance_cost.max
377335 ± 9% -93.4% 24979 ± 94% sched_debug.cpu.max_idle_balance_cost.stddev
0.00 ± 8% -59.7% 0.00 ± 55% sched_debug.cpu.next_balance.stddev
0.10 ± 17% +57.8% 0.15 ± 14% sched_debug.cpu.nr_running.avg
1.28 ± 6% -19.6% 1.03 ± 6% sched_debug.cpu.nr_running.max
0.29 ± 7% +19.1% 0.35 ± 6% sched_debug.cpu.nr_running.stddev
17163 -79.4% 3534 ± 5% sched_debug.cpu.nr_switches.avg
7523 ± 10% -87.2% 961.21 ± 12% sched_debug.cpu.nr_switches.min
0.33 ± 5% -18.7% 0.27 ± 6% sched_debug.cpu.nr_uninterruptible.avg
0.00 -100.0% 0.00 sched_debug.rt_rq:.rt_nr_migratory.avg
0.17 -100.0% 0.00 sched_debug.rt_rq:.rt_nr_migratory.max
0.01 -100.0% 0.00 sched_debug.rt_rq:.rt_nr_migratory.stddev
0.00 -100.0% 0.00 sched_debug.rt_rq:.rt_nr_running.avg
0.17 -100.0% 0.00 sched_debug.rt_rq:.rt_nr_running.max
0.01 -100.0% 0.00 sched_debug.rt_rq:.rt_nr_running.stddev
0.18 ± 34% -100.0% 0.00 sched_debug.rt_rq:.rt_time.avg
40.73 ± 34% -100.0% 0.00 sched_debug.rt_rq:.rt_time.max
2.72 ± 34% -100.0% 0.00 sched_debug.rt_rq:.rt_time.stddev
2.63e+09 +165.9% 6.995e+09 ± 5% perf-stat.i.branch-instructions
0.45 -0.2 0.22 ± 3% perf-stat.i.branch-miss-rate%
12246564 +25.8% 15409530 ± 3% perf-stat.i.branch-misses
40.05 ± 6% +5.9 45.95 perf-stat.i.cache-miss-rate%
23716 -85.2% 3516 perf-stat.i.context-switches
30.25 ± 2% -84.8% 4.58 ± 18% perf-stat.i.cpi
3.785e+11 -60.2% 1.507e+11 ± 13% perf-stat.i.cpu-cycles
270.31 -6.3% 253.41 perf-stat.i.cpu-migrations
9670 ± 38% -79.6% 1972 ± 22% perf-stat.i.cycles-between-cache-misses
0.03 ± 4% -0.0 0.01 ± 10% perf-stat.i.dTLB-load-miss-rate%
958512 ± 3% +20.5% 1154691 ± 5% perf-stat.i.dTLB-load-misses
3.15e+09 +172.1% 8.571e+09 ± 5% perf-stat.i.dTLB-loads
4.91 +1.6 6.54 perf-stat.i.dTLB-store-miss-rate%
87894742 ± 2% +276.4% 3.308e+08 ± 5% perf-stat.i.dTLB-store-misses
1.709e+09 +176.5% 4.725e+09 ± 5% perf-stat.i.dTLB-stores
78.59 +13.6 92.23 perf-stat.i.iTLB-load-miss-rate%
8890053 +168.7% 23884564 ± 6% perf-stat.i.iTLB-load-misses
2405390 -17.2% 1990833 perf-stat.i.iTLB-loads
1.257e+10 +164.2% 3.323e+10 ± 5% perf-stat.i.instructions
0.03 ± 4% +571.5% 0.23 ± 17% perf-stat.i.ipc
1.69 -60.1% 0.67 ± 13% perf-stat.i.metric.GHz
33.38 +175.7% 92.04 ± 5% perf-stat.i.metric.M/sec
2877597 ± 2% +274.4% 10773809 ± 5% perf-stat.i.minor-faults
86.68 -3.1 83.57 ± 2% perf-stat.i.node-load-miss-rate%
5637384 ± 17% +105.0% 11559372 ± 20% perf-stat.i.node-load-misses
857520 ± 10% +158.1% 2213133 ± 9% perf-stat.i.node-loads
46.44 -16.5 29.97 perf-stat.i.node-store-miss-rate%
2608818 +80.4% 4705017 ± 3% perf-stat.i.node-store-misses
3024158 ± 2% +264.6% 11026854 ± 5% perf-stat.i.node-stores
2877597 ± 2% +274.4% 10773809 ± 5% perf-stat.i.page-faults
0.47 -0.2 0.22 ± 3% perf-stat.overall.branch-miss-rate%
39.95 ± 6% +6.0 45.93 perf-stat.overall.cache-miss-rate%
30.11 ± 2% -84.8% 4.58 ± 18% perf-stat.overall.cpi
9647 ± 38% -79.6% 1971 ± 21% perf-stat.overall.cycles-between-cache-misses
0.03 ± 4% -0.0 0.01 ± 10% perf-stat.overall.dTLB-load-miss-rate%
4.89 +1.7 6.54 perf-stat.overall.dTLB-store-miss-rate%
78.70 +13.6 92.28 perf-stat.overall.iTLB-load-miss-rate%
0.03 ± 2% +579.6% 0.23 ± 17% perf-stat.overall.ipc
86.61 -3.0 83.59 ± 2% perf-stat.overall.node-load-miss-rate%
46.33 -16.4 29.93 perf-stat.overall.node-store-miss-rate%
1315354 -29.2% 931848 perf-stat.overall.path-length
2.621e+09 +166.0% 6.97e+09 ± 5% perf-stat.ps.branch-instructions
12198295 +25.8% 15340303 ± 3% perf-stat.ps.branch-misses
23623 -85.2% 3499 perf-stat.ps.context-switches
3.77e+11 -60.2% 1.502e+11 ± 13% perf-stat.ps.cpu-cycles
266.50 -5.2% 252.58 perf-stat.ps.cpu-migrations
961568 ± 4% +19.7% 1150857 ± 5% perf-stat.ps.dTLB-load-misses
3.138e+09 +172.1% 8.541e+09 ± 5% perf-stat.ps.dTLB-loads
87534988 ± 2% +276.7% 3.297e+08 ± 5% perf-stat.ps.dTLB-store-misses
1.702e+09 +176.6% 4.708e+09 ± 5% perf-stat.ps.dTLB-stores
8850726 +169.0% 23812332 ± 6% perf-stat.ps.iTLB-load-misses
2394294 -17.1% 1983791 perf-stat.ps.iTLB-loads
1.253e+10 +164.3% 3.311e+10 ± 5% perf-stat.ps.instructions
2865432 ± 2% +274.7% 10737502 ± 5% perf-stat.ps.minor-faults
5615782 ± 17% +105.2% 11521907 ± 20% perf-stat.ps.node-load-misses
856502 ± 10% +157.6% 2206185 ± 9% perf-stat.ps.node-loads
2598165 +80.5% 4689182 ± 3% perf-stat.ps.node-store-misses
3009612 ± 2% +265.1% 10987528 ± 5% perf-stat.ps.node-stores
2865432 ± 2% +274.7% 10737502 ± 5% perf-stat.ps.page-faults
3.791e+12 +165.5% 1.006e+13 ± 5% perf-stat.total.instructions
0.05 ± 17% -77.2% 0.01 ± 73% perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.07 ± 34% -90.1% 0.01 ± 99% perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
0.00 ± 33% +377.8% 0.01 ± 9% perf-sched.sch_delay.avg.ms.__cond_resched.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.isra
0.08 ± 25% -90.3% 0.01 ± 12% perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.12 ± 64% -91.3% 0.01 ± 8% perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
0.06 ± 37% -89.4% 0.01 ± 16% perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
0.03 ± 19% -80.0% 0.01 ± 34% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.21 ±100% -97.3% 0.01 ± 6% perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
0.02 ± 57% -100.0% 0.00 perf-sched.sch_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
0.04 ± 34% -88.0% 0.00 ± 15% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
0.08 ± 21% -90.3% 0.01 ± 25% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
0.06 ± 80% -100.0% 0.00 perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
0.00 ± 19% +173.3% 0.01 ± 5% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
0.02 ± 10% -87.3% 0.00 perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.05 ± 16% -85.9% 0.01 ± 7% perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
0.03 ± 10% -74.0% 0.01 ± 27% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.09 ± 40% -91.2% 0.01 ± 17% perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
0.01 ± 16% -74.6% 0.00 perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
0.12 ± 66% -93.3% 0.01 ± 19% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
0.79 ± 40% -96.9% 0.02 ±186% perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
0.00 ± 10% +123.1% 0.01 ± 22% perf-sched.sch_delay.max.ms.__cond_resched.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.isra
0.13 ± 14% -92.2% 0.01 ± 27% perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.15 ± 54% -91.6% 0.01 ± 17% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
0.14 ± 43% -93.0% 0.01 ± 27% perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
24.13 ±116% -99.9% 0.01 ± 15% perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
0.06 ± 63% -100.0% 0.00 perf-sched.sch_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
0.22 ± 39% -95.5% 0.01 ± 15% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
0.22 ± 60% -93.8% 0.01 ± 20% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
65.91 ± 71% -100.0% 0.00 perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
13.37 ±143% +703.9% 107.48 ± 64% perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
0.01 ± 31% +121.2% 0.02 ± 31% perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
0.28 ± 14% -96.4% 0.01 ± 29% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.19 ± 16% -93.0% 0.01 ± 23% perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
0.16 ± 50% -93.2% 0.01 ± 27% perf-sched.sch_delay.max.ms.syslog_print.do_syslog.kmsg_read.vfs_read
0.10 ± 9% -94.3% 0.01 ± 37% perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
31.83 ± 4% +503.4% 192.03 ± 4% perf-sched.total_wait_and_delay.average.ms
61361 -82.4% 10800 ± 5% perf-sched.total_wait_and_delay.count.ms
31.76 ± 4% +502.8% 191.45 ± 5% perf-sched.total_wait_time.average.ms
1.67 ± 13% +9097.0% 153.21 ± 4% perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
32.29 ± 9% -25.8% 23.95 ± 22% perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
1.57 ± 2% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
1.11 ± 8% +12026.0% 134.28 ± 8% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
3.84 ± 5% +20.0% 4.61 ± 2% perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
610.00 ± 7% -30.8% 421.83 ± 15% perf-sched.wait_and_delay.count.__cond_resched.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
51568 ± 2% -100.0% 0.00 perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
1146 ± 4% +51.3% 1734 ± 6% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
1226 ± 4% -11.6% 1084 ± 2% perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
995.33 ± 3% -19.5% 801.33 ± 4% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
33.51 ± 78% +547.7% 217.02 ± 2% perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
65.98 ± 70% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
15.43 ±115% +1309.1% 217.47 perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
1.47 ± 8% +10284.0% 153.06 ± 4% perf-sched.wait_time.avg.ms.__cond_resched.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
0.01 ± 34% +1.4e+06% 179.68 ± 11% perf-sched.wait_time.avg.ms.__cond_resched.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.isra
5.95 ± 21% -73.5% 1.58 ± 8% perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
1.44 ± 6% +10218.7% 148.31 ± 9% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
2.93 ± 23% -100.0% 0.00 perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
1.51 ± 4% -100.0% 0.00 perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
1.08 ± 5% +12334.2% 134.16 ± 8% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
0.38 ± 27% +47951.7% 182.68 ± 2% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
2.83 ± 16% -82.5% 0.49 ± 2% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
3.82 ± 6% +20.4% 4.60 ± 2% perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
6.49 ± 18% -75.3% 1.60 ± 9% perf-sched.wait_time.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
0.03 ±133% -99.4% 0.00 ±223% perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
2.99 ± 14% +7148.4% 217.02 ± 2% perf-sched.wait_time.max.ms.__cond_resched.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
0.06 ± 51% +3.5e+05% 209.30 ± 2% perf-sched.wait_time.max.ms.__cond_resched.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.isra
11.89 ± 21% -73.5% 3.16 ± 8% perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
2.62 ± 4% +7966.9% 211.41 ± 2% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
4.63 ± 7% -100.0% 0.00 perf-sched.wait_time.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
5.60 ± 74% -100.0% 0.00 perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
4.03 ± 3% +5294.2% 217.47 perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
2.46 ± 25% +8701.7% 216.14 perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
16.44 ± 21% -93.5% 1.07 ± 3% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
14.31 ± 59% -65.0% 5.01 perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
12.98 ± 18% -75.3% 3.20 ± 9% perf-sched.wait_time.max.ms.syslog_print.do_syslog.kmsg_read.vfs_read
0.63 ±151% -98.7% 0.01 ± 46% perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
24.35 ± 5% -24.3 0.00 perf-profile.calltrace.cycles-pp.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
22.08 ± 2% -22.1 0.00 perf-profile.calltrace.cycles-pp.down_read_trylock.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
13.70 ± 2% -13.7 0.00 perf-profile.calltrace.cycles-pp.up_read.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
74.33 -12.7 61.66 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
74.37 -12.2 62.18 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
3.61 ± 8% -2.7 0.89 ± 16% perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
0.00 +0.7 0.71 ± 21% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state
0.00 +0.7 0.71 ± 5% perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range.zap_pmd_range
0.00 +0.7 0.72 ± 5% perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range.zap_pmd_range.unmap_page_range
0.00 +0.7 0.73 ± 5% perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
0.00 +0.8 0.81 ± 15% perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
0.71 ± 3% +1.2 1.90 ± 14% perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
0.00 +1.2 1.20 ± 17% perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault.__handle_mm_fault
0.71 ± 3% +1.2 1.94 ± 14% perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
0.71 ± 3% +1.2 1.94 ± 14% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
0.71 ± 3% +1.2 1.94 ± 14% perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
0.00 +1.3 1.28 ± 27% perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
0.73 ± 3% +1.3 2.02 ± 14% perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.73 ± 3% +1.3 2.02 ± 14% perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
0.72 ± 3% +1.3 2.00 ± 14% perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
0.00 +1.4 1.36 ± 17% perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault
0.00 +1.4 1.40 ± 18% perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase
0.00 +1.4 1.42 ± 17% perf-profile.calltrace.cycles-pp.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.00 +1.7 1.68 ± 20% perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.00 +2.0 2.05 ± 14% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +2.0 2.05 ± 14% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
41.13 +2.1 43.18 perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
0.00 +2.1 2.08 ± 14% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
0.00 +2.1 2.08 ± 14% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.52 ± 13% +2.1 3.66 ± 14% perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
3.32 ± 19% +19.0 22.27 ± 24% perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
24.35 ± 5% -24.4 0.00 perf-profile.children.cycles-pp.lock_mm_and_find_vma
22.17 ± 2% -22.0 0.16 ± 17% perf-profile.children.cycles-pp.down_read_trylock
14.60 ± 2% -14.5 0.14 ± 21% perf-profile.children.cycles-pp.up_read
74.37 -12.4 62.02 perf-profile.children.cycles-pp.do_user_addr_fault
74.39 -12.2 62.20 perf-profile.children.cycles-pp.exc_page_fault
75.34 -5.5 69.86 perf-profile.children.cycles-pp.asm_exc_page_fault
23.24 -0.9 22.38 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.44 ± 4% -0.2 0.28 ± 14% perf-profile.children.cycles-pp.scheduler_tick
0.20 ± 39% -0.1 0.08 ± 86% perf-profile.children.cycles-pp.x86_64_start_kernel
0.20 ± 39% -0.1 0.08 ± 86% perf-profile.children.cycles-pp.x86_64_start_reservations
0.20 ± 39% -0.1 0.08 ± 86% perf-profile.children.cycles-pp.start_kernel
0.20 ± 39% -0.1 0.08 ± 86% perf-profile.children.cycles-pp.arch_call_rest_init
0.20 ± 39% -0.1 0.08 ± 86% perf-profile.children.cycles-pp.rest_init
0.28 ± 4% -0.1 0.17 ± 19% perf-profile.children.cycles-pp._compound_head
0.47 ± 4% -0.1 0.37 ± 14% perf-profile.children.cycles-pp.update_process_times
0.47 ± 4% -0.1 0.37 ± 14% perf-profile.children.cycles-pp.tick_sched_handle
0.12 ± 13% -0.1 0.03 ±102% perf-profile.children.cycles-pp.load_balance
0.00 +0.1 0.07 ± 17% perf-profile.children.cycles-pp._raw_spin_trylock
0.00 +0.1 0.07 ± 10% perf-profile.children.cycles-pp.irqtime_account_irq
0.03 ± 70% +0.1 0.11 ± 19% perf-profile.children.cycles-pp.rebalance_domains
0.00 +0.1 0.08 ± 19% perf-profile.children.cycles-pp.__irqentry_text_end
0.00 +0.1 0.08 ± 17% perf-profile.children.cycles-pp.__count_memcg_events
0.00 +0.1 0.08 ± 18% perf-profile.children.cycles-pp.cgroup_rstat_updated
0.00 +0.1 0.10 ± 23% perf-profile.children.cycles-pp.folio_mark_dirty
0.00 +0.1 0.10 ± 18% perf-profile.children.cycles-pp.__pte_offset_map
0.08 ± 6% +0.1 0.18 ± 20% perf-profile.children.cycles-pp.__do_softirq
0.00 +0.1 0.11 ± 14% perf-profile.children.cycles-pp.pte_offset_map_nolock
0.53 ± 4% +0.1 0.65 ± 16% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.00 +0.1 0.12 ± 28% perf-profile.children.cycles-pp.__mod_node_page_state
0.05 ± 8% +0.1 0.18 ± 19% perf-profile.children.cycles-pp._raw_spin_lock
0.00 +0.1 0.14 ± 22% perf-profile.children.cycles-pp.folio_unlock
0.00 +0.1 0.14 ± 20% perf-profile.children.cycles-pp.release_pages
0.02 ±141% +0.1 0.16 ± 36% perf-profile.children.cycles-pp.ktime_get
0.00 +0.1 0.14 ± 6% perf-profile.children.cycles-pp.native_flush_tlb_local
0.08 ± 5% +0.1 0.23 ± 16% perf-profile.children.cycles-pp.__irq_exit_rcu
0.01 ±223% +0.2 0.16 ± 24% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.00 +0.2 0.15 ± 17% perf-profile.children.cycles-pp.handle_pte_fault
0.00 +0.2 0.16 ±115% perf-profile.children.cycles-pp.menu_select
0.00 +0.2 0.16 ± 20% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.57 ± 4% +0.2 0.74 ± 15% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.57 ± 4% +0.2 0.74 ± 15% perf-profile.children.cycles-pp.hrtimer_interrupt
0.01 ±223% +0.2 0.18 ± 29% perf-profile.children.cycles-pp.inode_needs_update_time
0.00 +0.2 0.18 ± 18% perf-profile.children.cycles-pp.tlb_batch_pages_flush
0.00 +0.2 0.19 ± 23% perf-profile.children.cycles-pp.__mod_lruvec_state
0.01 ±223% +0.2 0.21 ± 27% perf-profile.children.cycles-pp.file_update_time
0.00 +0.2 0.22 ± 4% perf-profile.children.cycles-pp.llist_reverse_order
0.02 ±141% +0.2 0.26 ± 5% perf-profile.children.cycles-pp.flush_tlb_func
0.00 +0.2 0.25 ± 18% perf-profile.children.cycles-pp.error_entry
0.00 +0.3 0.26 ± 18% perf-profile.children.cycles-pp.__pte_offset_map_lock
0.15 ± 9% +0.3 0.44 ± 21% perf-profile.children.cycles-pp.mtree_range_walk
0.07 ± 9% +0.3 0.38 ± 5% perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
0.05 ± 7% +0.3 0.37 ± 15% perf-profile.children.cycles-pp.xas_descend
0.66 ± 3% +0.3 1.00 ± 14% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.05 ± 7% +0.4 0.43 ± 18% perf-profile.children.cycles-pp.folio_add_file_rmap_range
0.04 ± 44% +0.4 0.43 ± 19% perf-profile.children.cycles-pp.page_remove_rmap
0.06 ± 6% +0.4 0.45 ± 20% perf-profile.children.cycles-pp.__mod_lruvec_page_state
0.04 ± 44% +0.4 0.44 ± 20% perf-profile.children.cycles-pp.tlb_flush_rmaps
0.06 ± 16% +0.4 0.48 ± 24% perf-profile.children.cycles-pp.fault_dirty_shared_page
0.30 ± 3% +0.4 0.72 ± 5% perf-profile.children.cycles-pp.on_each_cpu_cond_mask
0.30 ± 3% +0.4 0.72 ± 5% perf-profile.children.cycles-pp.smp_call_function_many_cond
0.31 ± 2% +0.4 0.74 ± 5% perf-profile.children.cycles-pp.flush_tlb_mm_range
0.07 ± 6% +0.5 0.54 ± 14% perf-profile.children.cycles-pp.xas_load
0.13 ± 8% +0.6 0.75 ± 4% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.11 ± 8% +0.6 0.75 ± 4% perf-profile.children.cycles-pp.__sysvec_call_function
0.12 ± 8% +0.7 0.82 ± 4% perf-profile.children.cycles-pp.sysvec_call_function
0.12 ± 4% +0.7 0.82 ± 15% perf-profile.children.cycles-pp.filemap_get_entry
0.08 ± 5% +0.8 0.86 ± 19% perf-profile.children.cycles-pp.___perf_sw_event
0.11 ± 6% +1.0 1.09 ± 20% perf-profile.children.cycles-pp.__perf_sw_event
0.18 ± 6% +1.0 1.21 ± 16% perf-profile.children.cycles-pp.shmem_get_folio_gfp
0.25 ± 46% +1.0 1.30 ± 27% perf-profile.children.cycles-pp.set_pte_range
0.18 ± 5% +1.2 1.36 ± 17% perf-profile.children.cycles-pp.shmem_fault
0.25 ± 5% +1.2 1.43 ± 17% perf-profile.children.cycles-pp.__do_fault
0.93 ± 4% +1.2 2.12 ± 14% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.93 ± 4% +1.2 2.12 ± 14% perf-profile.children.cycles-pp.do_syscall_64
0.72 ± 3% +1.2 1.94 ± 14% perf-profile.children.cycles-pp.zap_pte_range
0.72 ± 3% +1.2 1.94 ± 14% perf-profile.children.cycles-pp.unmap_vmas
0.72 ± 3% +1.2 1.94 ± 14% perf-profile.children.cycles-pp.unmap_page_range
0.72 ± 3% +1.2 1.94 ± 14% perf-profile.children.cycles-pp.zap_pmd_range
0.44 ± 4% +1.2 1.67 perf-profile.children.cycles-pp.asm_sysvec_call_function
0.18 ± 19% +1.3 1.45 ± 17% perf-profile.children.cycles-pp.sync_regs
0.74 ± 3% +1.3 2.02 ± 14% perf-profile.children.cycles-pp.do_vmi_munmap
0.74 ± 3% +1.3 2.02 ± 14% perf-profile.children.cycles-pp.do_vmi_align_munmap
0.72 ± 3% +1.3 2.00 ± 14% perf-profile.children.cycles-pp.unmap_region
0.74 ± 3% +1.3 2.05 ± 14% perf-profile.children.cycles-pp.__vm_munmap
0.74 ± 3% +1.3 2.05 ± 14% perf-profile.children.cycles-pp.__x64_sys_munmap
0.31 ± 37% +1.4 1.70 ± 20% perf-profile.children.cycles-pp.finish_fault
1.53 ± 13% +2.2 3.68 ± 14% perf-profile.children.cycles-pp.do_fault
0.82 ± 24% +4.1 4.90 ± 33% perf-profile.children.cycles-pp.native_irq_return_iret
3.34 ± 19% +19.0 22.33 ± 24% perf-profile.children.cycles-pp.lock_vma_under_rcu
21.96 ± 2% -21.8 0.16 ± 17% perf-profile.self.cycles-pp.down_read_trylock
14.47 ± 2% -14.3 0.13 ± 22% perf-profile.self.cycles-pp.up_read
3.51 ± 9% -3.5 0.04 ±108% perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
0.27 ± 4% -0.1 0.14 ± 19% perf-profile.self.cycles-pp._compound_head
0.12 ± 6% +0.0 0.14 ± 4% perf-profile.self.cycles-pp.llist_add_batch
0.00 +0.1 0.06 ± 14% perf-profile.self.cycles-pp.__mod_lruvec_state
0.00 +0.1 0.07 ± 17% perf-profile.self.cycles-pp._raw_spin_trylock
0.00 +0.1 0.07 ± 23% perf-profile.self.cycles-pp.__irqentry_text_end
0.00 +0.1 0.07 ± 18% perf-profile.self.cycles-pp.do_fault
0.00 +0.1 0.08 ± 17% perf-profile.self.cycles-pp.finish_fault
0.00 +0.1 0.08 ± 14% perf-profile.self.cycles-pp.cgroup_rstat_updated
0.00 +0.1 0.09 ± 20% perf-profile.self.cycles-pp.inode_needs_update_time
0.00 +0.1 0.09 ± 16% perf-profile.self.cycles-pp.__pte_offset_map_lock
0.00 +0.1 0.09 ± 20% perf-profile.self.cycles-pp.__pte_offset_map
0.00 +0.1 0.10 ± 16% perf-profile.self.cycles-pp.__mod_lruvec_page_state
0.00 +0.1 0.11 ± 22% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.00 +0.1 0.11 ± 31% perf-profile.self.cycles-pp.__mod_node_page_state
0.00 +0.1 0.11 ± 13% perf-profile.self.cycles-pp.flush_tlb_func
0.07 ± 5% +0.1 0.19 ± 12% perf-profile.self.cycles-pp.smp_call_function_many_cond
0.02 ±141% +0.1 0.14 ± 40% perf-profile.self.cycles-pp.ktime_get
0.00 +0.1 0.13 ± 17% perf-profile.self.cycles-pp.exc_page_fault
0.00 +0.1 0.13 ± 14% perf-profile.self.cycles-pp.xas_load
0.00 +0.1 0.13 ± 21% perf-profile.self.cycles-pp.folio_unlock
0.00 +0.1 0.14 ± 19% perf-profile.self.cycles-pp.release_pages
0.00 +0.1 0.14 ± 7% perf-profile.self.cycles-pp.native_flush_tlb_local
0.00 +0.1 0.14 ± 18% perf-profile.self.cycles-pp.shmem_fault
0.01 ±223% +0.2 0.18 ± 18% perf-profile.self.cycles-pp._raw_spin_lock
0.00 +0.2 0.17 ± 18% perf-profile.self.cycles-pp.set_pte_range
0.00 +0.2 0.19 ± 17% perf-profile.self.cycles-pp.folio_add_file_rmap_range
0.00 +0.2 0.22 ± 19% perf-profile.self.cycles-pp.page_remove_rmap
0.00 +0.2 0.22 ± 5% perf-profile.self.cycles-pp.llist_reverse_order
0.00 +0.2 0.24 ± 18% perf-profile.self.cycles-pp.error_entry
0.00 +0.2 0.24 ± 25% perf-profile.self.cycles-pp.__perf_sw_event
0.02 ± 99% +0.2 0.27 ± 16% perf-profile.self.cycles-pp.filemap_get_entry
0.00 +0.3 0.28 ± 5% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
0.14 ± 7% +0.3 0.43 ± 21% perf-profile.self.cycles-pp.mtree_range_walk
0.00 +0.3 0.29 ± 13% perf-profile.self.cycles-pp.asm_exc_page_fault
0.07 ± 8% +0.3 0.38 ± 4% perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
0.04 ± 44% +0.3 0.35 ± 15% perf-profile.self.cycles-pp.xas_descend
0.00 +0.3 0.31 ± 19% perf-profile.self.cycles-pp.zap_pte_range
0.03 ± 70% +0.3 0.35 ± 19% perf-profile.self.cycles-pp.shmem_get_folio_gfp
0.01 ±223% +0.5 0.53 ± 13% perf-profile.self.cycles-pp.do_user_addr_fault
0.08 ± 6% +0.7 0.74 ± 19% perf-profile.self.cycles-pp.___perf_sw_event
0.18 ± 19% +1.3 1.44 ± 17% perf-profile.self.cycles-pp.sync_regs
19.02 +2.4 21.40 perf-profile.self.cycles-pp.acpi_safe_halt
0.27 ± 71% +2.4 2.72 ± 76% perf-profile.self.cycles-pp.handle_mm_fault
0.82 ± 24% +4.1 4.89 ± 33% perf-profile.self.cycles-pp.native_irq_return_iret
0.76 ± 3% +5.6 6.37 ± 18% perf-profile.self.cycles-pp.testcase
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Thread overview: 16+ messages
2023-10-06 19:53 [PATCH v2 0/6] Handle more faults under the VMA lock Matthew Wilcox (Oracle)
2023-10-06 19:53 ` [PATCH v2 1/6] mm: Make lock_folio_maybe_drop_mmap() VMA lock aware Matthew Wilcox (Oracle)
2023-10-08 21:47 ` Suren Baghdasaryan
2023-10-06 19:53 ` [PATCH v2 2/6] mm: Call wp_page_copy() under the VMA lock Matthew Wilcox (Oracle)
2023-10-08 22:00 ` Suren Baghdasaryan
2023-10-06 19:53 ` [PATCH v2 3/6] mm: Handle shared faults " Matthew Wilcox (Oracle)
2023-10-08 22:01 ` Suren Baghdasaryan
2023-10-20 13:23 ` kernel test robot
2023-10-06 19:53 ` [PATCH v2 4/6] mm: Handle COW " Matthew Wilcox (Oracle)
2023-10-08 22:05 ` Suren Baghdasaryan
2023-10-20 13:18 ` kernel test robot
2023-10-06 19:53 ` [PATCH v2 5/6] mm: Handle read " Matthew Wilcox (Oracle)
2023-10-08 22:06 ` Suren Baghdasaryan
2023-10-20 9:55 ` kernel test robot
2023-10-06 19:53 ` [PATCH v2 6/6] mm: Handle write faults to RO pages " Matthew Wilcox (Oracle)
2023-10-08 22:07 ` Suren Baghdasaryan