linux-mm.kvack.org archive mirror
* [PATCH 0/6] Avoid the mmap lock for fault-around
@ 2023-04-04 13:58 Matthew Wilcox (Oracle)
  2023-04-04 13:58 ` [PATCH 1/6] mm: Allow per-VMA locks on file-backed VMAs Matthew Wilcox (Oracle)
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-04-04 13:58 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Matthew Wilcox (Oracle), linux-mm, linux-fsdevel, Punit Agrawal

The linux-next tree currently contains patches (mostly from Suren)
which handle some page faults without the protection of the mmap lock.
This patchset adds the ability to handle page faults on parts of files
which are already in the page cache without taking the mmap lock.
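
For anyone who has not followed the per-VMA lock work: the arch fault
handler tries the VMA lock first and falls back to the mmap lock when
handle_mm_fault() returns VM_FAULT_RETRY.  A simplified sketch of that
path (paraphrased from the x86 handler in linux-next; error handling,
statistics and some checks are elided and may differ in detail):

	vma = lock_vma_under_rcu(mm, address);
	if (!vma)
		goto lock_mmap;		/* fall back to the mmap_lock path */

	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
	vma_end_read(vma);

	if (!(fault & VM_FAULT_RETRY))
		goto done;		/* handled without the mmap lock */
lock_mmap:
	mmap_read_lock(mm);
	/* ... the usual path retries the fault under the mmap lock ... */

This series widens the first path to cover file-backed faults whose pages
are already in the page cache.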

I've taken a very gradual approach to pushing the lock down.  I'm not 100%
confident in my ability to grasp all the finer aspects of VMA handling,
so some reviewers may well feel that I could have combined some of
these patches.  I did try to skip one of these steps and it had a bug,
so I feel justified in proceeding cautiously.

Several people have volunteered to run benchmarks on this, so I haven't.
I have run it through xfstests and it doesn't appear to introduce any
regressions.

Matthew Wilcox (Oracle) (6):
  mm: Allow per-VMA locks on file-backed VMAs
  mm: Move FAULT_FLAG_VMA_LOCK check from handle_mm_fault()
  mm: Move FAULT_FLAG_VMA_LOCK check into handle_pte_fault()
  mm: Move FAULT_FLAG_VMA_LOCK check down in handle_pte_fault()
  mm: Move the FAULT_FLAG_VMA_LOCK check down from do_pte_missing()
  mm: Run the fault-around code under the VMA lock

 mm/hugetlb.c |  4 ++++
 mm/memory.c  | 28 +++++++++++++++++++---------
 2 files changed, 23 insertions(+), 9 deletions(-)

-- 
2.39.2



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/6] mm: Allow per-VMA locks on file-backed VMAs
  2023-04-04 13:58 [PATCH 0/6] Avoid the mmap lock for fault-around Matthew Wilcox (Oracle)
@ 2023-04-04 13:58 ` Matthew Wilcox (Oracle)
  2023-04-07 17:54   ` Suren Baghdasaryan
  2023-04-04 13:58 ` [PATCH 2/6] mm: Move FAULT_FLAG_VMA_LOCK check from handle_mm_fault() Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-04-04 13:58 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Matthew Wilcox (Oracle), linux-mm, linux-fsdevel, Punit Agrawal

The fault path will immediately fail in handle_mm_fault(), so this
is the minimal step which allows the per-VMA lock to be taken on
file-backed VMAs.  There may be a small performance reduction as a
little unnecessary work will be done on each page fault.  See later
patches for the improvement.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memory.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index fdaec7772fff..f726f85f0081 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5223,6 +5223,9 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 					    flags & FAULT_FLAG_REMOTE))
 		return VM_FAULT_SIGSEGV;
 
+	if ((flags & FAULT_FLAG_VMA_LOCK) && !vma_is_anonymous(vma))
+		return VM_FAULT_RETRY;
+
 	/*
 	 * Enable the memcg OOM handling for faults triggered in user
 	 * space.  Kernel faults are handled more gracefully.
@@ -5275,12 +5278,8 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	if (!vma)
 		goto inval;
 
-	/* Only anonymous vmas are supported for now */
-	if (!vma_is_anonymous(vma))
-		goto inval;
-
 	/* find_mergeable_anon_vma uses adjacent vmas which are not locked */
-	if (!vma->anon_vma)
+	if (vma_is_anonymous(vma) && !vma->anon_vma)
 		goto inval;
 
 	if (!vma_start_read(vma))
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/6] mm: Move FAULT_FLAG_VMA_LOCK check from handle_mm_fault()
  2023-04-04 13:58 [PATCH 0/6] Avoid the mmap lock for fault-around Matthew Wilcox (Oracle)
  2023-04-04 13:58 ` [PATCH 1/6] mm: Allow per-VMA locks on file-backed VMAs Matthew Wilcox (Oracle)
@ 2023-04-04 13:58 ` Matthew Wilcox (Oracle)
  2023-04-04 13:58 ` [PATCH 3/6] mm: Move FAULT_FLAG_VMA_LOCK check into handle_pte_fault() Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-04-04 13:58 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Matthew Wilcox (Oracle), linux-mm, linux-fsdevel, Punit Agrawal

Handle a little more of the page fault path outside the mmap sem.
The hugetlb path doesn't need to check whether the VMA is anonymous;
the VM_HUGETLB flag is only set on hugetlbfs VMAs.  There should be no
performance change from the previous commit; this is simply a step to
ease bisection of any problems.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/hugetlb.c |  4 ++++
 mm/memory.c  | 14 +++++++-------
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index efc443a906fa..39f168e3518f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6052,6 +6052,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	int need_wait_lock = 0;
 	unsigned long haddr = address & huge_page_mask(h);
 
+	/* TODO: Handle faults under the VMA lock */
+	if (flags & FAULT_FLAG_VMA_LOCK)
+		return VM_FAULT_RETRY;
+
 	/*
 	 * Serialize hugepage allocation and instantiation, so that we don't
 	 * get spurious allocation failures if two CPUs race to instantiate
diff --git a/mm/memory.c b/mm/memory.c
index f726f85f0081..fc1f0ef9a7a5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4992,10 +4992,10 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 }
 
 /*
- * By the time we get here, we already hold the mm semaphore
- *
- * The mmap_lock may have been released depending on flags and our
- * return value.  See filemap_fault() and __folio_lock_or_retry().
+ * On entry, we hold either the VMA lock or the mmap_lock
+ * (FAULT_FLAG_VMA_LOCK tells you which).  If VM_FAULT_RETRY is set in
+ * the result, the mmap_lock is not held on exit.  See filemap_fault()
+ * and __folio_lock_or_retry().
  */
 static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		unsigned long address, unsigned int flags)
@@ -5014,6 +5014,9 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	p4d_t *p4d;
 	vm_fault_t ret;
 
+	if ((flags & FAULT_FLAG_VMA_LOCK) && !vma_is_anonymous(vma))
+		return VM_FAULT_RETRY;
+
 	pgd = pgd_offset(mm, address);
 	p4d = p4d_alloc(mm, pgd, address);
 	if (!p4d)
@@ -5223,9 +5226,6 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 					    flags & FAULT_FLAG_REMOTE))
 		return VM_FAULT_SIGSEGV;
 
-	if ((flags & FAULT_FLAG_VMA_LOCK) && !vma_is_anonymous(vma))
-		return VM_FAULT_RETRY;
-
 	/*
 	 * Enable the memcg OOM handling for faults triggered in user
 	 * space.  Kernel faults are handled more gracefully.
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/6] mm: Move FAULT_FLAG_VMA_LOCK check into handle_pte_fault()
  2023-04-04 13:58 [PATCH 0/6] Avoid the mmap lock for fault-around Matthew Wilcox (Oracle)
  2023-04-04 13:58 ` [PATCH 1/6] mm: Allow per-VMA locks on file-backed VMAs Matthew Wilcox (Oracle)
  2023-04-04 13:58 ` [PATCH 2/6] mm: Move FAULT_FLAG_VMA_LOCK check from handle_mm_fault() Matthew Wilcox (Oracle)
@ 2023-04-04 13:58 ` Matthew Wilcox (Oracle)
  2023-04-04 13:58 ` [PATCH 4/6] mm: Move FAULT_FLAG_VMA_LOCK check down in handle_pte_fault() Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-04-04 13:58 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Matthew Wilcox (Oracle), linux-mm, linux-fsdevel, Punit Agrawal

Push the check down from __handle_mm_fault().  There's a mild upside to
this patch in that we'll allocate the page tables while under the VMA
lock rather than the mmap lock, reducing the hold time on the mmap lock,
since the retry will find the page tables already populated.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memory.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index fc1f0ef9a7a5..a2e27403e4f1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4896,6 +4896,9 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 {
 	pte_t entry;
 
+	if ((vmf->flags & FAULT_FLAG_VMA_LOCK) && !vma_is_anonymous(vmf->vma))
+		return VM_FAULT_RETRY;
+
 	if (unlikely(pmd_none(*vmf->pmd))) {
 		/*
 		 * Leave __pte_alloc() until later: because vm_ops->fault may
@@ -5014,9 +5017,6 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	p4d_t *p4d;
 	vm_fault_t ret;
 
-	if ((flags & FAULT_FLAG_VMA_LOCK) && !vma_is_anonymous(vma))
-		return VM_FAULT_RETRY;
-
 	pgd = pgd_offset(mm, address);
 	p4d = p4d_alloc(mm, pgd, address);
 	if (!p4d)
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/6] mm: Move FAULT_FLAG_VMA_LOCK check down in handle_pte_fault()
  2023-04-04 13:58 [PATCH 0/6] Avoid the mmap lock for fault-around Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2023-04-04 13:58 ` [PATCH 3/6] mm: Move FAULT_FLAG_VMA_LOCK check into handle_pte_fault() Matthew Wilcox (Oracle)
@ 2023-04-04 13:58 ` Matthew Wilcox (Oracle)
  2023-04-04 13:58 ` [PATCH 5/6] mm: Move the FAULT_FLAG_VMA_LOCK check down from do_pte_missing() Matthew Wilcox (Oracle)
  2023-04-04 13:58 ` [PATCH 6/6] mm: Run the fault-around code under the VMA lock Matthew Wilcox (Oracle)
  5 siblings, 0 replies; 19+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-04-04 13:58 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Matthew Wilcox (Oracle), linux-mm, linux-fsdevel, Punit Agrawal

Call do_pte_missing(), do_wp_page() and do_numa_page() under the
VMA lock.  The first two gain the check, but do_numa_page() looks
safe to run outside the mmap lock.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memory.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index a2e27403e4f1..dc2baddc6040 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3329,6 +3329,9 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 	struct vm_area_struct *vma = vmf->vma;
 	struct folio *folio = NULL;
 
+	if ((vmf->flags & FAULT_FLAG_VMA_LOCK) && !vma_is_anonymous(vma))
+		return VM_FAULT_RETRY;
+
 	if (likely(!unshare)) {
 		if (userfaultfd_pte_wp(vma, *vmf->pte)) {
 			pte_unmap_unlock(vmf->pte, vmf->ptl);
@@ -3644,6 +3647,8 @@ static vm_fault_t do_pte_missing(struct vm_fault *vmf)
 {
 	if (vma_is_anonymous(vmf->vma))
 		return do_anonymous_page(vmf);
+	else if (vmf->flags & FAULT_FLAG_VMA_LOCK)
+		return VM_FAULT_RETRY;
 	else
 		return do_fault(vmf);
 }
@@ -4896,9 +4901,6 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 {
 	pte_t entry;
 
-	if ((vmf->flags & FAULT_FLAG_VMA_LOCK) && !vma_is_anonymous(vmf->vma))
-		return VM_FAULT_RETRY;
-
 	if (unlikely(pmd_none(*vmf->pmd))) {
 		/*
 		 * Leave __pte_alloc() until later: because vm_ops->fault may
@@ -4957,6 +4959,9 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 	if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma))
 		return do_numa_page(vmf);
 
+	if ((vmf->flags & FAULT_FLAG_VMA_LOCK) && !vma_is_anonymous(vmf->vma))
+		return VM_FAULT_RETRY;
+
 	vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
 	spin_lock(vmf->ptl);
 	entry = vmf->orig_pte;
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/6] mm: Move the FAULT_FLAG_VMA_LOCK check down from do_pte_missing()
  2023-04-04 13:58 [PATCH 0/6] Avoid the mmap lock for fault-around Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2023-04-04 13:58 ` [PATCH 4/6] mm: Move FAULT_FLAG_VMA_LOCK check down in handle_pte_fault() Matthew Wilcox (Oracle)
@ 2023-04-04 13:58 ` Matthew Wilcox (Oracle)
  2023-04-04 13:58 ` [PATCH 6/6] mm: Run the fault-around code under the VMA lock Matthew Wilcox (Oracle)
  5 siblings, 0 replies; 19+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-04-04 13:58 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Matthew Wilcox (Oracle), linux-mm, linux-fsdevel, Punit Agrawal

Perform the check at the start of do_read_fault(), do_cow_fault()
and do_shared_fault() instead.  Should be no performance change from
the last commit.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memory.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index dc2baddc6040..9952bebd25b4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3647,8 +3647,6 @@ static vm_fault_t do_pte_missing(struct vm_fault *vmf)
 {
 	if (vma_is_anonymous(vmf->vma))
 		return do_anonymous_page(vmf);
-	else if (vmf->flags & FAULT_FLAG_VMA_LOCK)
-		return VM_FAULT_RETRY;
 	else
 		return do_fault(vmf);
 }
@@ -4523,6 +4521,8 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
 {
 	vm_fault_t ret = 0;
 
+	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
+		return VM_FAULT_RETRY;
 	/*
 	 * Let's call ->map_pages() first and use ->fault() as fallback
 	 * if page by the offset is not ready to be mapped (cold cache or
@@ -4550,6 +4550,9 @@ static vm_fault_t do_cow_fault(struct vm_fault *vmf)
 	struct vm_area_struct *vma = vmf->vma;
 	vm_fault_t ret;
 
+	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
+		return VM_FAULT_RETRY;
+
 	if (unlikely(anon_vma_prepare(vma)))
 		return VM_FAULT_OOM;
 
@@ -4589,6 +4592,9 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
 	struct vm_area_struct *vma = vmf->vma;
 	vm_fault_t ret, tmp;
 
+	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
+		return VM_FAULT_RETRY;
+
 	ret = __do_fault(vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/6] mm: Run the fault-around code under the VMA lock
  2023-04-04 13:58 [PATCH 0/6] Avoid the mmap lock for fault-around Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2023-04-04 13:58 ` [PATCH 5/6] mm: Move the FAULT_FLAG_VMA_LOCK check down from do_pte_missing() Matthew Wilcox (Oracle)
@ 2023-04-04 13:58 ` Matthew Wilcox (Oracle)
  2023-04-04 15:23   ` Matthew Wilcox
  5 siblings, 1 reply; 19+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-04-04 13:58 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Matthew Wilcox (Oracle), linux-mm, linux-fsdevel, Punit Agrawal

The map_pages fs method should be safe to run under the VMA lock instead
of the mmap lock.  This should have a measurable reduction in contention
on the mmap lock.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 9952bebd25b4..590ccaf6f7fb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4521,8 +4521,6 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
 {
 	vm_fault_t ret = 0;
 
-	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
-		return VM_FAULT_RETRY;
 	/*
 	 * Let's call ->map_pages() first and use ->fault() as fallback
 	 * if page by the offset is not ready to be mapped (cold cache or
@@ -4534,6 +4532,8 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
 			return ret;
 	}
 
+	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
+		return VM_FAULT_RETRY;
 	ret = __do_fault(vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] mm: Run the fault-around code under the VMA lock
  2023-04-04 13:58 ` [PATCH 6/6] mm: Run the fault-around code under the VMA lock Matthew Wilcox (Oracle)
@ 2023-04-04 15:23   ` Matthew Wilcox
  2023-04-07 18:20     ` Suren Baghdasaryan
  2023-04-10  4:53     ` Yin, Fengwei
  0 siblings, 2 replies; 19+ messages in thread
From: Matthew Wilcox @ 2023-04-04 15:23 UTC (permalink / raw)
  To: Suren Baghdasaryan; +Cc: linux-mm, linux-fsdevel, Punit Agrawal

On Tue, Apr 04, 2023 at 02:58:50PM +0100, Matthew Wilcox (Oracle) wrote:
> The map_pages fs method should be safe to run under the VMA lock instead
> of the mmap lock.  This should have a measurable reduction in contention
> on the mmap lock.

https://github.com/antonblanchard/will-it-scale/pull/37/files should
be a good microbenchmark to report numbers from.  Obviously real-world
benchmarks will be more compelling.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] mm: Allow per-VMA locks on file-backed VMAs
  2023-04-04 13:58 ` [PATCH 1/6] mm: Allow per-VMA locks on file-backed VMAs Matthew Wilcox (Oracle)
@ 2023-04-07 17:54   ` Suren Baghdasaryan
  2023-04-07 20:12     ` Matthew Wilcox
  0 siblings, 1 reply; 19+ messages in thread
From: Suren Baghdasaryan @ 2023-04-07 17:54 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, linux-fsdevel, Punit Agrawal

On Tue, Apr 4, 2023 at 6:59 AM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> The fault path will immediately fail in handle_mm_fault(), so this
> is the minimal step which allows the per-VMA lock to be taken on
> file-backed VMAs.  There may be a small performance reduction as a
> little unnecessary work will be done on each page fault.  See later
> patches for the improvement.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/memory.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index fdaec7772fff..f726f85f0081 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5223,6 +5223,9 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>                                             flags & FAULT_FLAG_REMOTE))
>                 return VM_FAULT_SIGSEGV;
>
> +       if ((flags & FAULT_FLAG_VMA_LOCK) && !vma_is_anonymous(vma))
> +               return VM_FAULT_RETRY;
> +

There are count_vm_event(PGFAULT) and count_memcg_event_mm(vma->vm_mm,
PGFAULT) earlier in this function. Returning here and retrying I think
will double-count this page fault. Returning before this accounting
should fix this issue.
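
For reference, the relevant ordering in handle_mm_fault() is roughly the
following (a paraphrased sketch, not an exact quote of the function):

	count_vm_event(PGFAULT);
	count_memcg_event_mm(vma->vm_mm, PGFAULT);

	ret = sanitize_fault_flags(vma, &flags);
	if (ret)
		return ret;

	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
				       flags & FAULT_FLAG_INSTRUCTION,
				       flags & FAULT_FLAG_REMOTE))
		return VM_FAULT_SIGSEGV;

	/* new check added by this patch */
	if ((flags & FAULT_FLAG_VMA_LOCK) && !vma_is_anonymous(vma))
		return VM_FAULT_RETRY;

so the counters have already been bumped when VM_FAULT_RETRY is returned
here, and they are bumped again once the fault is retried under the mmap
lock.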

>         /*
>          * Enable the memcg OOM handling for faults triggered in user
>          * space.  Kernel faults are handled more gracefully.
> @@ -5275,12 +5278,8 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
>         if (!vma)
>                 goto inval;
>
> -       /* Only anonymous vmas are supported for now */
> -       if (!vma_is_anonymous(vma))
> -               goto inval;
> -
>         /* find_mergeable_anon_vma uses adjacent vmas which are not locked */
> -       if (!vma->anon_vma)
> +       if (vma_is_anonymous(vma) && !vma->anon_vma)
>                 goto inval;
>
>         if (!vma_start_read(vma))
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] mm: Run the fault-around code under the VMA lock
  2023-04-04 15:23   ` Matthew Wilcox
@ 2023-04-07 18:20     ` Suren Baghdasaryan
  2023-04-10  4:53     ` Yin, Fengwei
  1 sibling, 0 replies; 19+ messages in thread
From: Suren Baghdasaryan @ 2023-04-07 18:20 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, linux-fsdevel, Punit Agrawal

On Tue, Apr 4, 2023 at 8:23 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Apr 04, 2023 at 02:58:50PM +0100, Matthew Wilcox (Oracle) wrote:
> > The map_pages fs method should be safe to run under the VMA lock instead
> > of the mmap lock.  This should have a measurable reduction in contention
> > on the mmap lock.
>
> https://github.com/antonblanchard/will-it-scale/pull/37/files should
> be a good microbenchmark to report numbers from.  Obviously real-world
> benchmarks will be more compelling.

The series looks sane to me. I'll run some tests on a NUMA machine to
see if anything breaks. Thanks!


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] mm: Allow per-VMA locks on file-backed VMAs
  2023-04-07 17:54   ` Suren Baghdasaryan
@ 2023-04-07 20:12     ` Matthew Wilcox
  2023-04-07 20:26       ` Suren Baghdasaryan
  0 siblings, 1 reply; 19+ messages in thread
From: Matthew Wilcox @ 2023-04-07 20:12 UTC (permalink / raw)
  To: Suren Baghdasaryan; +Cc: linux-mm, linux-fsdevel, Punit Agrawal

On Fri, Apr 07, 2023 at 10:54:00AM -0700, Suren Baghdasaryan wrote:
> On Tue, Apr 4, 2023 at 6:59 AM Matthew Wilcox (Oracle)
> <willy@infradead.org> wrote:
> >
> > The fault path will immediately fail in handle_mm_fault(), so this
> > is the minimal step which allows the per-VMA lock to be taken on
> > file-backed VMAs.  There may be a small performance reduction as a
> > little unnecessary work will be done on each page fault.  See later
> > patches for the improvement.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> >  mm/memory.c | 9 ++++-----
> >  1 file changed, 4 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index fdaec7772fff..f726f85f0081 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5223,6 +5223,9 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> >                                             flags & FAULT_FLAG_REMOTE))
> >                 return VM_FAULT_SIGSEGV;
> >
> > +       if ((flags & FAULT_FLAG_VMA_LOCK) && !vma_is_anonymous(vma))
> > +               return VM_FAULT_RETRY;
> > +
> 
> There are count_vm_event(PGFAULT) and count_memcg_event_mm(vma->vm_mm,
> PGFAULT) earlier in this function. Returning here and retrying I think
> will double-count this page fault. Returning before this accounting
> should fix this issue.

You're right, but this will be an issue with later patches in the series
anyway as we move the check further and further down the call-chain.
For that matter, it's an issue in do_swap_page() right now, isn't it?
I suppose we don't care too much because it's the rare case where we go
into do_swap_page() and so the stats are "correct enough".
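
(For context, the do_swap_page() bail-out being referred to is roughly this
early check in linux-next, paraphrased from memory:

	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
		ret = VM_FAULT_RETRY;
		goto out;
	}

i.e. the fault has already been counted by the time we give up and ask for
a retry under the mmap lock.)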


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] mm: Allow per-VMA locks on file-backed VMAs
  2023-04-07 20:12     ` Matthew Wilcox
@ 2023-04-07 20:26       ` Suren Baghdasaryan
  2023-04-07 21:36         ` Matthew Wilcox
  0 siblings, 1 reply; 19+ messages in thread
From: Suren Baghdasaryan @ 2023-04-07 20:26 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, linux-fsdevel, Punit Agrawal

On Fri, Apr 7, 2023 at 1:12 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Apr 07, 2023 at 10:54:00AM -0700, Suren Baghdasaryan wrote:
> > On Tue, Apr 4, 2023 at 6:59 AM Matthew Wilcox (Oracle)
> > <willy@infradead.org> wrote:
> > >
> > > The fault path will immediately fail in handle_mm_fault(), so this
> > > is the minimal step which allows the per-VMA lock to be taken on
> > > file-backed VMAs.  There may be a small performance reduction as a
> > > little unnecessary work will be done on each page fault.  See later
> > > patches for the improvement.
> > >
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > ---
> > >  mm/memory.c | 9 ++++-----
> > >  1 file changed, 4 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index fdaec7772fff..f726f85f0081 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -5223,6 +5223,9 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> > >                                             flags & FAULT_FLAG_REMOTE))
> > >                 return VM_FAULT_SIGSEGV;
> > >
> > > +       if ((flags & FAULT_FLAG_VMA_LOCK) && !vma_is_anonymous(vma))
> > > +               return VM_FAULT_RETRY;
> > > +
> >
> > There are count_vm_event(PGFAULT) and count_memcg_event_mm(vma->vm_mm,
> > PGFAULT) earlier in this function. Returning here and retrying I think
> > will double-count this page fault. Returning before this accounting
> > should fix this issue.
>
> You're right, but this will be an issue with later patches in the series
> anyway as we move the check further and further down the call-chain.
> For that matter, it's an issue in do_swap_page() right now, isn't it?
> I suppose we don't care too much because it's the rare case where we go
> into do_swap_page() and so the stats are "correct enough".

True. do_swap_page() has the same issue. Can we move these
count_vm_event() calls to the end of handle_mm_fault():

vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
   unsigned int flags, struct pt_regs *regs)
{
       vm_fault_t ret;

       __set_current_state(TASK_RUNNING);

-       count_vm_event(PGFAULT);
-       count_memcg_event_mm(vma->vm_mm, PGFAULT);

       ret = sanitize_fault_flags(vma, &flags);
       if (ret)
-              return ret;
+              goto out;
       ...
       mm_account_fault(regs, address, flags, ret);
+out:
+       if (ret != VM_FAULT_RETRY) {
+              count_vm_event(PGFAULT);
+              count_memcg_event_mm(vma->vm_mm, PGFAULT);
+       }

       return ret;
}

?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] mm: Allow per-VMA locks on file-backed VMAs
  2023-04-07 20:26       ` Suren Baghdasaryan
@ 2023-04-07 21:36         ` Matthew Wilcox
  2023-04-07 22:40           ` Suren Baghdasaryan
  2023-04-14 18:02           ` Suren Baghdasaryan
  0 siblings, 2 replies; 19+ messages in thread
From: Matthew Wilcox @ 2023-04-07 21:36 UTC (permalink / raw)
  To: Suren Baghdasaryan; +Cc: linux-mm, linux-fsdevel, Punit Agrawal

On Fri, Apr 07, 2023 at 01:26:08PM -0700, Suren Baghdasaryan wrote:
> True. do_swap_page() has the same issue. Can we move these
> count_vm_event() calls to the end of handle_mm_fault():

I was going to suggest something similar, so count that as an
Acked-by.  This will change the accounting for the current retry
situation, where we drop the mmap_lock in filemap_fault(), initiate I/O
and return RETRY.  I think that's probably a good change though; I don't
see why applications should have their fault counters incremented twice
for that situation.

>        mm_account_fault(regs, address, flags, ret);
> +out:
> +       if (ret != VM_FAULT_RETRY) {
> +              count_vm_event(PGFAULT);
> +              count_memcg_event_mm(vma->vm_mm, PGFAULT);
> +       }

Nit: this is a bitmask, so we should probably have:

	if (!(ret & VM_FAULT_RETRY)) {

in case somebody's ORed it with VM_FAULT_MAJOR or something.

Hm, I wonder if we're handling that correctly; if we need to start I/O,
do we preserve VM_FAULT_MAJOR returned from the first attempt?  Not
in a position where I can look at the code right now.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] mm: Allow per-VMA locks on file-backed VMAs
  2023-04-07 21:36         ` Matthew Wilcox
@ 2023-04-07 22:40           ` Suren Baghdasaryan
  2023-04-07 22:49             ` Suren Baghdasaryan
  2023-04-14 18:02           ` Suren Baghdasaryan
  1 sibling, 1 reply; 19+ messages in thread
From: Suren Baghdasaryan @ 2023-04-07 22:40 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, linux-fsdevel, Punit Agrawal

On Fri, Apr 7, 2023 at 2:36 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Apr 07, 2023 at 01:26:08PM -0700, Suren Baghdasaryan wrote:
> > True. do_swap_page() has the same issue. Can we move these
> > count_vm_event() calls to the end of handle_mm_fault():
>
> I was going to suggest something similar, so count that as an
> Acked-by.  This will change the accounting for the current retry
> situation, where we drop the mmap_lock in filemap_fault(), initiate I/O
> and return RETRY.  I think that's probably a good change though; I don't
> see why applications should have their fault counters incremented twice
> for that situation.
>
> >        mm_account_fault(regs, address, flags, ret);
> > +out:
> > +       if (ret != VM_FAULT_RETRY) {
> > +              count_vm_event(PGFAULT);
> > +              count_memcg_event_mm(vma->vm_mm, PGFAULT);
> > +       }
>
> Nit: this is a bitmask, so we should probably have:
>
>         if (!(ret & VM_FAULT_RETRY)) {
>
> in case somebody's ORed it with VM_FAULT_MAJOR or something.
>
> Hm, I wonder if we're handling that correctly; if we need to start I/O,
> do we preserve VM_FAULT_MAJOR returned from the first attempt?  Not
> in a position where I can look at the code right now.

Interesting question. IIUC mm_account_fault() is the place where
VM_FAULT_MAJOR is used to perform minor/major fault accounting.
However it has an early exit:

        /*
         * We don't do accounting for some specific faults:
         *
         * - Unsuccessful faults (e.g. when the address wasn't valid).  That
         *   includes arch_vma_access_permitted() failing before reaching here.
         *   So this is not a "this many hardware page faults" counter.  We
         *   should use the hw profiling for that.
         *
         * - Incomplete faults (VM_FAULT_RETRY).  They will only be counted
         *   once they're completed.
         */
        if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
                return;

So, VM_FAULT_RETRY is ignored in the accounting path.
For the current do_swap_page (the only user returning VM_FAULT_RETRY
this late) it's fine because we did not really do anything. However,
with your series and with my current attempts to drop the vma lock, do
swapin_readahead() and return VM_FAULT_RETRY, the situation changes. We
need to register such a (VM_FAULT_MAJOR | VM_FAULT_RETRY) case as a
major fault, otherwise after retry it will be accounted as only a
minor fault. Accounting the retry as a minor fault after the original
major fault is technically also double-counting but probably not as
big of a problem as missing the major fault accounting.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] mm: Allow per-VMA locks on file-backed VMAs
  2023-04-07 22:40           ` Suren Baghdasaryan
@ 2023-04-07 22:49             ` Suren Baghdasaryan
  0 siblings, 0 replies; 19+ messages in thread
From: Suren Baghdasaryan @ 2023-04-07 22:49 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, linux-fsdevel, Punit Agrawal

On Fri, Apr 7, 2023 at 3:40 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Fri, Apr 7, 2023 at 2:36 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Fri, Apr 07, 2023 at 01:26:08PM -0700, Suren Baghdasaryan wrote:
> > > True. do_swap_page() has the same issue. Can we move these
> > > count_vm_event() calls to the end of handle_mm_fault():
> >
> > I was going to suggest something similar, so count that as an
> > Acked-by.  This will change the accounting for the current retry
> > situation, where we drop the mmap_lock in filemap_fault(), initiate I/O
> > and return RETRY.  I think that's probably a good change though; I don't
> > see why applications should have their fault counters incremented twice
> > for that situation.
> >
> > >        mm_account_fault(regs, address, flags, ret);
> > > +out:
> > > +       if (ret != VM_FAULT_RETRY) {
> > > +              count_vm_event(PGFAULT);
> > > +              count_memcg_event_mm(vma->vm_mm, PGFAULT);
> > > +       }
> >
> > Nit: this is a bitmask, so we should probably have:
> >
> >         if (!(ret & VM_FAULT_RETRY)) {
> >
> > in case somebody's ORed it with VM_FAULT_MAJOR or something.
> >
> > Hm, I wonder if we're handling that correctly; if we need to start I/O,
> > do we preserve VM_FAULT_MAJOR returned from the first attempt?  Not
> > in a position where I can look at the code right now.
>
> Interesting question. IIUC mm_account_fault() is the place where
> VM_FAULT_MAJOR is used to perform minor/major fault accounting.
> However it has an early exit:
>
>         /*
>          * We don't do accounting for some specific faults:
>          *
>          * - Unsuccessful faults (e.g. when the address wasn't valid).  That
>          *   includes arch_vma_access_permitted() failing before reaching here.
>          *   So this is not a "this many hardware page faults" counter.  We
>          *   should use the hw profiling for that.
>          *
>          * - Incomplete faults (VM_FAULT_RETRY).  They will only be counted
>          *   once they're completed.
>          */
>         if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
>                 return;
>
> So, VM_FAULT_RETRY is ignored in the accounting path.
> For the current do_swap_page (the only user returning VM_FAULT_RETRY
> this late) it's fine because we did not really do anything. However,
> with your series and with my current attempts to drop the vma lock, do
> swapin_readahead() and return VM_FAULT_RETRY, the situation changes. We
> need to register such a (VM_FAULT_MAJOR | VM_FAULT_RETRY) case as a
> major fault, otherwise after retry it will be accounted as only a
> minor fault. Accounting the retry as a minor fault after the original
> major fault is technically also double-counting but probably not as
> big of a problem as missing the major fault accounting.

Actually, looks like in most cases where VM_FAULT_MAJOR is returned we
also do count_vm_event(PGMAJFAULT) and
count_memcg_event_mm(vma->vm_mm, PGMAJFAULT). mm_account_fault()
modifies only current->{maj|min}_flt and perf_sw_event().


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] mm: Run the fault-around code under the VMA lock
  2023-04-04 15:23   ` Matthew Wilcox
  2023-04-07 18:20     ` Suren Baghdasaryan
@ 2023-04-10  4:53     ` Yin, Fengwei
  2023-04-10 14:11       ` Matthew Wilcox
  1 sibling, 1 reply; 19+ messages in thread
From: Yin, Fengwei @ 2023-04-10  4:53 UTC (permalink / raw)
  To: willy, surenb; +Cc: linux-mm, linux-fsdevel, Agrawal, Punit

On Tue, 2023-04-04 at 16:23 +0100, Matthew Wilcox wrote:
> On Tue, Apr 04, 2023 at 02:58:50PM +0100, Matthew Wilcox (Oracle)
> wrote:
> > The map_pages fs method should be safe to run under the VMA lock
> > instead
> > of the mmap lock.  This should have a measurable reduction in
> > contention
> > on the mmap lock.
> 
> https://github.com/antonblanchard/will-it-scale/pull/37/files should
> be a good microbenchmark to report numbers from.  Obviously real-
> world
> benchmarks will be more compelling.
> 

Test result on my side with page_fault4 of will-it-scale in thread 
mode is:
  15274196 (without the patch) -> 17291444 (with the patch)

13.2% improvement on an Ice Lake with 48C/96T + 192G RAM + ext4 
filesystem.


The perf profile showed that the mmap_lock contention was reduced a lot:
(Removed the grandchild functions of do_user_addr_fault())

latest linux-next with the patch:
    51.78%--do_user_addr_fault
            |          
            |--49.09%--handle_mm_fault
            |--1.19%--lock_vma_under_rcu
            --1.09%--down_read

latest linux-next without the patch:
    73.65%--do_user_addr_fault
            |          
            |--28.65%--handle_mm_fault
            |--17.22%--down_read_trylock
            |--10.92%--down_read
            |--9.20%--up_read
            --7.30%--find_vma

My understanding is that down_read_trylock, down_read and up_read are
all related to the mmap_lock, so the mmap_lock contention reduction is
quite obvious.

For functional testing, our 0day has already cherry-picked the patchset
and the testing is ongoing. If any problem is hit during testing, 0day
will report it. Thanks.


Regards
Yin, Fengwei


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] mm: Run the fault-around code under the VMA lock
  2023-04-10  4:53     ` Yin, Fengwei
@ 2023-04-10 14:11       ` Matthew Wilcox
  2023-04-12  7:35         ` Yin, Fengwei
  0 siblings, 1 reply; 19+ messages in thread
From: Matthew Wilcox @ 2023-04-10 14:11 UTC (permalink / raw)
  To: Yin, Fengwei; +Cc: surenb, linux-mm, linux-fsdevel, Agrawal, Punit

On Mon, Apr 10, 2023 at 04:53:19AM +0000, Yin, Fengwei wrote:
> On Tue, 2023-04-04 at 16:23 +0100, Matthew Wilcox wrote:
> > On Tue, Apr 04, 2023 at 02:58:50PM +0100, Matthew Wilcox (Oracle)
> > wrote:
> > > The map_pages fs method should be safe to run under the VMA lock
> > > instead
> > > of the mmap lock.  This should have a measurable reduction in
> > > contention
> > > on the mmap lock.
> > 
> > https://github.com/antonblanchard/will-it-scale/pull/37/files should
> > be a good microbenchmark to report numbers from.  Obviously real-
> > world
> > benchmarks will be more compelling.
> > 
> 
> Test result on my side with page_fault4 of will-it-scale in thread 
> mode is:
>   15274196 (without the patch) -> 17291444 (with the patch)
> 
> 13.2% improvement on an Ice Lake with 48C/96T + 192G RAM + ext4 
> filesystem.

Thanks!  That is really good news.

> The perf profile showed that the mmap_lock contention was reduced a lot:
> (Removed the grandchild functions of do_user_addr_fault())
> 
> latest linux-next with the patch:
>     51.78%--do_user_addr_fault
>             |          
>             |--49.09%--handle_mm_fault
>             |--1.19%--lock_vma_under_rcu
>             --1.09%--down_read
> 
> latest linux-next without the patch:
>     73.65%--do_user_addr_fault
>             |          
>             |--28.65%--handle_mm_fault
>             |--17.22%--down_read_trylock
>             |--10.92%--down_read
>             |--9.20%--up_read
>             --7.30%--find_vma
> 
> My understanding is that down_read_trylock, down_read and up_read are
> all related to the mmap_lock, so the mmap_lock contention reduction is
> quite obvious.

Absolutely.  I'm a little surprised that find_vma() basically disappeared
from the perf results.  I guess that it was cache cold after contending
on the mmap_lock rwsem.  But this is a very encouraging result.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] mm: Run the fault-around code under the VMA lock
  2023-04-10 14:11       ` Matthew Wilcox
@ 2023-04-12  7:35         ` Yin, Fengwei
  0 siblings, 0 replies; 19+ messages in thread
From: Yin, Fengwei @ 2023-04-12  7:35 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: surenb, linux-mm, linux-fsdevel, Agrawal, Punit



On 4/10/2023 10:11 PM, Matthew Wilcox wrote:
> On Mon, Apr 10, 2023 at 04:53:19AM +0000, Yin, Fengwei wrote:
>> On Tue, 2023-04-04 at 16:23 +0100, Matthew Wilcox wrote:
>>> On Tue, Apr 04, 2023 at 02:58:50PM +0100, Matthew Wilcox (Oracle)
>>> wrote:
>>>> The map_pages fs method should be safe to run under the VMA lock
>>>> instead
>>>> of the mmap lock.  This should have a measurable reduction in
>>>> contention
>>>> on the mmap lock.
>>>
>>> https://github.com/antonblanchard/will-it-scale/pull/37/files should
>>> be a good microbenchmark to report numbers from.  Obviously real-
>>> world
>>> benchmarks will be more compelling.
>>>
>>
>> Test result on my side with page_fault4 of will-it-scale in thread 
>> mode is:
>>   15274196 (without the patch) -> 17291444 (with the patch)
>>
>> 13.2% improvement on an Ice Lake with 48C/96T + 192G RAM + ext4 
>> filesystem.
> 
> Thanks!  That is really good news.
> 
>> The perf profile showed that the mmap_lock contention was reduced a lot:
>> (Removed the grandchild functions of do_user_addr_fault())
>>
>> latest linux-next with the patch:
>>     51.78%--do_user_addr_fault
>>             |          
>>             |--49.09%--handle_mm_fault
>>             |--1.19%--lock_vma_under_rcu
>>             --1.09%--down_read
>>
>> latest linux-next without the patch:
>>     73.65%--do_user_addr_fault
>>             |          
>>             |--28.65%--handle_mm_fault
>>             |--17.22%--down_read_trylock
>>             |--10.92%--down_read
>>             |--9.20%--up_read
>>             --7.30%--find_vma
>>
>> My understanding is that down_read_trylock, down_read and up_read are
>> all related to the mmap_lock, so the mmap_lock contention reduction is
>> quite obvious.
> 
> Absolutely.  I'm a little surprised that find_vma() basically disappeared
> from the perf results.  I guess that it was cache cold after contending
> on the mmap_lock rwsem.  But this is a very encouraging result.

Yes. find_vma() was a surprise, so I did more checking on it.
1. Re-ran the testing 3 more times in case I had made a stupid mistake.
   All runs show the same behavior for find_vma().

2. perf report for find_vma() shows:
	6.26%--find_vma                                       
		|                                             
		--0.66%--mt_find                             
			|                                  
			--0.60%--mtree_range_walk

   Most of the time in find_vma() is spent not in mt_find() but in
   mmap_assert_locked(mm) (see the sketch below).

3. perf annotate of find_vma shows:
          │    ffffffffa91e9f20 <load0>:                                                                                                                         
     0.07 │      nop                                                                                                                                             
     0.07 │      sub  $0x10,%rsp                                                                                                                                 
     0.01 │      mov  %gs:0x28,%rax                                                                                                                              
     0.05 │      mov  %rax,0x8(%rsp)                                                                                                                             
     0.02 │      xor  %eax,%eax                                                                                                                                  
          │      mov  %rsi,(%rsp)                                                                                                                                
     0.00 │      mov  0x70(%rdi),%rax     --> This is rwsem_is_locked(&mm->mmap_lock)                                                                                                                      
    99.60 │      test %rax,%rax                                                                                                                                  
          │    ↓ je   4e                                                                                                                                         
     0.09 │      mov  $0xffffffffffffffff,%rdx                                                                                                                   
          │      mov  %rsp,%rsi

   I believe the line "mov  0x70(%rdi),%rax" should account for the 99.60% of
   find_vma() runtime rather than the "test %rax,%rax" line, and it is the
   access to the mmap_lock. With this series, the mmap_lock contention is
   reduced a lot with page_fault4 of will-it-scale, which also makes the
   mmap_lock access itself fast.
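
For reference, mmap_assert_locked() is roughly the following (quoted from
include/linux/mmap_lock.h from memory, so details may be slightly off):

	static inline void mmap_assert_locked(struct mm_struct *mm)
	{
		lockdep_assert_held(&mm->mmap_lock);
		VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
	}

With CONFIG_DEBUG_VM enabled, the rwsem_is_locked() check loads the rwsem's
count word, i.e. the same cache line every reader and writer of the mmap_lock
touches, which would explain why that single load dominates find_vma() when
the lock is heavily contended.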


Regards
Yin, Fengwei


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] mm: Allow per-VMA locks on file-backed VMAs
  2023-04-07 21:36         ` Matthew Wilcox
  2023-04-07 22:40           ` Suren Baghdasaryan
@ 2023-04-14 18:02           ` Suren Baghdasaryan
  1 sibling, 0 replies; 19+ messages in thread
From: Suren Baghdasaryan @ 2023-04-14 18:02 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, linux-fsdevel, Punit Agrawal

On Fri, Apr 7, 2023 at 2:36 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Apr 07, 2023 at 01:26:08PM -0700, Suren Baghdasaryan wrote:
> > True. do_swap_page() has the same issue. Can we move these
> > count_vm_event() calls to the end of handle_mm_fault():
>
> I was going to suggest something similar, so count that as an
> Acked-by.  This will change the accounting for the current retry
> situation, where we drop the mmap_lock in filemap_fault(), initiate I/O
> and return RETRY.  I think that's probably a good change though; I don't
> see why applications should have their fault counters incremented twice
> for that situation.
>
> >        mm_account_fault(regs, address, flags, ret);
> > +out:
> > +       if (ret != VM_FAULT_RETRY) {
> > +              count_vm_event(PGFAULT);
> > +              count_memcg_event_mm(vma->vm_mm, PGFAULT);
> > +       }
>
> Nit: this is a bitmask, so we should probably have:
>
>         if (!(ret & VM_FAULT_RETRY)) {
>
> in case somebody's ORed it with VM_FAULT_MAJOR or something.

The fix is posted at
https://lore.kernel.org/all/20230414175444.1837474-1-surenb@google.com/

>
> Hm, I wonder if we're handling that correctly; if we need to start I/O,
> do we preserve VM_FAULT_MAJOR returned from the first attempt?  Not
> in a position where I can look at the code right now.


^ permalink raw reply	[flat|nested] 19+ messages in thread

Thread overview: 19+ messages
2023-04-04 13:58 [PATCH 0/6] Avoid the mmap lock for fault-around Matthew Wilcox (Oracle)
2023-04-04 13:58 ` [PATCH 1/6] mm: Allow per-VMA locks on file-backed VMAs Matthew Wilcox (Oracle)
2023-04-07 17:54   ` Suren Baghdasaryan
2023-04-07 20:12     ` Matthew Wilcox
2023-04-07 20:26       ` Suren Baghdasaryan
2023-04-07 21:36         ` Matthew Wilcox
2023-04-07 22:40           ` Suren Baghdasaryan
2023-04-07 22:49             ` Suren Baghdasaryan
2023-04-14 18:02           ` Suren Baghdasaryan
2023-04-04 13:58 ` [PATCH 2/6] mm: Move FAULT_FLAG_VMA_LOCK check from handle_mm_fault() Matthew Wilcox (Oracle)
2023-04-04 13:58 ` [PATCH 3/6] mm: Move FAULT_FLAG_VMA_LOCK check into handle_pte_fault() Matthew Wilcox (Oracle)
2023-04-04 13:58 ` [PATCH 4/6] mm: Move FAULT_FLAG_VMA_LOCK check down in handle_pte_fault() Matthew Wilcox (Oracle)
2023-04-04 13:58 ` [PATCH 5/6] mm: Move the FAULT_FLAG_VMA_LOCK check down from do_pte_missing() Matthew Wilcox (Oracle)
2023-04-04 13:58 ` [PATCH 6/6] mm: Run the fault-around code under the VMA lock Matthew Wilcox (Oracle)
2023-04-04 15:23   ` Matthew Wilcox
2023-04-07 18:20     ` Suren Baghdasaryan
2023-04-10  4:53     ` Yin, Fengwei
2023-04-10 14:11       ` Matthew Wilcox
2023-04-12  7:35         ` Yin, Fengwei
