linux-mm.kvack.org archive mirror
* [PATCH] mm: Always sanity check anon_vma first for per-vma locks
@ 2024-04-10 17:06 Peter Xu
  2024-04-10 20:26 ` Matthew Wilcox
  2024-04-11 17:13 ` Liam R. Howlett
  0 siblings, 2 replies; 39+ messages in thread
From: Peter Xu @ 2024-04-10 17:06 UTC
  To: linux-kernel, linux-mm
  Cc: peterx, Andrew Morton, Matthew Wilcox, Suren Baghdasaryan,
	Lokesh Gidra, Liam R. Howlett, Alistair Popple

anon_vma is a tricky object in the context of per-vma locks: modifying it
under a per-vma lock is racy, so the mmap lock is needed whenever it is not
stable yet.
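To illustrate why a stronger lock is needed until the anon_vma is set, here is a simplified userspace model, not kernel code. model_vma, model_mm, model_anon_vma and model_anon_vma_prepare() are hypothetical names, and a pthread mutex stands in for the mmap lock. It loosely follows the shape of the kernel's slow path: a non-NULL pointer can be used directly, while installing one means allocating, then re-checking and publishing under the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the kernel's structures. */
struct model_anon_vma {
	int placeholder;
};

struct model_mm {
	pthread_mutex_t slow_lock;        /* stands in for the mmap lock */
};

struct model_vma {
	struct model_mm *mm;
	/* NULL until the first anonymous (e.g. COW) fault needs it. */
	_Atomic(struct model_anon_vma *) anon_vma;
};

/*
 * Fast path: a non-NULL anon_vma can be used without the slow lock.
 * Slow path: allocate, then re-check and publish under the lock so that
 * two racing callers cannot both install an anon_vma.
 * Returns 0 on success, -1 if allocation fails.
 */
static int model_anon_vma_prepare(struct model_vma *vma)
{
	struct model_anon_vma *av;

	if (atomic_load(&vma->anon_vma))
		return 0;                 /* already set: nothing to do */

	av = calloc(1, sizeof(*av));
	if (!av)
		return -1;

	pthread_mutex_lock(&vma->mm->slow_lock);
	if (!atomic_load(&vma->anon_vma)) {
		atomic_store(&vma->anon_vma, av);  /* we won: publish ours */
		av = NULL;
	}
	pthread_mutex_unlock(&vma->mm->slow_lock);

	free(av);                     /* non-NULL only if another caller won */
	return 0;
}
```

In this model the pointer never changes once published, which is what lets later callers rely on it without taking the lock.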

So far there are three places that sanity-check anon_vma for this:

  - lock_vma_under_rcu(): this is the main entry point for per-vma locking,
    where we already take care of anon memory vs. potential anon_vma
    allocations.

  - lock_vma(): although it looks like a generic API, it is only used in
    the userfaultfd context to take advantage of per-vma locks.  It does an
    extra check on MAP_PRIVATE file mappings for the same anon_vma issue.

  - vmf_anon_prepare(): it covers private file mapping faults, just like
    what lock_vma() wanted to cover above.  One minor difference is that in
    some extreme corner cases the fault handler still allows a per-vma fault
    to proceed, e.g. a READ on a privately mapped file.
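The difference between the old and new gate in lock_vma_under_rcu() can be sketched as two predicates over a simplified VMA. This is a model under stated assumptions: model_vma, old_gate_rejects() and new_gate_rejects() are hypothetical; MODEL_VM_SHARED mirrors the bit VM_SHARED uses in include/linux/mm.h; and, like the kernel's vma_is_anonymous(), a VMA is treated as anonymous when it has no vm_ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_VM_SHARED 0x00000008UL  /* same bit as the kernel's VM_SHARED */

/* Hypothetical, heavily simplified VMA. */
struct model_vma {
	unsigned long vm_flags;
	const void *vm_ops;    /* NULL means anonymous, as in vma_is_anonymous() */
	const void *anon_vma;  /* NULL until an anon_vma has been prepared */
};

static bool model_vma_is_anonymous(const struct model_vma *vma)
{
	return vma->vm_ops == NULL;
}

/* Pre-patch gate: only anonymous VMAs lacking an anon_vma bail out. */
static bool old_gate_rejects(const struct model_vma *vma)
{
	return model_vma_is_anonymous(vma) && !vma->anon_vma;
}

/* Post-patch gate: any private (non-VM_SHARED) VMA lacking one bails out. */
static bool new_gate_rejects(const struct model_vma *vma)
{
	return !(vma->vm_flags & MODEL_VM_SHARED) && !vma->anon_vma;
}
```

The only case where the two disagree is a private file mapping without an anon_vma. The new predicate is the same condition lock_vma() used to check on its own, which is why that extra check can be dropped.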

The question is whether it is intended to be this complicated.  As I asked
in the thread, it is not intended, and Suren also seems to agree [1].

So the minor side effects of this patch are:

  - We may do slightly better on the first WRITE of a private file mapping,
    because we can retry earlier (in lock_vma_under_rcu() rather than later
    in vmf_anon_prepare()).

  - We will always use the mmap lock for the initial READs on a private file
    mapping, whereas before this patch a read fault _could_ be handled under
    the per-vma lock (only when no WRITE had ever happened... which doesn't
    make much sense for a MAP_PRIVATE mapping anyway).
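Both side effects can be seen by replaying a fault sequence against one MAP_PRIVATE file VMA that starts without an anon_vma. This is a rough model, not kernel code: replay() and model_stats are hypothetical, and it assumes READ faults never need an anon_vma while the first WRITE (a COW fault) installs one.

```c
#include <assert.h>
#include <stdbool.h>

enum model_fault { MODEL_READ, MODEL_WRITE };

struct model_stats {
	int vma_lock_faults;   /* faults completed under the per-vma lock */
	int mmap_lock_faults;  /* faults completed under the mmap lock */
	int late_retries;      /* per-vma attempts abandoned in vmf_anon_prepare() */
};

/*
 * Replays n faults on a private file VMA with no anon_vma yet.
 * new_rules = false: pre-patch; new_rules = true: post-patch.
 */
static struct model_stats replay(const enum model_fault *faults, int n,
				 bool new_rules)
{
	struct model_stats st = { 0, 0, 0 };
	bool has_anon_vma = false;

	for (int i = 0; i < n; i++) {
		/* Post-patch gate: no anon_vma on a private VMA -> mmap lock. */
		bool per_vma = !(new_rules && !has_anon_vma);

		/* Pre-patch only: a WRITE gets this far, then must retry. */
		if (per_vma && faults[i] == MODEL_WRITE && !has_anon_vma) {
			st.late_retries++;
			per_vma = false;
		}
		if (per_vma) {
			st.vma_lock_faults++;
		} else {
			st.mmap_lock_faults++;
			if (faults[i] == MODEL_WRITE)
				has_anon_vma = true;  /* COW installs the anon_vma */
		}
	}
	return st;
}
```

For READ, READ, WRITE, READ the old rules give three per-vma faults, one mmap-lock fault and one late retry; the new rules give one per-vma fault, three mmap-lock faults and no late retry. After the WRITE both behave the same.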

Note that right after any WRITE the anon_vma is stabilized, after which
there is no difference.  I believe that should cover the majority of cases
too; I also tried running a program that writes to MAP_PRIVATE file memory
(pre-warmed in the page cache) and could hardly measure any difference in
performance.
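A minimal sketch of that kind of experiment (not the program used for this patch): map a temporary file MAP_PRIVATE, read every page so the page cache is warm, then time the first write to each page, which triggers a copy-on-write fault. time_private_writes() is a hypothetical helper; comparing kernels with and without this patch means running it on each.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: builds an npages-page temp file full of 'A', maps it
 * MAP_PRIVATE, reads each page (page cache warm, read faults done), then
 * times the first write of 'B' to each page (COW faults).  Returns elapsed
 * nanoseconds, or -1 on error.  *file_unchanged is set to 1 if the file
 * still holds 'A' while the mapping sees 'B', i.e. the writes stayed private.
 */
static long long time_private_writes(size_t npages, int *file_unchanged)
{
	size_t psz = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = npages * psz;
	char path[] = "/tmp/map_private_XXXXXX";
	char *fill = NULL, *map = MAP_FAILED, check = 0;
	struct timespec t0, t1;
	volatile char sink = 0;
	long long ns = -1;
	int fd;

	*file_unchanged = 0;
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);                     /* removed from disk once fd is closed */

	fill = malloc(len);
	if (!fill)
		goto out;
	memset(fill, 'A', len);
	if (pwrite(fd, fill, len, 0) != (ssize_t)len)
		goto out;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	for (size_t off = 0; off < len; off += psz)
		sink ^= map[off];             /* read faults map page-cache pages */

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (size_t off = 0; off < len; off += psz)
		map[off] = 'B';               /* first write per page: COW fault */
	clock_gettime(CLOCK_MONOTONIC, &t1);
	ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
	     (t1.tv_nsec - t0.tv_nsec);

	if (pread(fd, &check, 1, 0) == 1)
		*file_unchanged = (check == 'A' && map[0] == 'B');
out:
	if (map != MAP_FAILED)
		munmap(map, len);
	free(fill);
	close(fd);
	return ns;
}
```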

Let's simply ignore all those minor corner cases and unify the anon_vma
check from three places into one.  I didn't check the remaining users of
lock_vma_under_rcu(); in a !fault context this could even fix something
that used to race with private file mappings, but I didn't check further.

I still left a WARN_ON_ONCE() in vmf_anon_prepare() to double-check that
we're all good.
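WARN_ON_ONCE() evaluates to its condition on every call but reports only the first time it fires at a given call site. Below is a userspace model of that behaviour applied to the post-patch vmf_anon_prepare() path. MODEL_WARN_ON_ONCE, MODEL_FAULT_FLAG_VMA_LOCK and model_anon_prepare() are hypothetical names, and the macro relies on a GNU C statement expression, in the same style as the kernel's macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int model_warn_count;          /* warnings actually emitted */

/*
 * Userspace model of WARN_ON_ONCE(cond): always evaluates to !!cond,
 * but each expansion site reports at most once.
 */
#define MODEL_WARN_ON_ONCE(cond) ({					\
	static bool model_once_warned;					\
	bool model_once_ret = !!(cond);					\
	if (model_once_ret && !model_once_warned) {			\
		model_once_warned = true;				\
		model_warn_count++;					\
		fprintf(stderr, "WARNING at %s:%d: %s\n",		\
			__FILE__, __LINE__, #cond);			\
	}								\
	model_once_ret;							\
})

#define MODEL_FAULT_FLAG_VMA_LOCK 0x1u  /* hypothetical flag bit */

/*
 * Shape of the post-patch vmf_anon_prepare(): returns 0 if the anon_vma
 * already exists, 1 after "preparing" it.  Reaching the prepare step under
 * a per-vma lock should now be impossible, so it only warns (once).
 */
static int model_anon_prepare(unsigned int fault_flags, bool *has_anon_vma)
{
	if (*has_anon_vma)
		return 0;
	MODEL_WARN_ON_ONCE(fault_flags & MODEL_FAULT_FLAG_VMA_LOCK);
	*has_anon_vma = true;             /* stands in for __anon_vma_prepare() */
	return 1;
}
```

As in the patch, the warning does not change control flow: the prepare step still runs, so a missed case shows up in the log rather than as a failed fault.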

[1] https://lore.kernel.org/r/CAJuCfpGj5xk-NxSwW6Mt8NGZcV9N__8zVPMGXDPAYKMcN9=Oig@mail.gmail.com

Cc: Matthew Wilcox <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/memory.c      | 10 ++++------
 mm/userfaultfd.c | 13 ++-----------
 2 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 78422d1c7381..4e2a9c4d9776 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3219,10 +3219,8 @@ vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
 
 	if (likely(vma->anon_vma))
 		return 0;
-	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
-		vma_end_read(vma);
-		return VM_FAULT_RETRY;
-	}
+	/* We shouldn't try a per-vma fault at all if anon_vma isn't solid */
+	WARN_ON_ONCE(vmf->flags & FAULT_FLAG_VMA_LOCK);
 	if (__anon_vma_prepare(vma))
 		return VM_FAULT_OOM;
 	return 0;
@@ -5826,9 +5824,9 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	 * find_mergeable_anon_vma uses adjacent vmas which are not locked.
 	 * This check must happen after vma_start_read(); otherwise, a
 	 * concurrent mremap() with MREMAP_DONTUNMAP could dissociate the VMA
-	 * from its anon_vma.
+	 * from its anon_vma.  This applies to both anon and private file maps.
 	 */
-	if (unlikely(vma_is_anonymous(vma) && !vma->anon_vma))
+	if (unlikely(!(vma->vm_flags & VM_SHARED) && !vma->anon_vma))
 		goto inval_end_read;
 
 	/* Check since vm_start/vm_end might change before we lock the VMA */
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index f6267afe65d1..61f21da77dcd 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -72,17 +72,8 @@ static struct vm_area_struct *lock_vma(struct mm_struct *mm,
 	struct vm_area_struct *vma;
 
 	vma = lock_vma_under_rcu(mm, address);
-	if (vma) {
-		/*
-		 * lock_vma_under_rcu() only checks anon_vma for private
-		 * anonymous mappings. But we need to ensure it is assigned in
-		 * private file-backed vmas as well.
-		 */
-		if (!(vma->vm_flags & VM_SHARED) && unlikely(!vma->anon_vma))
-			vma_end_read(vma);
-		else
-			return vma;
-	}
+	if (vma)
+		return vma;
 
 	mmap_read_lock(mm);
 	vma = find_vma_and_prepare_anon(mm, address);
-- 
2.44.0




end of thread [~2024-04-26 15:50 UTC]

Thread overview: 39+ messages
2024-04-10 17:06 [PATCH] mm: Always sanity check anon_vma first for per-vma locks Peter Xu
2024-04-10 20:26 ` Matthew Wilcox
     [not found]   ` <Zhb6B8UsidEEbFu3@x1n>
2024-04-10 21:10     ` Matthew Wilcox
2024-04-10 21:23       ` Peter Xu
2024-04-10 23:59         ` Matthew Wilcox
2024-04-11  0:20           ` Peter Xu
2024-04-11 14:50             ` Matthew Wilcox
2024-04-11 15:34               ` Peter Xu
2024-04-11 17:14                 ` Matthew Wilcox
2024-04-11 15:42   ` Suren Baghdasaryan
2024-04-11 17:13 ` Liam R. Howlett
     [not found]   ` <ZhhSItiyLYBEdAX3@x1n>
2024-04-11 21:27     ` Matthew Wilcox
2024-04-11 21:46       ` Peter Xu
2024-04-11 22:02         ` Matthew Wilcox
2024-04-12  3:14           ` Matthew Wilcox
2024-04-12 12:38             ` Peter Xu
2024-04-12 13:06               ` Suren Baghdasaryan
2024-04-12 14:16                 ` Matthew Wilcox
2024-04-12 14:53                   ` Suren Baghdasaryan
2024-04-12 15:19                     ` Matthew Wilcox
2024-04-12 15:31                       ` Matthew Wilcox
2024-04-13 21:46                         ` Suren Baghdasaryan
2024-04-13 22:52                           ` Matthew Wilcox
2024-04-13 23:11                             ` Suren Baghdasaryan
2024-04-13 21:41                       ` Suren Baghdasaryan
2024-04-13 22:46                         ` Matthew Wilcox
2024-04-15 15:58                       ` Suren Baghdasaryan
2024-04-15 16:13                         ` Matthew Wilcox
2024-04-15 16:19                           ` Suren Baghdasaryan
2024-04-15 16:26                             ` Matthew Wilcox
2024-04-12 12:46             ` Suren Baghdasaryan
2024-04-12 13:32               ` Matthew Wilcox
2024-04-12 13:46                 ` Suren Baghdasaryan
2024-04-26 14:00             ` Matthew Wilcox
2024-04-26 15:07               ` Suren Baghdasaryan
2024-04-26 15:28                 ` Matthew Wilcox
2024-04-26 15:32                   ` Suren Baghdasaryan
2024-04-26 15:50                     ` Matthew Wilcox
2024-04-26 15:32                 ` Liam R. Howlett
