* [PATCH] mm: Don't fault around userfaultfd-registered regions on reads
@ 2020-11-26 22:23 Peter Xu
  2020-11-27  8:16 ` Mike Rapoport
  ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Peter Xu @ 2020-11-26 22:23 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: peterx, Andrew Morton, Hugh Dickins, Andrea Arcangeli, Mike Rapoport

Faulting around on reads is in most cases helpful for performance, since nearby memory accesses may then avoid another trip through the page fault path. However, it may not always work as expected.

For example, userfaultfd-registered regions may not be the best candidates for pre-faulting around reads.

For missing mode uffds, fault-around does not help: if the page cache existed, then the page should be there already, and if the page cache is not there, there is nothing else we can do either. If the fault-around code is destined to be helpless for userfault-missing vmas, then ideally we can skip it.

For wr-protected mode uffds, erroneously faulting in those pages could lead to threads accessing the pages without the uffd server's awareness. For example, when punching holes on uffd-wp registered shmem regions, we'll first try to unmap all the pages before evicting the page cache, but without locking the pages (please refer to shmem_fallocate(), where unmap_mapping_range() is called before shmem_truncate_range()). When fault-around happens near a hole being punched, we might erroneously fault in the "holes" right before they are punched. That leaves a small window in which the pages are writable again but the page cache has not yet been dropped (NOTE: the uffd-wp protect information is totally lost due to the pre-unmap in shmem_fallocate(), so the pages can be writable within that window). That's severe data loss.

Let's grant userspace full control of the uffd-registered ranges, rather than trying to do the tricks.
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
Note that no file-backed uffd-wp support is upstream yet, so the uffd-wp check is not actually functional. However, since we already have all the necessary uffd-wp concepts upstream, maybe it's better to do it once and for all.

This patch comes from debugging a data loss issue when working on the uffd-wp support on shmem/hugetlbfs. I posted this out for early review and comments, but also because it should already start to benefit missing mode userfaultfd by avoiding fault-around on reads.
---
 include/linux/userfaultfd_k.h |  5 +++++
 mm/memory.c                   | 17 +++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index a8e5f3ea9bb2..451d99bb3a1a 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -62,6 +62,11 @@ static inline bool userfaultfd_wp(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_UFFD_WP;
 }
 
+static inline bool vma_registered_userfaultfd(struct vm_area_struct *vma)
+{
+	return userfaultfd_missing(vma) || userfaultfd_wp(vma);
+}
+
 static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
 				      pte_t pte)
 {
diff --git a/mm/memory.c b/mm/memory.c
index eeae590e526a..ca58ada94c96 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3933,6 +3933,23 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
 	int off;
 	vm_fault_t ret = 0;
 
+	/*
+	 * Be extremely careful with uffd-armed regions.
+	 *
+	 * For missing mode uffds, fault around does not help because if the
+	 * page cache existed, then the page should be there already.  If the
+	 * page cache is not there, nothing else we can do either.
+	 *
+	 * For wr-protected mode uffds, erroneously faulting in those pages
+	 * could lead to threads accessing the pages without the uffd server's
+	 * awareness, which could ultimately cause silent data corruption.
+	 *
+	 * The idea is that, for uffd-registered regions, userspace should be
+	 * in full control of which pages get faulted in.
+	 */
+	if (unlikely(vma_registered_userfaultfd(vmf->vma)))
+		return 0;
+
 	nr_pages = READ_ONCE(fault_around_bytes) >> PAGE_SHIFT;
 	mask = ~(nr_pages * PAGE_SIZE - 1) & PAGE_MASK;
-- 
2.26.2

^ permalink raw reply related	[flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Don't fault around userfaultfd-registered regions on reads 2020-11-26 22:23 [PATCH] mm: Don't fault around userfaultfd-registered regions on reads Peter Xu @ 2020-11-27 8:16 ` Mike Rapoport 2020-11-27 13:31 ` Peter Xu 2020-11-27 12:22 ` Matthew Wilcox 2020-11-27 17:00 ` David Hildenbrand 2 siblings, 1 reply; 12+ messages in thread From: Mike Rapoport @ 2020-11-27 8:16 UTC (permalink / raw) To: Peter Xu Cc: linux-kernel, linux-mm, Andrew Morton, Hugh Dickins, Andrea Arcangeli, Mike Rapoport On Thu, Nov 26, 2020 at 05:23:59PM -0500, Peter Xu wrote: > Faulting around for reads are in most cases helpful for the performance so that > continuous memory accesses may avoid another trip of page fault. However it > may not always work as expected. > > For example, userfaultfd registered regions may not be the best candidate for > pre-faults around the reads. > > For missing mode uffds, fault around does not help because if the page cache > existed, then the page should be there already. If the page cache is not > there, nothing else we can do, either. If the fault-around code is destined to > be helpless for userfault-missing vmas, then ideally we can skip it. > > For wr-protected mode uffds, errornously fault in those pages around could lead > to threads accessing the pages without uffd server's awareness. For example, > when punching holes on uffd-wp registered shmem regions, we'll first try to > unmap all the pages before evicting the page cache but without locking the > page (please refer to shmem_fallocate(), where unmap_mapping_range() is called > before shmem_truncate_range()). When fault-around happens near a hole being > punched, we might errornously fault in the "holes" right before it will be > punched. 
Then there's a small window before the page cache was finally > dropped, and after the page will be writable again (NOTE: the uffd-wp protect > information is totally lost due to the pre-unmap in shmem_fallocate(), so the > page can be writable within the small window). That's severe data loss. > > Let's grant the userspace full control of the uffd-registered ranges, rather > than trying to do the tricks. > > Cc: Hugh Dickins <hughd@google.com> > Cc: Andrea Arcangeli <aarcange@redhat.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> > Signed-off-by: Peter Xu <peterx@redhat.com> One nit below, except that Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> > --- > > Note that since no file-backed uffd-wp support is there yet upstream, so the > uffd-wp check is actually not really functioning. However since we have all > the necessary uffd-wp concepts already upstream, maybe it's better to do it > once and for all. > > This patch comes from debugging a data loss issue when working on the uffd-wp > support on shmem/hugetlbfs. I posted this out for early review and comments, > but also because it should already start to benefit missing mode userfaultfd to > avoid trying to fault around on reads. > --- > include/linux/userfaultfd_k.h | 5 +++++ > mm/memory.c | 17 +++++++++++++++++ > 2 files changed, 22 insertions(+) > > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h > index a8e5f3ea9bb2..451d99bb3a1a 100644 > --- a/include/linux/userfaultfd_k.h > +++ b/include/linux/userfaultfd_k.h > @@ -62,6 +62,11 @@ static inline bool userfaultfd_wp(struct vm_area_struct *vma) > return vma->vm_flags & VM_UFFD_WP; > } > > +static inline bool vma_registered_userfaultfd(struct vm_area_struct *vma) > +{ > + return userfaultfd_missing(vma) || userfaultfd_wp(vma); > +} We have userfaultfd_armed() that does exactly this, don't we? 
> + > static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma, > pte_t pte) > { > diff --git a/mm/memory.c b/mm/memory.c > index eeae590e526a..ca58ada94c96 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -3933,6 +3933,23 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf) > int off; > vm_fault_t ret = 0; > > + /* > + * Be extremely careful with uffd-armed regions. > + * > + * For missing mode uffds, fault around does not help because if the > + * page cache existed, then the page should be there already. If the > + * page cache is not there, nothing else we can do either. > + * > + * For wr-protected mode uffds, errornously fault in those pages around > + * could lead to threads accessing the pages without uffd server's > + * awareness, finally it could cause ghostly data corruption. > + * > + * The idea is that, every single page of uffd regions should be > + * governed by the userspace on which page to fault in. > + */ > + if (unlikely(vma_registered_userfaultfd(vmf->vma))) > + return 0; > + > nr_pages = READ_ONCE(fault_around_bytes) >> PAGE_SHIFT; > mask = ~(nr_pages * PAGE_SIZE - 1) & PAGE_MASK; > > -- > 2.26.2 > -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Don't fault around userfaultfd-registered regions on reads 2020-11-27 8:16 ` Mike Rapoport @ 2020-11-27 13:31 ` Peter Xu 0 siblings, 0 replies; 12+ messages in thread From: Peter Xu @ 2020-11-27 13:31 UTC (permalink / raw) To: Mike Rapoport Cc: linux-kernel, linux-mm, Andrew Morton, Hugh Dickins, Andrea Arcangeli, Mike Rapoport On Fri, Nov 27, 2020 at 10:16:05AM +0200, Mike Rapoport wrote: > On Thu, Nov 26, 2020 at 05:23:59PM -0500, Peter Xu wrote: > > Faulting around for reads are in most cases helpful for the performance so that > > continuous memory accesses may avoid another trip of page fault. However it > > may not always work as expected. > > > > For example, userfaultfd registered regions may not be the best candidate for > > pre-faults around the reads. > > > > For missing mode uffds, fault around does not help because if the page cache > > existed, then the page should be there already. If the page cache is not > > there, nothing else we can do, either. If the fault-around code is destined to > > be helpless for userfault-missing vmas, then ideally we can skip it. > > > > For wr-protected mode uffds, errornously fault in those pages around could lead > > to threads accessing the pages without uffd server's awareness. For example, > > when punching holes on uffd-wp registered shmem regions, we'll first try to > > unmap all the pages before evicting the page cache but without locking the > > page (please refer to shmem_fallocate(), where unmap_mapping_range() is called > > before shmem_truncate_range()). When fault-around happens near a hole being > > punched, we might errornously fault in the "holes" right before it will be > > punched. Then there's a small window before the page cache was finally > > dropped, and after the page will be writable again (NOTE: the uffd-wp protect > > information is totally lost due to the pre-unmap in shmem_fallocate(), so the > > page can be writable within the small window). That's severe data loss. 
> > > > Let's grant the userspace full control of the uffd-registered ranges, rather > > than trying to do the tricks. > > > > Cc: Hugh Dickins <hughd@google.com> > > Cc: Andrea Arcangeli <aarcange@redhat.com> > > Cc: Andrew Morton <akpm@linux-foundation.org> > > Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> > > Signed-off-by: Peter Xu <peterx@redhat.com> > > One nit below, except that > > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Thanks! > > +static inline bool vma_registered_userfaultfd(struct vm_area_struct *vma) > > +{ > > + return userfaultfd_missing(vma) || userfaultfd_wp(vma); > > +} > > We have userfaultfd_armed() that does exectly this, don't we? Yes, will fix that up. -- Peter Xu ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Don't fault around userfaultfd-registered regions on reads 2020-11-26 22:23 [PATCH] mm: Don't fault around userfaultfd-registered regions on reads Peter Xu 2020-11-27 8:16 ` Mike Rapoport @ 2020-11-27 12:22 ` Matthew Wilcox 2020-11-27 13:51 ` Peter Xu 2020-11-28 0:33 ` Andrea Arcangeli 2020-11-27 17:00 ` David Hildenbrand 2 siblings, 2 replies; 12+ messages in thread From: Matthew Wilcox @ 2020-11-27 12:22 UTC (permalink / raw) To: Peter Xu Cc: linux-kernel, linux-mm, Andrew Morton, Hugh Dickins, Andrea Arcangeli, Mike Rapoport On Thu, Nov 26, 2020 at 05:23:59PM -0500, Peter Xu wrote: > For missing mode uffds, fault around does not help because if the page cache > existed, then the page should be there already. If the page cache is not > there, nothing else we can do, either. If the fault-around code is destined to > be helpless for userfault-missing vmas, then ideally we can skip it. But it might have been faulted into the cache by another task, so skipping it is bad. > For wr-protected mode uffds, errornously fault in those pages around could lead > to threads accessing the pages without uffd server's awareness. For example, > when punching holes on uffd-wp registered shmem regions, we'll first try to > unmap all the pages before evicting the page cache but without locking the > page (please refer to shmem_fallocate(), where unmap_mapping_range() is called > before shmem_truncate_range()). When fault-around happens near a hole being > punched, we might errornously fault in the "holes" right before it will be > punched. Then there's a small window before the page cache was finally > dropped, and after the page will be writable again (NOTE: the uffd-wp protect > information is totally lost due to the pre-unmap in shmem_fallocate(), so the > page can be writable within the small window). That's severe data loss. Sounds like you have a missing page_mkwrite implementation. 
> This patch comes from debugging a data loss issue when working on the uffd-wp > support on shmem/hugetlbfs. I posted this out for early review and comments, > but also because it should already start to benefit missing mode userfaultfd to > avoid trying to fault around on reads. A measurable difference? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Don't fault around userfaultfd-registered regions on reads 2020-11-27 12:22 ` Matthew Wilcox @ 2020-11-27 13:51 ` Peter Xu 2020-11-28 0:33 ` Andrea Arcangeli 1 sibling, 0 replies; 12+ messages in thread From: Peter Xu @ 2020-11-27 13:51 UTC (permalink / raw) To: Matthew Wilcox Cc: linux-kernel, linux-mm, Andrew Morton, Hugh Dickins, Andrea Arcangeli, Mike Rapoport Matthew, Thanks for the review comments. On Fri, Nov 27, 2020 at 12:22:24PM +0000, Matthew Wilcox wrote: > On Thu, Nov 26, 2020 at 05:23:59PM -0500, Peter Xu wrote: > > For missing mode uffds, fault around does not help because if the page cache > > existed, then the page should be there already. If the page cache is not > > there, nothing else we can do, either. If the fault-around code is destined to > > be helpless for userfault-missing vmas, then ideally we can skip it. > > But it might have been faulted into the cache by another task, so skipping > it is bad. Is there a real use case like that? I thought about it; at least qemu postcopy doesn't work like that, and I believe CRIU doesn't either, but Mike can correct me. Even if there is such a case, for example if task A tries to copy the pages over to a tmpfs file while task B accesses the pages with uffd missing registered, the ideal solution is still that when a page is copied task A can install the pte for the current task, rather than relying on fault-around on reads, IMHO. I don't know whether there's a way to do it, though. > > > For wr-protected mode uffds, errornously fault in those pages around could lead > > to threads accessing the pages without uffd server's awareness. For example, > > when punching holes on uffd-wp registered shmem regions, we'll first try to > > unmap all the pages before evicting the page cache but without locking the > > page (please refer to shmem_fallocate(), where unmap_mapping_range() is called > > before shmem_truncate_range()). 
When fault-around happens near a hole being > > punched, we might errornously fault in the "holes" right before it will be > > punched. Then there's a small window before the page cache was finally > > dropped, and after the page will be writable again (NOTE: the uffd-wp protect > > information is totally lost due to the pre-unmap in shmem_fallocate(), so the > > page can be writable within the small window). That's severe data loss. > > Sounds like you have a missing page_mkwrite implementation. I think it's a slightly different issue, because shmem may not know whether the page should be allowed to be written or not. AFAIU, uffd-wp is designed and implemented in a way that the final protect information is kept within the ptes, so e.g. vmas do not have a solid understanding of whether a page should be write-protected or not (VM_UFFD_WP in the vma flags is a hint only, and that's also why we won't abuse creating tons of vmas). We tried hard to keep that pte information, majorly _PAGE_UFFD_WP, alive even across swap in/outs and migrations. If the pte is lost, we can't get that information back from the page cache, at least for now. > > > This patch comes from debugging a data loss issue when working on the uffd-wp > > support on shmem/hugetlbfs. I posted this out for early review and comments, > > but also because it should already start to benefit missing mode userfaultfd to > > avoid trying to fault around on reads. > > A measurable difference? Nope, I didn't measure the missing case. It should really depend on whether there's such a use case, imho. If there is, then we may still want that (though uffd-wp might be a different story, as discussed above). Otherwise maybe we should just avoid doing that for all. 
The other thing that led me to this patch (rather than only checking against uffd-wp, in which case I'd just keep that small patch in my own tree until posting the uffd-wp series) is what happens when the community introduces things like "minor faults" for userfaultfd. In that case we'd need to be careful here too, otherwise we may lose some minor faults. From that pov I'm wondering whether it's easier to just forbid kernel-side tricks like this for uffd in general: as I stated previously, IMHO if a memory region is registered with userfaultfd, it may be best to always rely on userspace to resolve page faults, so that we don't have to debug things like the uffd-wp data corruption as userfaultfd evolves, because that'll be a non-trivial task, at least for me... Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Don't fault around userfaultfd-registered regions on reads 2020-11-27 12:22 ` Matthew Wilcox 2020-11-27 13:51 ` Peter Xu @ 2020-11-28 0:33 ` Andrea Arcangeli 2020-11-28 15:29 ` Peter Xu 1 sibling, 1 reply; 12+ messages in thread From: Andrea Arcangeli @ 2020-11-28 0:33 UTC (permalink / raw) To: Matthew Wilcox Cc: Peter Xu, linux-kernel, linux-mm, Andrew Morton, Hugh Dickins, Mike Rapoport Hello, On Fri, Nov 27, 2020 at 12:22:24PM +0000, Matthew Wilcox wrote: > On Thu, Nov 26, 2020 at 05:23:59PM -0500, Peter Xu wrote: > > For wr-protected mode uffds, errornously fault in those pages around could lead > > to threads accessing the pages without uffd server's awareness. For example, > > when punching holes on uffd-wp registered shmem regions, we'll first try to > > unmap all the pages before evicting the page cache but without locking the > > page (please refer to shmem_fallocate(), where unmap_mapping_range() is called > > before shmem_truncate_range()). When fault-around happens near a hole being > > punched, we might errornously fault in the "holes" right before it will be > > punched. Then there's a small window before the page cache was finally > > dropped, and after the page will be writable again (NOTE: the uffd-wp protect > > information is totally lost due to the pre-unmap in shmem_fallocate(), so the > > page can be writable within the small window). That's severe data loss. > > Sounds like you have a missing page_mkwrite implementation. If the real fault happened through the pagetable (which got dropped by the hole punching), a "missing type" userfault would be delivered to userland (because the pte would be none). Userland would invoke UFFDIO_COPY with the UFFDIO_COPY_MODE_WP flag. 
Such flag would then map the filled shmem page (not necessarily all zero and not necessarily the old content before the hole punch) with _PAGE_RW not set and _PAGE_UFFD_WP set, so the next write would also trigger a wrprotect userfault (this is what the uffd-wp app expects). filemap_map_pages doesn't notify userland when it fills a pte and it will map again the page read-write. However filemap_map_pages isn't capable to fill a hole and to undo the hole punch, all it can do are minor faults to re-fill the ptes from a not-yet-truncated inode page. Now it would be ok if filemap_map_pages re-filled the ptes, after they were zapped the first time by fallocate, and then the fallocate would zap them again before truncating the page, but I don't see a second pte zap... there's just the below single call of unmap_mapping_range: if ((u64)unmap_end > (u64)unmap_start) unmap_mapping_range(mapping, unmap_start, 1 + unmap_end - unmap_start, 0); shmem_truncate_range(inode, offset, offset + len - 1); So looking at filemap_map_pages in shmem, I'm really wondering if the non-uffd case is correct or not. Do we end up with ptes pointing to non pagecache, so then the virtual mapping is out of sync also with read/write syscalls that will then allocate another shmem page out of sync of the ptes? If a real page fault happened instead of filemap_map_pages, the shmem_fault() would block during fallocate punch hole by checking inode->i_private, as the comment says: * So refrain from * faulting pages into the hole while it's being punched. It's not immediately clear where filemap_map_pages refrains from faulting pages into the hole while it's being punched... given it's ignoring inode->i_private. So I'm not exactly sure how shmem can safely faulted in through filemap_map_pages, without going through shmem_fault... I suppose shmem simply is unsafe to use filemap_map_pages and it'd require a specific shmem_map_pages? 
If only filemap_map_pages refrained from re-faulting ptes of a shmem page that is about to be truncated (whose original ptes had _PAGE_RW unset and _PAGE_UFFD_WP set), there would be no problem with the uffd interaction. So a proper shmem_map_pages could co-exist with uffd; the userfaultfd_armed check would be only an optimization, but it wouldn't be required to avoid userland memory corruption?

From 8c6fb1b7dde309f0c8b5666a8e13557ae46369b4 Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange@redhat.com>
Date: Fri, 27 Nov 2020 19:12:44 -0500
Subject: [PATCH 1/1] shmem: stop faulting in pages without checking
 inode->i_private

Per the shmem_fault comment, shmem needs to "refrain from faulting pages into the hole while it's being punched", and to do so it must check inode->i_private, which filemap_map_pages won't, so it's unsafe to use in shmem because it can leave ptes pointing to non-pagecache pages in shmem-backed vmas.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 mm/shmem.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 8e2b35ba93ad..f6f29b3e67c6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3942,7 +3942,6 @@ static const struct super_operations shmem_ops = {
 
 static const struct vm_operations_struct shmem_vm_ops = {
 	.fault = shmem_fault,
-	.map_pages = filemap_map_pages,
 #ifdef CONFIG_NUMA
 	.set_policy = shmem_set_policy,
 	.get_policy = shmem_get_policy,

Thanks,
Andrea

^ permalink raw reply related	[flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Don't fault around userfaultfd-registered regions on reads 2020-11-28 0:33 ` Andrea Arcangeli @ 2020-11-28 15:29 ` Peter Xu 2020-12-01 19:08 ` Andrea Arcangeli 2020-12-01 21:31 ` Hugh Dickins 0 siblings, 2 replies; 12+ messages in thread From: Peter Xu @ 2020-11-28 15:29 UTC (permalink / raw) To: Andrea Arcangeli Cc: Matthew Wilcox, linux-kernel, linux-mm, Andrew Morton, Hugh Dickins, Mike Rapoport On Fri, Nov 27, 2020 at 07:33:39PM -0500, Andrea Arcangeli wrote: > Hello, Hi, Andrea! > > On Fri, Nov 27, 2020 at 12:22:24PM +0000, Matthew Wilcox wrote: > > On Thu, Nov 26, 2020 at 05:23:59PM -0500, Peter Xu wrote: > > > For wr-protected mode uffds, errornously fault in those pages around could lead > > > to threads accessing the pages without uffd server's awareness. For example, > > > when punching holes on uffd-wp registered shmem regions, we'll first try to > > > unmap all the pages before evicting the page cache but without locking the > > > page (please refer to shmem_fallocate(), where unmap_mapping_range() is called > > > before shmem_truncate_range()). When fault-around happens near a hole being > > > punched, we might errornously fault in the "holes" right before it will be > > > punched. Then there's a small window before the page cache was finally > > > dropped, and after the page will be writable again (NOTE: the uffd-wp protect > > > information is totally lost due to the pre-unmap in shmem_fallocate(), so the > > > page can be writable within the small window). That's severe data loss. > > > > Sounds like you have a missing page_mkwrite implementation. > > If the real fault happened through the pagetable (which got dropped by > the hole punching), a "missing type" userfault would be delivered to > userland (because the pte would be none). Userland would invoke > UFFDIO_COPY with the UFFDIO_COPY_MODE_WP flag. 
Such flag would then > map the filled shmem page (not necessarily all zero and not > necessarily the old content before the hole punch) with _PAGE_RW not > set and _PAGE_UFFD_WP set, so the next write would also trigger a > wrprotect userfault (this is what the uffd-wp app expects). > > filemap_map_pages doesn't notify userland when it fills a pte and it > will map again the page read-write. Yes. A trivial supplementary detail is that filemap_map_pages() will only set it read-only since alloc_set_pte() will only set write bit if it's a write. In our case it's read fault-around so without it. However it'll be quickly marked as writable as long as the thread quickly writes to it again. > > However filemap_map_pages isn't capable to fill a hole and to undo the > hole punch, all it can do are minor faults to re-fill the ptes from a > not-yet-truncated inode page. > > Now it would be ok if filemap_map_pages re-filled the ptes, after they > were zapped the first time by fallocate, and then the fallocate would > zap them again before truncating the page, but I don't see a second > pte zap... there's just the below single call of unmap_mapping_range: > > if ((u64)unmap_end > (u64)unmap_start) > unmap_mapping_range(mapping, unmap_start, > 1 + unmap_end - unmap_start, 0); > shmem_truncate_range(inode, offset, offset + len - 1); IMHO the 2nd pte zap is inside shmem_truncate_range(), where we will call truncate_inode_page() then onto truncate_cleanup_page(). Since we're at it, some more context: this is actually where I started to notice the issue, that we'll try to pre-unmap the whole region first before shmem_truncate_range(). The thing is shmem_truncate_range() will zap the ptes too, in an even safer way (with page lock held). 
So before I came up with the current patch, I also tried the below patch, and it also fixes the data corruption issue:

-----8<-----
diff --git a/mm/shmem.c b/mm/shmem.c
index efa19e33e470..b275f401fe1f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2777,7 +2777,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 	inode_lock(inode);
 
 	if (mode & FALLOC_FL_PUNCH_HOLE) {
-		struct address_space *mapping = file->f_mapping;
 		loff_t unmap_start = round_up(offset, PAGE_SIZE);
 		loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1;
 		DECLARE_WAIT_QUEUE_HEAD_ONSTACK(shmem_falloc_waitq);
@@ -2795,9 +2794,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		inode->i_private = &shmem_falloc;
 		spin_unlock(&inode->i_lock);
 
-		if ((u64)unmap_end > (u64)unmap_start)
-			unmap_mapping_range(mapping, unmap_start,
-					    1 + unmap_end - unmap_start, 0);
 		shmem_truncate_range(inode, offset, offset + len - 1);
 		/* No need to unmap again: hole-punching leaves COWed pages */
-----8<-----

This code has existed since the first day of the shmem fallocate code, so I was wondering why we did it that way. One reason could be performance: the explicit pre-unmap via unmap_mapping_range() walks the page tables once rather than once per page, so when the hole is huge it could benefit us, since truncate_cleanup_page() can then avoid those per-page walks because page_mapped() is more likely to be zero:

	if (page_mapped(page)) {
		pgoff_t nr = PageTransHuge(page) ? HPAGE_PMD_NR : 1;
		unmap_mapping_pages(mapping, page->index, nr, false);
	}

It would be good if Hugh could confirm whether my understanding is correct, though..

As a summary, that's why I didn't try to remove the optimization (which fixes the issue too) but instead tried to disable fault-around for uffd, which sounds like a better thing to do.

> > So looking at filemap_map_pages in shmem, I'm really wondering if the > non-uffd case is correct or not. 
> > Do we end up with ptes pointing to non pagecache, so then the virtual > mapping is out of sync also with read/write syscalls that will then > allocate another shmem page out of sync of the ptes? > > If a real page fault happened instead of filemap_map_pages, the > shmem_fault() would block during fallocate punch hole by checking > inode->i_private, as the comment says: > > * So refrain from > * faulting pages into the hole while it's being punched. > > It's not immediately clear where filemap_map_pages refrains from > faulting pages into the hole while it's being punched... given it's > ignoring inode->i_private. > > So I'm not exactly sure how shmem can safely faulted in through > filemap_map_pages, without going through shmem_fault... I suppose > shmem simply is unsafe to use filemap_map_pages and it'd require > a specific shmem_map_pages? > > If only filemap_map_pages was refraining re-faulting ptes of a shmem > page that is about to be truncated (whose original ptes had _PAGE_RW > unset and _PAGE_UFFD_WP set) there would be no problem with the uffd > interaction. So a proper shmem_map_pages could co-exist with uffd, the > userfaultfd_armed check would be only an optimization but it wouldn't > be required to avoid userland memory corruption? So far I still think it's necessary to have this patch, or an equivalent, to avoid fault-around. As discussed above, the current map_pages of shmem (which is the general filemap_map_pages) should be safe without uffd as long as we remove the ptes correctly and with the page lock held (that is exactly what we do right now in shmem_truncate_range). But since we have that optimization on pre-unmap, uffd-wp is not safe any more, because of the race that this patch tries to avoid. And if my understanding above is correct, we may still want to keep the shmem fault-around logic; it just may not be suitable for uffd-wp, or uffd in general. Thanks! 
> > From 8c6fb1b7dde309f0c8b5666a8e13557ae46369b4 Mon Sep 17 00:00:00 2001 > From: Andrea Arcangeli <aarcange@redhat.com> > Date: Fri, 27 Nov 2020 19:12:44 -0500 > Subject: [PATCH 1/1] shmem: stop faulting in pages without checking > inode->i_private > > Per shmem_fault comment shmem need to "refrain from faulting pages > into the hole while it's being punched" and to do so it must check > inode->i_private, which filemap_map_pages won't so it's unsafe to use > in shmem because it can leave ptes pointing to non-pagecache pages in > shmem backed vmas. > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > --- > mm/shmem.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/mm/shmem.c b/mm/shmem.c > index 8e2b35ba93ad..f6f29b3e67c6 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -3942,7 +3942,6 @@ static const struct super_operations shmem_ops = { > > static const struct vm_operations_struct shmem_vm_ops = { > .fault = shmem_fault, > - .map_pages = filemap_map_pages, > #ifdef CONFIG_NUMA > .set_policy = shmem_set_policy, > .get_policy = shmem_get_policy, > > > Thanks, > Andrea > -- Peter Xu ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Don't fault around userfaultfd-registered regions on reads
  2020-11-28 15:29 ` Peter Xu
@ 2020-12-01 19:08 ` Andrea Arcangeli
  2020-12-01 21:31 ` Hugh Dickins
  1 sibling, 0 replies; 12+ messages in thread
From: Andrea Arcangeli @ 2020-12-01 19:08 UTC (permalink / raw)
To: Peter Xu
Cc: Matthew Wilcox, linux-kernel, linux-mm, Andrew Morton, Hugh Dickins, Mike Rapoport

Hi Peter,

On Sat, Nov 28, 2020 at 10:29:03AM -0500, Peter Xu wrote:
> Yes. A trivial supplementary detail is that filemap_map_pages() will only set it read-only since alloc_set_pte() will only set the write bit if it's a write. In our case it's a read fault-around so without it. However it'll be quickly marked as writable as long as the thread quickly writes to it again.

The problem here is that all the information injected into the kernel through the UFFDIO_WRITEPROTECT syscalls is wiped by the quoted unmap_mapping_range. We can later discuss how we actually check _PAGE_UFFD_WP, but that's not the first thing to solve here. Here we need to find a way to retain _PAGE_UFFD_WP until the page is actually locked and is atomically disappearing from the xarray index, as far as further shmem page faults are concerned.

Adding that information to the xarray isn't ideal because it's mm-specific and not a property of the "pagecache", but keeping that information in the pagetables as we do for anon memory also causes issues, as shown below, because shmem pagetables are supposed to be zapped at any time and re-created on the fly based on the xarray. By contrast, that never happens with anon memory.

For example, when a shmem page is swapped out, the swap entry is written into the xarray, not into the pagetable; how are we going to let _PAGE_UFFD_WP survive swapping of a shmem page if we store _PAGE_UFFD_WP in the pte? Did you test swap?
So here I'm afraid there's something more fundamental in how to have _PAGE_UFFD_WP survive swapping and a random pte zap; we need more infrastructure in unmap_mapping_range to retain that information so that swapping works too. So I don't think we can evaluate this patch without seeing the full patchset and how the rest is being handled. Still, it's great you found the source of the corruption for the current shmem uffd-wp codebase so we can move forward!

> > However filemap_map_pages isn't capable to fill a hole and to undo the hole punch, all it can do are minor faults to re-fill the ptes from a not-yet-truncated inode page.
> >
> > Now it would be ok if filemap_map_pages re-filled the ptes, after they were zapped the first time by fallocate, and then the fallocate would zap them again before truncating the page, but I don't see a second pte zap... there's just the below single call of unmap_mapping_range:
> >
> > 	if ((u64)unmap_end > (u64)unmap_start)
> > 		unmap_mapping_range(mapping, unmap_start,
> > 				    1 + unmap_end - unmap_start, 0);
> > 	shmem_truncate_range(inode, offset, offset + len - 1);
>
> IMHO the 2nd pte zap is inside shmem_truncate_range(), where we will call truncate_inode_page() then onto truncate_cleanup_page().

I wrongly assumed there was only one invalidate, and I guessed that was what caused the problem, since it ran outside the page lock and so wasn't "atomic" with respect to the page truncation in the xarray.

The comment "refrain from faulting pages into the hole while it's being punched" and the "inode->i_private waitqueue" logic are just there to slowly throttle the faulting so the truncation is faster. It's still not entirely clear why it's needed, since shmem_getpage_gfp uses find_lock_entry and the page had to be locked. I didn't look at the history to see the order in which those changes were added, to check if it's still needed.
Anyway, it was already suspect that "unlikely(inode->i_private)" would be enough as an out-of-order check to fully serialize things, but for a performance "throttling" logic it's fine and non-concerning (worst case it is not needed, and the page lock waitqueue would have serialized the faults against the slow invalidate loops already).

> Since we're at it, some more context: this is actually where I started to notice the issue, that we'll try to pre-unmap the whole region first before shmem_truncate_range(). The thing is shmem_truncate_range() will zap the ptes too, in an even safer way (with the page lock held). So before I came up with the current patch, I also tried the patch below, and it also fixes the data corruption issue:
>
> -----8<-----
> diff --git a/mm/shmem.c b/mm/shmem.c
> index efa19e33e470..b275f401fe1f 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2777,7 +2777,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> 	inode_lock(inode);
>
> 	if (mode & FALLOC_FL_PUNCH_HOLE) {
> -		struct address_space *mapping = file->f_mapping;
> 		loff_t unmap_start = round_up(offset, PAGE_SIZE);
> 		loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1;
> 		DECLARE_WAIT_QUEUE_HEAD_ONSTACK(shmem_falloc_waitq);
> @@ -2795,9 +2794,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> 		inode->i_private = &shmem_falloc;
> 		spin_unlock(&inode->i_lock);
>
> -		if ((u64)unmap_end > (u64)unmap_start)
> -			unmap_mapping_range(mapping, unmap_start,
> -					    1 + unmap_end - unmap_start, 0);
> 		shmem_truncate_range(inode, offset, offset + len - 1);
> 		/* No need to unmap again: hole-punching leaves COWed pages */
> -----8<-----
>
> The above code has existed since the first day of the shmem fallocate code, so I was wondering why we did that.
> One reason could be performance: this pre-unmap via an explicit unmap_mapping_range() will walk the page tables once rather than walking for every single page, so when the hole is huge it could benefit us, since truncate_cleanup_page() is able to avoid those per-page walks because page_mapped() is then more likely to be zero:
>
> 	if (page_mapped(page)) {
> 		pgoff_t nr = PageTransHuge(page) ? HPAGE_PMD_NR : 1;
> 		unmap_mapping_pages(mapping, page->index, nr, false);
> 	}
>
> It would be good if Hugh can help confirm whether my understanding is correct, though..
>
> As a summary, that's why I didn't try to remove the optimization (which fixes the issue too) but instead tried to disable fault-around for uffd, which sounds like a better thing to do.

I'm afraid the real problem is the above invalidate though, and not the invalidate itself but the fact that uffd-wp shmem cannot cope with it; we need to find a way to have the _PAGE_UFFD_WP information survive even without applying your above patch.

> So far I still think it's necessary to have this patch or an equivalent to avoid fault-around.
>
> As discussed above, the current map_pages of shmem (which is the generic filemap_map_pages) should be safe without uffd as long as we remove the ptes correctly, with the page lock held (that should be exactly what we do right now in shmem_truncate_range).
>
> But since we have that pre-unmap optimization, uffd-wp is not safe any more, because of the race that this patch tries to avoid.
>
> And if my understanding above is correct, we may still want to keep the shmem fault-around logic; it just may not be suitable for uffd-wp, or for uffd in general.

Since there's the second invalidate in truncate_inode_page under the page lock, I agree that means filemap_map_pages is perfectly safe for shmem. However, it also means filemap_map_pages is a 100% bypass as far as userfaultfd is concerned.
That applies to all modes, including missing faults, not just wp faults. In fact, no matter the workload, given the current semantics of uffd on file-backed mappings, it's always guaranteed there cannot be even a single extra userfault, regardless of whether filemap_map_pages is active or inactive in shmem, hugetlbfs or any other filesystem. So it's not intuitive why it shouldn't be suitable for uffd, given its presence is also completely unnoticeable to userland. Despite the fault-around name, it's not a real fault.

If David later sends the work to handle minor faults (so that the huge final dirty bitmap can be delivered to the destination node while the guest already runs on the destination), the above will change, because the uffd semantics will be extended to cover minor faults. But current upstream only triggers a userfault on a file hole, so major faults only, and filemap_map_pages by definition cannot fill a file hole. So again, it's not clear why we should even care whether filemap_map_pages is active or not when uffd is armed, no matter if in wp or missing mode.

More interesting is how to retain _PAGE_UFFD_WP during swapping, which will get rid of all the pagetables too: that looks like a much wider loss of info than the above race, and it looks like the same issue. I didn't see the full patchset yet, so I cannot tell if you solved swapping some other way already, but until that is clear the above looks like a minor concern, and whatever solution works to retain the _PAGE_UFFD_WP information during shmem swapping should also solve the above, without extra changes to filemap_map_pages and the fault-around logic when the uffd is armed.

What I'm really saying is that there's no point in applying this patch until we see the full patchset of shmem uffd-wp, and then it's possible to evaluate whether there are no other losses of the _PAGE_UFFD_WP bit.
Anon memory is completely different: it's impossible to lose _PAGE_UFFD_WP there, since the swap entry format contains it; for shmem the pte is zeroed instead during swap.

Thanks!
Andrea

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Don't fault around userfaultfd-registered regions on reads
  2020-11-28 15:29 ` Peter Xu
  2020-12-01 19:08 ` Andrea Arcangeli
@ 2020-12-01 21:31 ` Hugh Dickins
  2020-12-01 23:42 ` Andrea Arcangeli
  1 sibling, 1 reply; 12+ messages in thread
From: Hugh Dickins @ 2020-12-01 21:31 UTC (permalink / raw)
To: Peter Xu
Cc: Andrea Arcangeli, Matthew Wilcox, linux-kernel, linux-mm, Andrew Morton, Hugh Dickins, Mike Rapoport

On Sat, 28 Nov 2020, Peter Xu wrote:
> On Fri, Nov 27, 2020 at 07:33:39PM -0500, Andrea Arcangeli wrote:
> >
> > Now it would be ok if filemap_map_pages re-filled the ptes, after they were zapped the first time by fallocate, and then the fallocate would zap them again before truncating the page, but I don't see a second pte zap... there's just the below single call of unmap_mapping_range:
> >
> > 	if ((u64)unmap_end > (u64)unmap_start)
> > 		unmap_mapping_range(mapping, unmap_start,
> > 				    1 + unmap_end - unmap_start, 0);
> > 	shmem_truncate_range(inode, offset, offset + len - 1);
>
> IMHO the 2nd pte zap is inside shmem_truncate_range(), where we will call truncate_inode_page() then onto truncate_cleanup_page().

Correct.

> Since we're at it, some more context: this is actually where I started to notice the issue, that we'll try to pre-unmap the whole region first before shmem_truncate_range(). The thing is shmem_truncate_range() will zap the ptes too, in an even safer way (with the page lock held).
> So before I came up with the current patch, I also tried the patch below, and it also fixes the data corruption issue:
>
> -----8<-----
> diff --git a/mm/shmem.c b/mm/shmem.c
> index efa19e33e470..b275f401fe1f 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2777,7 +2777,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> 	inode_lock(inode);
>
> 	if (mode & FALLOC_FL_PUNCH_HOLE) {
> -		struct address_space *mapping = file->f_mapping;
> 		loff_t unmap_start = round_up(offset, PAGE_SIZE);
> 		loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1;
> 		DECLARE_WAIT_QUEUE_HEAD_ONSTACK(shmem_falloc_waitq);
> @@ -2795,9 +2794,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> 		inode->i_private = &shmem_falloc;
> 		spin_unlock(&inode->i_lock);
>
> -		if ((u64)unmap_end > (u64)unmap_start)
> -			unmap_mapping_range(mapping, unmap_start,
> -					    1 + unmap_end - unmap_start, 0);
> 		shmem_truncate_range(inode, offset, offset + len - 1);
> 		/* No need to unmap again: hole-punching leaves COWed pages */
> -----8<-----
>
> The above code has existed since the first day of the shmem fallocate code, so I was wondering why we did that. One reason could be performance: this pre-unmap via an explicit unmap_mapping_range() will walk the page tables once rather than walking for every single page, so when the hole is huge it could benefit us, since truncate_cleanup_page() is able to avoid those per-page walks because page_mapped() is then more likely to be zero:
>
> 	if (page_mapped(page)) {
> 		pgoff_t nr = PageTransHuge(page) ? HPAGE_PMD_NR : 1;
> 		unmap_mapping_pages(mapping, page->index, nr, false);
> 	}
>
> It would be good if Hugh can help confirm whether my understanding is correct, though..

Correct. Code duplicated from mm/truncate.c, but with more comments over there, in truncate_pagecache() and in truncate_pagecache_range().
> As a summary, that's why I didn't try to remove the optimization (which fixes the issue too) but instead tried to disable fault-around for uffd, which sounds like a better thing to do.

> > So looking at filemap_map_pages in shmem, I'm really wondering if the non-uffd case is correct or not.
> >
> > Do we end up with ptes pointing to non pagecache, so then the virtual mapping is out of sync also with read/write syscalls that will then allocate another shmem page out of sync of the ptes?

No, as you point out, the second pte zap, under page lock, keeps it safe.

> > If a real page fault happened instead of filemap_map_pages, the shmem_fault() would block during fallocate punch hole by checking inode->i_private, as the comment says:
> >
> >	 * So refrain from
> >	 * faulting pages into the hole while it's being punched.
> >
> > It's not immediately clear where filemap_map_pages refrains from faulting pages into the hole while it's being punched... given it's ignoring inode->i_private.

Please don't ever rely on that i_private business for correctness: as more of the comment around "So refrain from" explains, it was added to avoid a livelock with the trinity fuzzer, and I'd gladly delete it all, if I could ever be confident that enough has changed in the intervening years that it's no longer necessary.

It tends to prevent shmem faults racing hole punches in the same area (without quite guaranteeing no race at all). But without the livelock issue, I'd just have gone on letting them race. "Punch hole" == "Lose data" - though I can imagine that UFFD would like more control over that. Since map_pages is driven by faulting, it should already be throttled too.

But Andrea in the next mail goes on to see other issues with UFFD_WP in relation to shmem swap, so this thread is probably dead now.
If there's a bit to spare for UFFD_WP in the anonymous swap entry, then that bit should be usable for shmem too: but a shmem page (and its swap entry) can be shared between many different users, so I don't know whether that will make sense.

Hugh

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Don't fault around userfaultfd-registered regions on reads
  2020-12-01 21:31 ` Hugh Dickins
@ 2020-12-01 23:42 ` Andrea Arcangeli
  0 siblings, 0 replies; 12+ messages in thread
From: Andrea Arcangeli @ 2020-12-01 23:42 UTC (permalink / raw)
To: Hugh Dickins
Cc: Peter Xu, Matthew Wilcox, linux-kernel, linux-mm, Andrew Morton, Mike Rapoport

Hi Hugh,

On Tue, Dec 01, 2020 at 01:31:21PM -0800, Hugh Dickins wrote:
> Please don't ever rely on that i_private business for correctness: as

The out-of-order and lockless "if (inode->i_private)" branch didn't inspire much confidence in terms of being able to rely on it for locking correctness, indeed.

> more of the comment around "So refrain from" explains, it was added to avoid a livelock with the trinity fuzzer, and I'd gladly delete it all, if I could ever be confident that enough has changed in the intervening years that it's no longer necessary.

I was wondering if it's still needed, so now I checked the old code, and it also did an unconditional lock_page in shmem_fault, so I assume it's still necessary.

> It tends to prevent shmem faults racing hole punches in the same area (without quite guaranteeing no race at all). But without the livelock issue, I'd just have gone on letting them race. "Punch hole" == "Lose data" - though I can imagine that UFFD would like more control over that. Since map_pages is driven by faulting, it should already be throttled too.

Yes, I see now it just needs to "eventually" stop the shmem_fault activity; it doesn't need to catch those faults already in flight, so it cannot be relied upon as the form of serialization to zap the pagetables while truncating the page.

> But Andrea in next mail goes on to see other issues with UFFD_WP in relation to shmem swap, so this thread is probably dead now.
> If there's a bit to spare for UFFD_WP in the anonymous swap entry, then that bit should be usable for shmem too: but a shmem page (and its swap entry) can be shared between many different users, so I don't know whether that will make sense.

An UFFD_WP software bit exists both for the present pte and for the swap entry:

	#define _PAGE_UFFD_WP     (_AT(pteval_t, 1) << _PAGE_BIT_UFFD_WP)
	#define _PAGE_SWP_UFFD_WP _PAGE_USER

It works similarly to soft dirty: if the page is swapped, _PAGE_UFFD_WP is translated to _PAGE_SWP_UFFD_WP, and the other way around during swapin.

The problem is that the bit represents information that is specific to an mm and a single mapping. If you punch a hole in a shmem file, map the shmem file in two processes A and B, and register the mapped range in a uffd opened by process A using VM_UFFD_MISSING, only process A will generate userfaults if you access the virtual address that corresponds to the hole in the file; process B will still fill the hole normally and it won't trigger userfaults in process A. The uffd context is attached to an mm, and the notification only has effect on that mm, the mm that created the context. Only the UFFDIO_ operations that resolve the fault (like UFFDIO_COPY) have effects visible to all file sharers.

So VM_UFFD_WP shall work the same, and the _PAGE_SWP_UFFD_WP information of a swapped-out shmem page shouldn't go into the xarray, because doing so would share it with all "mm"s sharing the mapping. Currently uffd-wp only works on anon memory, so this is a new challenge in extending uffd-wp to shmem, it seems.

If any pagetable of an anonymous memory mapping is zapped, then it's not just the _PAGE_SWP_UFFD_WP bit that is lost, the whole data is lost too, so it'd break more than just uffd-wp. With shmem, by contrast, the ptes can be zapped at any time by memory pressure alone, even before the final ->writepage swap-out is invoked.
It's always possible to make uffd-wp work without those bits (anon memory would also work without those bits), but the reason those bits exist is to avoid a flood of UFFDIO_WRITEPROTECT ioctl false-negatives after fork or after swapping (in the anon case). In the shmem case, if we did it without a bit flag to track which page was wrprotected in which vma, the uffd handler would be required to mark each pte writable with a syscall every time a new pte has been established on the range and there's a write to the page.

One possibility could be to store the bit set by UFFDIO_WRITEPROTECT in a vmalloc'd bitmap associated with the shmem vma at uffd registration time, so it can be checked during shmem_fault() if VM_UFFD_WP is set on the vma? The vma_split and vma_merge handling would be the hard part.

Another possibility would be to change how pte invalidation works for vmas with VM_UFFD_WP set: the pte and the _PAGE_UFFD_WP would need to survive all invalidates; only the final truncation under the page lock would really be allowed to clear the whole pte, destroy the _PAGE_UFFD_WP information in it, and then free the pgtables on those ranges.

Once we find a way to make that bit survive the invalidates, we can check it in shmem_fault to decide whether to invoke handle_userfault if the bit is set and FAULT_FLAG_WRITE is set in vmf->flags, or to map the page wrprotected at that vaddr if the bit is set and FAULT_FLAG_WRITE is not set.

Thanks,
Andrea

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Don't fault around userfaultfd-registered regions on reads
  2020-11-26 22:23 [PATCH] mm: Don't fault around userfaultfd-registered regions on reads Peter Xu
  2020-11-27  8:16 ` Mike Rapoport
  2020-11-27 12:22 ` Matthew Wilcox
@ 2020-11-27 17:00 ` David Hildenbrand
  2020-11-27 18:28 ` Peter Xu
  2 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2020-11-27 17:00 UTC (permalink / raw)
To: Peter Xu, linux-kernel, linux-mm
Cc: Andrew Morton, Hugh Dickins, Andrea Arcangeli, Mike Rapoport

On 26.11.20 23:23, Peter Xu wrote:
> Faulting around for reads are in most cases helpful for the performance so that continuous memory accesses may avoid another trip of page fault. However it may not always work as expected.
>
> For example, userfaultfd registered regions may not be the best candidate for pre-faults around the reads.

Are we getting uffd faults even though no one actually accessed it? So in the case where I would track what some part of my program actually reads, I would get wrong notifications?

-- 
Thanks,

David / dhildenb

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Don't fault around userfaultfd-registered regions on reads
  2020-11-27 17:00 ` David Hildenbrand
@ 2020-11-27 18:28 ` Peter Xu
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2020-11-27 18:28 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, Andrew Morton, Hugh Dickins, Andrea Arcangeli, Mike Rapoport

On Fri, Nov 27, 2020 at 06:00:44PM +0100, David Hildenbrand wrote:
> On 26.11.20 23:23, Peter Xu wrote:
> > Faulting around for reads are in most cases helpful for the performance so that continuous memory accesses may avoid another trip of page fault. However it may not always work as expected.
> >
> > For example, userfaultfd registered regions may not be the best candidate for pre-faults around the reads.
>
> Are we getting uffd faults even though no-one actually accessed it?

Userfaultfd should only notify if someone actually accessed it.

> So in case I would track what some part of my program actually reads, I would get wrong notifications?

For shmem, we can't track that right now, afaict. The message is only generated if the page cache is not there, and tracking page reads when there is no page cache wouldn't help, because the pages are all zeros.

Thanks,

-- 
Peter Xu

^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2020-12-01 23:42 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-26 22:23 [PATCH] mm: Don't fault around userfaultfd-registered regions on reads Peter Xu
2020-11-27  8:16 ` Mike Rapoport
2020-11-27 13:31   ` Peter Xu
2020-11-27 12:22 ` Matthew Wilcox
2020-11-27 13:51   ` Peter Xu
2020-11-28  0:33     ` Andrea Arcangeli
2020-11-28 15:29       ` Peter Xu
2020-12-01 19:08         ` Andrea Arcangeli
2020-12-01 21:31         ` Hugh Dickins
2020-12-01 23:42           ` Andrea Arcangeli
2020-11-27 17:00 ` David Hildenbrand
2020-11-27 18:28   ` Peter Xu