linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
@ 2020-12-19  4:30 Nadav Amit
  2020-12-19 19:15 ` Andrea Arcangeli
  0 siblings, 1 reply; 120+ messages in thread
From: Nadav Amit @ 2020-12-19  4:30 UTC (permalink / raw)
  To: linux-mm, Peter Xu
  Cc: linux-kernel, Nadav Amit, Andrea Arcangeli, Pavel Emelyanov,
	Mike Kravetz, Mike Rapoport, stable

From: Nadav Amit <namit@vmware.com>

Userfaultfd self-tests fail occasionally, indicating a memory
corruption.

Analyzing this problem indicates that there is a real bug since
mmap_lock is only taken for read in mwriteprotect_range(). This might
cause the TLBs to be in an inconsistent state due to the deferred
batched TLB flushes.

Consider the following scenario with 3 CPUs (cpu2 is not shown):

cpu0				cpu1
----				----
userfaultfd_writeprotect()
[ write-protecting ]
mwriteprotect_range()
 mmap_read_lock()
 change_protection()
  change_protection_range()
   ...
   change_pte_range()
   [ defer TLB flushes]
				userfaultfd_writeprotect()
				 mmap_read_lock()
				 change_protection()
				 [ write-unprotect ]
				 ...
				  [ unprotect PTE logically ]
				...
				[ page-fault]
				...
				wp_page_copy()
				[ set new writable page in PTE]

At this point no TLB flush took place. cpu2 (not shown) might have a
stale writable PTE, which was cached in the TLB before cpu0 called
userfaultfd_writeprotect(), and this PTE points to a different page.

Therefore, write-protecting of memory, even using userfaultfd, which
does not change the vma, requires to prevent any concurrent reader (#PF
handler) from reading PTEs from the page-tables if any CPU might still
hold in it TLB a PTE with higher permissions for the same address. To
do so mmap_lock needs to be taken for write.

Surprisingly, memory-unprotection using userfaultfd also poses a
problem. Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
code might actually cause a demotion of the permission, and therefore
requires a TLB flush.

During unprotection of userfaultfd managed memory region, the PTE is not
really made writable, but instead marked "logically" as writable, and
left for the #PF handler to be handled later. While this is ok, the code
currently also *removes* the write permission, and therefore makes it
necessary to flush the TLBs.

To resolve these problems, acquire mmap_lock for write when
write-protecting userfaultfd region using ioctl's. Keep taking mmap_lock
for read when unprotecting memory, but keep the write-bit set when
resolving userfaultfd write-protection.

Cc: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Fixes: 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit")
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 mm/mprotect.c    |  3 ++-
 mm/userfaultfd.c | 15 +++++++++++++--
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index ab709023e9aa..c08c4055b051 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -75,7 +75,8 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		oldpte = *pte;
 		if (pte_present(oldpte)) {
 			pte_t ptent;
-			bool preserve_write = prot_numa && pte_write(oldpte);
+			bool preserve_write = (prot_numa || uffd_wp_resolve) &&
+					      pte_write(oldpte);
 
 			/*
 			 * Avoid trapping faults against the zero or KSM
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 9a3d451402d7..7423808640ef 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -652,7 +652,15 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 	/* Does the address range wrap, or is the span zero-sized? */
 	BUG_ON(start + len <= start);
 
-	mmap_read_lock(dst_mm);
+	/*
+	 * Although we do not change the VMA, we have to ensure deferred TLB
+	 * flushes are performed before page-faults can be handled. Otherwise
+	 * we can get inconsistent TLB state.
+	 */
+	if (enable_wp)
+		mmap_write_lock(dst_mm);
+	else
+		mmap_read_lock(dst_mm);
 
 	/*
 	 * If memory mappings are changing because of non-cooperative
@@ -686,6 +694,9 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 
 	err = 0;
 out_unlock:
-	mmap_read_unlock(dst_mm);
+	if (enable_wp)
+		mmap_write_unlock(dst_mm);
+	else
+		mmap_read_unlock(dst_mm);
 	return err;
 }
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 120+ messages in thread

end of thread, other threads:[~2021-01-18  2:50 UTC | newest]

Thread overview: 120+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-19  4:30 [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect Nadav Amit
2020-12-19 19:15 ` Andrea Arcangeli
     [not found]   ` <EDC00345-B46E-4396-8379-98E943723809@gmail.com>
2020-12-19 22:06     ` Nadav Amit
2020-12-20  2:20       ` Andrea Arcangeli
2020-12-21  4:36         ` Nadav Amit
2020-12-21  5:12           ` Yu Zhao
2020-12-21  5:25             ` Nadav Amit
2020-12-21  5:39               ` Nadav Amit
2020-12-21  7:29                 ` Yu Zhao
2020-12-22 20:34       ` Andy Lutomirski
2020-12-22 20:58         ` Nadav Amit
2020-12-22 21:34           ` Andrea Arcangeli
2020-12-20  2:01     ` Andy Lutomirski
2020-12-20  2:49       ` Andrea Arcangeli
2020-12-20  5:08         ` Andy Lutomirski
2020-12-21 18:03           ` Andrea Arcangeli
2020-12-21 18:22             ` Andy Lutomirski
2020-12-20  6:05     ` Yu Zhao
2020-12-20  8:06       ` Nadav Amit
2020-12-20  9:54         ` Yu Zhao
2020-12-21  3:33           ` Nadav Amit
2020-12-21  4:44             ` Yu Zhao
2020-12-21 17:27         ` Peter Xu
2020-12-21 18:31           ` Nadav Amit
2020-12-21 19:16             ` Yu Zhao
2020-12-21 19:55               ` Linus Torvalds
2020-12-21 20:21                 ` Yu Zhao
2020-12-21 20:25                   ` Linus Torvalds
2020-12-21 20:23                 ` Nadav Amit
2020-12-21 20:26                   ` Linus Torvalds
2020-12-21 21:24                     ` Yu Zhao
2020-12-21 21:49                       ` Nadav Amit
2020-12-21 22:30                         ` Peter Xu
2020-12-21 22:55                           ` Nadav Amit
2020-12-21 23:30                             ` Linus Torvalds
2020-12-21 23:46                               ` Nadav Amit
2020-12-22 19:44                             ` Andrea Arcangeli
2020-12-22 20:19                               ` Nadav Amit
2020-12-22 21:17                                 ` Andrea Arcangeli
2020-12-21 23:12                           ` Yu Zhao
2020-12-21 23:33                             ` Linus Torvalds
2020-12-22  0:00                               ` Yu Zhao
2020-12-22  0:11                                 ` Linus Torvalds
2020-12-22  0:24                                   ` Yu Zhao
2020-12-21 23:22                           ` Linus Torvalds
2020-12-22  3:19                             ` Andy Lutomirski
2020-12-22  4:16                               ` Linus Torvalds
2020-12-22 20:19                                 ` Andy Lutomirski
2021-01-05 15:37                                 ` Peter Zijlstra
2021-01-05 18:03                                   ` Andrea Arcangeli
2021-01-12 16:20                                     ` Peter Zijlstra
2021-01-12 11:43                                   ` Vinayak Menon
2021-01-12 15:47                                     ` Laurent Dufour
2021-01-12 16:57                                       ` Peter Zijlstra
2021-01-12 19:02                                         ` Laurent Dufour
2021-01-12 19:15                                           ` Nadav Amit
2021-01-12 19:56                                             ` Yu Zhao
2021-01-12 20:38                                               ` Nadav Amit
2021-01-12 20:49                                                 ` Yu Zhao
2021-01-12 21:43                                                 ` Will Deacon
2021-01-12 22:29                                                   ` Nadav Amit
2021-01-12 22:46                                                     ` Will Deacon
2021-01-13  0:31                                                     ` Andy Lutomirski
2021-01-17  4:41                                                   ` Yu Zhao
2021-01-17  7:32                                                     ` Nadav Amit
2021-01-17  9:16                                                       ` Yu Zhao
2021-01-17 10:13                                                         ` Nadav Amit
2021-01-17 19:25                                                           ` Yu Zhao
2021-01-18  2:49                                                             ` Nadav Amit
2020-12-22  9:38                               ` Nadav Amit
2020-12-22 19:31                               ` Andrea Arcangeli
2020-12-22 20:15                                 ` Matthew Wilcox
2020-12-22 20:26                                   ` Andrea Arcangeli
2020-12-22 21:14                                 ` Yu Zhao
2020-12-22 22:02                                   ` Andrea Arcangeli
2020-12-22 23:39                                     ` Yu Zhao
2020-12-22 23:50                                       ` Linus Torvalds
2020-12-23  0:01                                         ` Linus Torvalds
2020-12-23  0:23                                           ` Yu Zhao
2020-12-23  2:17                                             ` Andrea Arcangeli
2020-12-23  9:44                                           ` Linus Torvalds
2020-12-23 10:06                                             ` Yu Zhao
2020-12-23 16:24                                               ` Peter Xu
2020-12-23 18:51                                                 ` Andrea Arcangeli
2020-12-23 18:55                                                   ` Andrea Arcangeli
2020-12-23 19:12                                                 ` Yu Zhao
2020-12-23 19:32                                                   ` Peter Xu
2020-12-23  0:20                                         ` Linus Torvalds
2020-12-23  2:56                                       ` Andrea Arcangeli
2020-12-23  3:36                                         ` Yu Zhao
2020-12-23 15:52                                           ` Peter Xu
2020-12-23 21:07                                             ` Andrea Arcangeli
2020-12-23 21:39                                           ` Andrea Arcangeli
2020-12-23 22:29                                             ` Yu Zhao
2020-12-23 23:04                                               ` Andrea Arcangeli
2020-12-24  1:21                                               ` Andy Lutomirski
2020-12-24  2:00                                                 ` Andrea Arcangeli
2020-12-24  3:09                                                   ` Nadav Amit
2020-12-24  3:30                                                     ` Nadav Amit
2020-12-24  3:34                                                     ` Yu Zhao
2020-12-24  4:01                                                       ` Andrea Arcangeli
2020-12-24  5:18                                                         ` Nadav Amit
2020-12-24 18:49                                                           ` Andrea Arcangeli
2020-12-24 19:16                                                             ` Andrea Arcangeli
2020-12-24  4:37                                                       ` Nadav Amit
2020-12-24  3:31                                                   ` Andrea Arcangeli
2020-12-23 23:39                                             ` Linus Torvalds
2020-12-24  1:01                                               ` Andrea Arcangeli
2020-12-22 21:14                                 ` Nadav Amit
2020-12-22 12:40                       ` Nadav Amit
2020-12-22 18:30                         ` Yu Zhao
2020-12-22 19:20                           ` Nadav Amit
2020-12-23 16:23                             ` Will Deacon
2020-12-23 19:04                               ` Nadav Amit
2020-12-23 22:05                         ` Andrea Arcangeli
2020-12-23 22:45                           ` Nadav Amit
2020-12-23 23:55                             ` Andrea Arcangeli
2020-12-21 21:55                   ` Peter Xu
2020-12-21 23:13                     ` Linus Torvalds
2020-12-21 19:53             ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).