linux-mm.kvack.org archive mirror
From: Andrew Morton <akpm@linux-foundation.org>
To: aarcange@redhat.com, akpm@linux-foundation.org,
	linux-mm@kvack.org, luto@kernel.org, mike.kravetz@oracle.com,
	minchan@kernel.org, mm-commits@vger.kernel.org, namit@vmware.com,
	peterx@redhat.com, peterz@infradead.org, rppt@linux.vnet.ibm.com,
	stable@vger.kernel.org, torvalds@linux-foundation.org,
	will@kernel.org, xemul@openvz.org, yuzhao@google.com
Subject: [patch 22/29] mm/userfaultfd: fix memory corruption due to writeprotect
Date: Fri, 12 Mar 2021 21:08:17 -0800
Message-ID: <20210313050817.0WOtpAOpA%akpm@linux-foundation.org>
In-Reply-To: <20210312210632.9b7d62973d72a56fb13c7a03@linux-foundation.org>

From: Nadav Amit <namit@vmware.com>
Subject: mm/userfaultfd: fix memory corruption due to writeprotect

The userfaultfd self-test fails occasionally, indicating memory corruption.

Analysis shows that this is a real bug: mwriteprotect_range() takes
mmap_lock only for read and defers TLB flushes, and wp_page_copy() does
not sufficiently account for concurrent deferred TLB flushes.  Although
wp_page_copy() flushes the PTE from the TLBs, this flush takes place after
the copy has already been performed, so the page can still be modified
between the time of the copy and the time at which the PTE is flushed.

To make matters worse, memory-unprotection using userfaultfd also poses a
problem.  Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
userfaultfd code might actually cause a demotion of the architectural PTE
permission: when userfaultfd_writeprotect() unprotects a memory region, it
unintentionally *clears* the RW-bit if it was already set.  Note that
unprotecting a PTE that is not write-protected is a valid use-case: the
userfaultfd monitor might ask to unprotect a region that holds both
write-protected and write-unprotected PTEs.
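
To illustrate why such an unprotect can act as a demotion rather than a
promotion, here is a minimal user-space sketch in C.  The MODEL_PTE_* bits
and the model_* functions are hypothetical names invented for this
illustration; they are not the kernel's PTE layout or helpers, and the way
the kernel ends up clearing the bit is deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE bits for this model; not the kernel's definitions. */
#define MODEL_PTE_PRESENT  (1u << 0)
#define MODEL_PTE_RW       (1u << 1)
#define MODEL_PTE_UFFD_WP  (1u << 2)

typedef uint32_t model_pte_t;

/*
 * Model of the problematic unprotect path: the uffd-wp marker is
 * cleared as requested, but the write bit is dropped as a side effect
 * even when it was already set.
 */
model_pte_t model_unprotect_buggy(model_pte_t pte)
{
	pte &= ~MODEL_PTE_RW;       /* unintended: write bit cleared */
	pte &= ~MODEL_PTE_UFFD_WP;  /* intended: marker cleared */
	return pte;
}

/* True when the new PTE grants less than the old one did. */
bool model_is_demotion(model_pte_t old_pte, model_pte_t new_pte)
{
	return (old_pte & MODEL_PTE_RW) && !(new_pte & MODEL_PTE_RW);
}
```

For a PTE that was already writable, model_is_demotion() reports true
after model_unprotect_buggy(); that is the case in which a stale writable
TLB entry on another CPU matters.  For a PTE that was write-protected to
begin with, nothing is lost.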

The scenario that happens in selftests/vm/userfaultfd is as follows:

cpu0				cpu1			cpu2
----				----			----
							[ Writable PTE
							  cached in TLB ]
userfaultfd_writeprotect()
[ write-*unprotect* ]
mwriteprotect_range()
mmap_read_lock()
change_protection()

change_protection_range()
...
change_pte_range()
[ *clear* “write”-bit ]
[ defer TLB flushes ]
				[ page-fault ]
				...
				wp_page_copy()
				 cow_user_page()
				  [ copy page ]
							[ write to old
							  page ]
				...
				 set_pte_at_notify()

A similar scenario can happen:

cpu0		cpu1		cpu2		cpu3
----		----		----		----
						[ Writable PTE
				  		  cached in TLB ]
userfaultfd_writeprotect()
[ write-protect ]
[ deferred TLB flush ]
		userfaultfd_writeprotect()
		[ write-unprotect ]
		[ deferred TLB flush ]
				[ page-fault ]
				wp_page_copy()
				 cow_user_page()
				 [ copy page ]
				 ...		[ write to page ]
				set_pte_at_notify()
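
The interleavings above can be replayed deterministically in a small
user-space model.  The sketch below is only an illustration under stated
assumptions: struct model_state, model_cpu_store() and model_run() are
hypothetical names, a "page" is an 8-byte buffer, and a "TLB entry" is a
cached pointer that is only ever filled from a writable PTE.  It compares
flushing the stale entry after the copy with flushing it before the copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { MODEL_PAGE_SIZE = 8 };

/* Hypothetical model state; none of this is kernel data. */
struct model_state {
	char old_page[MODEL_PAGE_SIZE];
	char new_page[MODEL_PAGE_SIZE];
	char *pte_target;    /* page the page-table entry maps */
	bool pte_writable;   /* write bit in the page table */
	char *tlb_target;    /* writer CPU's cached writable entry, or NULL */
};

/* The writer CPU stores one byte; returns false if it would fault. */
bool model_cpu_store(struct model_state *s, char value)
{
	if (s->tlb_target == NULL) {     /* TLB miss: walk the page table */
		if (!s->pte_writable)
			return false;    /* write fault, nothing stored */
		s->tlb_target = s->pte_target;
	}
	s->tlb_target[0] = value;        /* store through the cached entry */
	return true;
}

/* Returns true if the store is visible through the final mapping. */
bool model_run(bool flush_before_copy)
{
	struct model_state s;
	bool stored;

	memset(&s, 0, sizeof(s));
	memset(s.old_page, 'A', MODEL_PAGE_SIZE);
	s.pte_target = s.old_page;
	s.pte_writable = true;
	s.tlb_target = s.old_page;       /* writable PTE cached in TLB */

	s.pte_writable = false;          /* write bit cleared, flush deferred */

	if (flush_before_copy)
		s.tlb_target = NULL;     /* flush ahead of the COW copy */

	memcpy(s.new_page, s.old_page, MODEL_PAGE_SIZE);  /* copy page */

	stored = model_cpu_store(&s, 'B');  /* store racing with the copy */

	s.pte_target = s.new_page;       /* install the new page */
	s.pte_writable = true;
	s.tlb_target = NULL;             /* flush that follows the copy */

	if (!stored)                     /* faulting writer retries */
		model_cpu_store(&s, 'B');

	return s.pte_target[0] == 'B';
}
```

In this model, model_run(false) returns false: the store went to the old
page after it had been copied, so it is lost.  model_run(true) returns
true, because the writer takes a fault instead and retries against the new
page.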

This race has existed since commit 292924b26024 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit").  Yet, as Yu Zhao pointed out, these races became
apparent with commit 09854ba94c6a ("mm: do_wp_page() simplification"),
which made wp_page_copy() more likely to take place, specifically if
page_count(page) > 1.

To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW.
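
The check described above can be sketched in user space as a counter of
in-flight deferred flushes that the fault path consults before copying.
This is a rough model under stated assumptions, not the kernel
implementation: struct model_mm and the model_* functions are hypothetical
names, and the flushes_done field merely counts where a real TLB flush
would happen.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-address-space state. */
struct model_mm {
	atomic_int tlb_flush_pending;  /* deferred flushes in flight */
	int flushes_done;              /* where a real flush would occur */
};

void model_mm_init(struct model_mm *mm)
{
	atomic_init(&mm->tlb_flush_pending, 0);
	mm->flushes_done = 0;
}

/* Protection change: announce the deferred flush before touching PTEs. */
void model_begin_protect(struct model_mm *mm)
{
	atomic_fetch_add(&mm->tlb_flush_pending, 1);
}

/* Protection change: perform the deferred flush, then drop the mark. */
void model_end_protect(struct model_mm *mm)
{
	mm->flushes_done++;
	atomic_fetch_sub(&mm->tlb_flush_pending, 1);
}

/*
 * Write-fault path: on a uffd-wp VMA with a flush still pending, flush
 * before the COW copy.  Returns true if a flush was needed.
 */
bool model_wp_fault(struct model_mm *mm, bool vma_is_uffd_wp)
{
	if (vma_is_uffd_wp && atomic_load(&mm->tlb_flush_pending) > 0) {
		mm->flushes_done++;    /* flush the faulting address first */
		return true;
	}
	return false;
}
```

A fault that arrives between model_begin_protect() and model_end_protect()
on a uffd-wp VMA flushes first; faults outside that window, or on other
VMAs, skip the extra flush, which keeps the common path cheap.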

Further optimizations will follow to avoid unnecessary PTE
write-protection and TLB flushes during uffd-write-unprotect.

Link: https://lkml.kernel.org/r/20210304095423.3825684-1-namit@vmware.com
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit <namit@vmware.com>
Suggested-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>	[5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/mm/memory.c~mm-userfaultfd-fix-memory-corruption-due-to-writeprotect
+++ a/mm/memory.c
@@ -3097,6 +3097,14 @@ static vm_fault_t do_wp_page(struct vm_f
 		return handle_userfault(vmf, VM_UFFD_WP);
 	}
 
+	/*
+	 * Userfaultfd write-protect can defer flushes. Ensure the TLB
+	 * is flushed in this case before copying.
+	 */
+	if (unlikely(userfaultfd_wp(vmf->vma) &&
+		     mm_tlb_flush_pending(vmf->vma->vm_mm)))
+		flush_tlb_page(vmf->vma, vmf->address);
+
 	vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
 	if (!vmf->page) {
 		/*
_

