From: Andrew Morton <akpm@linux-foundation.org>
To: aarcange@redhat.com, akpm@linux-foundation.org,
	egorenar@linux.ibm.com, gerald.schaefer@linux.ibm.com,
	hca@linux.ibm.com, linux-mm@kvack.org,
	mm-commits@vger.kernel.org, stable@vger.kernel.org,
	torvalds@linux-foundation.org
Subject: [patch 6/8] mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
Date: Sat, 21 Nov 2020 22:17:15 -0800
Message-ID: <20201122061715.eoZ-MxK3C%akpm@linux-foundation.org>
In-Reply-To: <20201121221631.948ae4655e913a319d61700a@linux-foundation.org>

From: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Subject: mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()

Alexander reported a syzkaller / KASAN finding on s390; see below for the
complete output.

In do_huge_pmd_anonymous_page(), the pre-allocated pagetable will be freed
in some cases.  In the case of userfaultfd_missing(), this will happen
after calling handle_userfault(), which might have released the mmap_lock.
Therefore, the following pte_free(vma->vm_mm, pgtable) will access an
unstable vma->vm_mm, which could have been freed or re-used already.

For all architectures other than s390 this has no negative impact,
because pte_free() simply frees the page and ignores the passed-in mm.
The implementation for SPARC32 would also access mm->page_table_lock
for pte_free(), but there is no THP support in SPARC32, so the buggy code
path will not be used there.

For s390, the mm->context.pgtable_list is being used to maintain the 2K
pagetable fragments, and operating on an already freed or even re-used mm
could result in various more or less subtle bugs due to list / pagetable
corruption.

Fix this by calling pte_free() before handle_userfault(), similar to how
it is already done in __do_huge_pmd_anonymous_page() for the WRITE /
non-huge_zero_page case.

Commit 6b251fc96cf2c ("userfaultfd: call handle_userfault() for
userfaultfd_missing() faults") actually introduced both the
do_huge_pmd_anonymous_page() and the __do_huge_pmd_anonymous_page()
changes wrt calling handle_userfault(), but only in the latter case did
it put the pte_free() before calling handle_userfault().
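
The ordering problem described above can be illustrated with a small
userspace model. This is only a sketch, not kernel code: every name in it
(model_mm, model_vma, model_pte_free(), model_handle_userfault(),
buggy_order(), fixed_order()) is hypothetical, and the "alive" flag merely
stands in for memory that, in the kernel, could already have been freed or
re-used once the mmap_lock was dropped.

```c
/* Userspace model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

struct model_mm {
	int nr_fragments;	/* stands in for per-mm pagetable bookkeeping */
	bool alive;		/* false once the owner may have gone away */
};

struct model_vma {
	struct model_mm *mm;
	bool valid;		/* false once the lock may have been dropped */
};

/*
 * Models a pte_free() that has to touch the mm, as on s390.
 * Returns false when called on a stale mm; in the kernel this
 * would be the use-after-free instead of a clean failure.
 */
static bool model_pte_free(struct model_mm *mm)
{
	if (!mm->alive)
		return false;
	mm->nr_fragments--;
	return true;
}

/* Models handle_userfault(): afterwards vma and vma->mm are unstable. */
static int model_handle_userfault(struct model_vma *vma)
{
	vma->valid = false;
	vma->mm->alive = false;
	return 0;
}

/* Old ordering: free the pagetable after the call that may drop the lock. */
static bool buggy_order(struct model_vma *vma)
{
	(void)model_handle_userfault(vma);
	return model_pte_free(vma->mm);	/* too late, mm is stale */
}

/* Fixed ordering: free the pagetable while vma->mm is still stable. */
static bool fixed_order(struct model_vma *vma)
{
	bool freed;

	assert(vma->valid);
	freed = model_pte_free(vma->mm);
	(void)model_handle_userfault(vma);
	return freed;
}
```

In this model buggy_order() reports a failed free and leaves the
bookkeeping untouched, while fixed_order() releases the fragment before
the owner can become stale, which mirrors the reordering in the patch
below.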

==================================================================
BUG: KASAN: use-after-free in do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
Read of size 8 at addr 00000000962d6988 by task syz-executor.0/9334

CPU: 1 PID: 9334 Comm: syz-executor.0 Not tainted 5.10.0-rc1-syzkaller-07083-g4c9720875573 #0
Hardware name: IBM 3906 M04 701 (KVM/Linux)
Call Trace:
 [<00000000aa0a7a1c>] unwind_start arch/s390/include/asm/unwind.h:65 [inline]
 [<00000000aa0a7a1c>] show_stack+0x174/0x220 arch/s390/kernel/dumpstack.c:135
 [<00000000aa105952>] __dump_stack lib/dump_stack.c:77 [inline]
 [<00000000aa105952>] dump_stack+0x262/0x2e8 lib/dump_stack.c:118
 [<00000000aa0b484e>] print_address_description.constprop.0+0x5e/0x218 mm/kasan/report.c:385
 [<00000000a61f13aa>] __kasan_report mm/kasan/report.c:545 [inline]
 [<00000000a61f13aa>] kasan_report+0x11a/0x168 mm/kasan/report.c:562
 [<00000000a620d782>] do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
 [<00000000a610632e>] create_huge_pmd mm/memory.c:4256 [inline]
 [<00000000a610632e>] __handle_mm_fault+0xe6e/0x1068 mm/memory.c:4480
 [<00000000a61067b0>] handle_mm_fault+0x288/0x748 mm/memory.c:4607
 [<00000000a598b55c>] do_exception+0x394/0xae0 arch/s390/mm/fault.c:479
 [<00000000a598d7c4>] do_dat_exception+0x34/0x80 arch/s390/mm/fault.c:567
 [<00000000aa124e5e>] pgm_check_handler+0x1da/0x22c arch/s390/kernel/entry.S:706
 [<00000000aa0a6902>] copy_from_user_mvcos arch/s390/lib/uaccess.c:111 [inline]
 [<00000000aa0a6902>] raw_copy_from_user+0x3a/0x88 arch/s390/lib/uaccess.c:174
 [<00000000a7c24668>] _copy_from_user+0x48/0xa8 lib/usercopy.c:16
 [<00000000a5b0b2a8>] copy_from_user include/linux/uaccess.h:192 [inline]
 [<00000000a5b0b2a8>] __do_sys_sigaltstack kernel/signal.c:4064 [inline]
 [<00000000a5b0b2a8>] __s390x_sys_sigaltstack+0xc8/0x240 kernel/signal.c:4060
 [<00000000aa124a9c>] system_call+0xe0/0x28c arch/s390/kernel/entry.S:415

Allocated by task 9334:
 stack_trace_save+0xbe/0xf0 kernel/stacktrace.c:121
 kasan_save_stack+0x30/0x60 mm/kasan/common.c:48
 kasan_set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc.constprop.0+0xd0/0xe8 mm/kasan/common.c:461
 slab_post_alloc_hook mm/slab.h:526 [inline]
 slab_alloc_node mm/slub.c:2891 [inline]
 slab_alloc mm/slub.c:2899 [inline]
 kmem_cache_alloc+0x118/0x348 mm/slub.c:2904
 vm_area_dup+0x9c/0x2b8 kernel/fork.c:356
 __split_vma+0xba/0x560 mm/mmap.c:2742
 split_vma+0xca/0x108 mm/mmap.c:2800
 mlock_fixup+0x4ae/0x600 mm/mlock.c:550
 apply_vma_lock_flags+0x2c6/0x398 mm/mlock.c:619
 do_mlock+0x1aa/0x718 mm/mlock.c:711
 __do_sys_mlock2 mm/mlock.c:738 [inline]
 __s390x_sys_mlock2+0x86/0xa8 mm/mlock.c:728
 system_call+0xe0/0x28c arch/s390/kernel/entry.S:415

Freed by task 9333:
 stack_trace_save+0xbe/0xf0 kernel/stacktrace.c:121
 kasan_save_stack+0x30/0x60 mm/kasan/common.c:48
 kasan_set_track+0x32/0x48 mm/kasan/common.c:56
 kasan_set_free_info+0x34/0x50 mm/kasan/generic.c:355
 __kasan_slab_free+0x11e/0x190 mm/kasan/common.c:422
 slab_free_hook mm/slub.c:1544 [inline]
 slab_free_freelist_hook mm/slub.c:1577 [inline]
 slab_free mm/slub.c:3142 [inline]
 kmem_cache_free+0x7c/0x4b8 mm/slub.c:3158
 __vma_adjust+0x7b2/0x2508 mm/mmap.c:960
 vma_merge+0x87e/0xce0 mm/mmap.c:1209
 userfaultfd_release+0x412/0x6b8 fs/userfaultfd.c:868
 __fput+0x22c/0x7a8 fs/file_table.c:281
 task_work_run+0x200/0x320 kernel/task_work.c:151
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 do_notify_resume+0x100/0x148 arch/s390/kernel/signal.c:538
 system_call+0xe6/0x28c arch/s390/kernel/entry.S:416

The buggy address belongs to the object at 00000000962d6948
 which belongs to the cache vm_area_struct of size 200
The buggy address is located 64 bytes inside of
 200-byte region [00000000962d6948, 00000000962d6a10)
The buggy address belongs to the page:
page:00000000313a09fe refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x962d6
flags: 0x3ffff00000000200(slab)
raw: 3ffff00000000200 000040000257e080 0000000c0000000c 000000008020ba00
raw: 0000000000000000 000f001e00000000 ffffffff00000001 0000000096959501
page dumped because: kasan: bad access detected
page->mem_cgroup:0000000096959501

Memory state around the buggy address:
 00000000962d6880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00000000962d6900: 00 fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
>00000000962d6980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                      ^
 00000000962d6a00: fb fb fc fc fc fc fc fc fc fc 00 00 00 00 00 00
 00000000962d6a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

Link: https://lkml.kernel.org/r/20201110190329.11920-1-gerald.schaefer@linux.ibm.com
Fixes: 6b251fc96cf2c ("userfaultfd: call handle_userfault() for userfaultfd_missing() faults")
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: <stable@vger.kernel.org>	[4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/huge_memory.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

--- a/mm/huge_memory.c~mm-userfaultfd-do-not-access-vma-vm_mm-after-calling-handle_userfault
+++ a/mm/huge_memory.c
@@ -710,7 +710,6 @@ vm_fault_t do_huge_pmd_anonymous_page(st
 			transparent_hugepage_use_zero_page()) {
 		pgtable_t pgtable;
 		struct page *zero_page;
-		bool set;
 		vm_fault_t ret;
 		pgtable = pte_alloc_one(vma->vm_mm);
 		if (unlikely(!pgtable))
@@ -723,25 +722,25 @@ vm_fault_t do_huge_pmd_anonymous_page(st
 		}
 		vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
 		ret = 0;
-		set = false;
 		if (pmd_none(*vmf->pmd)) {
 			ret = check_stable_address_space(vma->vm_mm);
 			if (ret) {
 				spin_unlock(vmf->ptl);
+				pte_free(vma->vm_mm, pgtable);
 			} else if (userfaultfd_missing(vma)) {
 				spin_unlock(vmf->ptl);
+				pte_free(vma->vm_mm, pgtable);
 				ret = handle_userfault(vmf, VM_UFFD_MISSING);
 				VM_BUG_ON(ret & VM_FAULT_FALLBACK);
 			} else {
 				set_huge_zero_page(pgtable, vma->vm_mm, vma,
 						   haddr, vmf->pmd, zero_page);
 				spin_unlock(vmf->ptl);
-				set = true;
 			}
-		} else
+		} else {
 			spin_unlock(vmf->ptl);
-		if (!set)
 			pte_free(vma->vm_mm, pgtable);
+		}
 		return ret;
 	}
 	gfp = alloc_hugepage_direct_gfpmask(vma);
_
