* [PATCH 0/3] userfaultfd v4.11 updates @ 2017-03-02 17:37 Andrea Arcangeli 2017-03-02 17:37 ` [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak Andrea Arcangeli ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Andrea Arcangeli @ 2017-03-02 17:37 UTC (permalink / raw) To: linux-mm, Andrew Morton Cc: Mike Rapoport, Dr. David Alan Gilbert, Mike Kravetz, Pavel Emelyanov, Hillf Danton Hello, this is incremental against current -mm. The earlier patchset fixed the fctx->orig memleak but as I thought during review, there was a further leak in fctx->new and Mike promptly fixed it too. I've been running a reproducer and there's no further memleak when the uffd is closed by the parent while fork() blocks waiting for the event to be received. In addition I found a potential stale pointer in MADV_DONTNEED if userfaultfd_remove has to block and releases the mmap_sem in the process. This patch revalidates the vma and fixes the race. Unfortunately calling userfaultfd_remove last is not ok as explained in the commit. A "make" run from the directory selftests/vm/ independently started to fail in v4.11 trying to write executables to the root directory. That's not very friendly because it worked before, but it's easy to fix and the last patch corrects this behavior in the vm/Makefile. Andrea Arcangeli (2): userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED userfaultfd: selftest: vm: allow to build in vm/ directory Mike Rapoport (1): userfaultfd: non-cooperative: fix fork fctx->new memleak fs/userfaultfd.c | 18 ++++++++++----- include/linux/userfaultfd_k.h | 7 +++--- mm/madvise.c | 44 ++++++++++++++++++++++++++++++++++--- tools/testing/selftests/vm/Makefile | 4 ++++ 4 files changed, 60 insertions(+), 13 deletions(-) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak 2017-03-02 17:37 [PATCH 0/3] userfaultfd v4.11 updates Andrea Arcangeli @ 2017-03-02 17:37 ` Andrea Arcangeli 2017-03-03 11:09 ` Mike Rapoport 2017-09-28 9:26 ` Sasha Levin 2017-03-02 17:37 ` [PATCH 2/3] userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED Andrea Arcangeli 2017-03-02 17:37 ` [PATCH 3/3] userfaultfd: selftest: vm: allow to build in vm/ directory Andrea Arcangeli 2 siblings, 2 replies; 8+ messages in thread From: Andrea Arcangeli @ 2017-03-02 17:37 UTC (permalink / raw) To: linux-mm, Andrew Morton Cc: Mike Rapoport, Dr. David Alan Gilbert, Mike Kravetz, Pavel Emelyanov, Hillf Danton From: Mike Rapoport <rppt@linux.vnet.ibm.com> We have a memleak in the ->new ctx if the uffd of the parent is closed before the fork event is read, nothing frees the new context. Reported-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- fs/userfaultfd.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index d2f15a6..5087a69 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -548,6 +548,15 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx, if (ACCESS_ONCE(ctx->released) || fatal_signal_pending(current)) { __remove_wait_queue(&ctx->event_wqh, &ewq->wq); + if (ewq->msg.event == UFFD_EVENT_FORK) { + struct userfaultfd_ctx *new; + + new = (struct userfaultfd_ctx *) + (unsigned long) + ewq->msg.arg.reserved.reserved1; + + userfaultfd_ctx_put(new); + } break; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak 2017-03-02 17:37 ` [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak Andrea Arcangeli @ 2017-03-03 11:09 ` Mike Rapoport 2017-09-28 9:26 ` Sasha Levin 1 sibling, 0 replies; 8+ messages in thread From: Mike Rapoport @ 2017-03-03 11:09 UTC (permalink / raw) To: Andrea Arcangeli Cc: linux-mm, Andrew Morton, Dr. David Alan Gilbert, Mike Kravetz, Pavel Emelyanov, Hillf Danton > From: Mike Rapoport <rppt@linux.vnet.ibm.com> > > We have a memleak in the ->new ctx if the uffd of the parent is closed > before the fork event is read, nothing frees the new context. > > Reported-by: Andrea Arcangeli <aarcange@redhat.com> I think Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> would be appropriate here. > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > --- > fs/userfaultfd.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > index d2f15a6..5087a69 100644 > --- a/fs/userfaultfd.c > +++ b/fs/userfaultfd.c > @@ -548,6 +548,15 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx, > if (ACCESS_ONCE(ctx->released) || > fatal_signal_pending(current)) { > __remove_wait_queue(&ctx->event_wqh, &ewq->wq); > + if (ewq->msg.event == UFFD_EVENT_FORK) { > + struct userfaultfd_ctx *new; > + > + new = (struct userfaultfd_ctx *) > + (unsigned long) > + ewq->msg.arg.reserved.reserved1; > + > + userfaultfd_ctx_put(new); > + } > break; > } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak 2017-03-02 17:37 ` [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak Andrea Arcangeli 2017-03-03 11:09 ` Mike Rapoport @ 2017-09-28 9:26 ` Sasha Levin 2017-09-28 10:16 ` Andrea Arcangeli 1 sibling, 1 reply; 8+ messages in thread From: Sasha Levin @ 2017-09-28 9:26 UTC (permalink / raw) To: Andrea Arcangeli Cc: linux-mm, Andrew Morton, Mike Rapoport, Dr. David Alan Gilbert, Mike Kravetz, Pavel Emelyanov, Hillf Danton, alexander.levin On Thu, Mar 2, 2017 at 9:37 AM, Andrea Arcangeli <aarcange@redhat.com> wrote: > From: Mike Rapoport <rppt@linux.vnet.ibm.com> > > We have a memleak in the ->new ctx if the uffd of the parent is closed > before the fork event is read, nothing frees the new context. Hey Mike, This seems to result in the following: ================================================================== BUG: KASAN: use-after-free in resolve_userfault_fork fs/userfaultfd.c:967 [inline] BUG: KASAN: use-after-free in userfaultfd_ctx_read+0xa2e/0x2110 fs/userfaultfd.c:1093 Read of size 4 at addr ffff880064944204 by task syz-executor5/6524 CPU: 1 PID: 6524 Comm: syz-executor5 Not tainted 4.13.0-next-20170908+ #246 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x11d/0x1e5 lib/dump_stack.c:52 print_address_description+0xcb/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x275/0x360 mm/kasan/report.c:409 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429 resolve_userfault_fork fs/userfaultfd.c:967 [inline] userfaultfd_ctx_read+0xa2e/0x2110 fs/userfaultfd.c:1093 userfaultfd_read+0x1a3/0x260 fs/userfaultfd.c:1126 do_loop_readv_writev fs/read_write.c:693 [inline] do_iter_read+0x3db/0x5b0 fs/read_write.c:917 vfs_readv+0x130/0x1d0 fs/read_write.c:979 do_readv+0x108/0x2d0 fs/read_write.c:1012 SYSC_readv fs/read_write.c:1099 [inline] SyS_readv+0x27/0x30 fs/read_write.c:1096 do_syscall_64+0x26a/0x800 arch/x86/entry/common.c:287 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x452ea9 RSP: 002b:00007fa93a8bac08 EFLAGS: 00000216 ORIG_RAX: 0000000000000013 RAX: ffffffffffffffda RBX: 0000000000718210 RCX: 0000000000452ea9 RDX: 0000000000000001 RSI: 000000002000f000 RDI: 0000000000000015 RBP: 0000000000004000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004bc64f R13: 00000000ffffffff R14: 0000000000000015 R15: 000000002000f000 Allocated by task 6495: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489 slab_post_alloc_hook mm/slab.h:444 [inline] slab_alloc_node mm/slub.c:2745 [inline] slab_alloc mm/slub.c:2753 [inline] kmem_cache_alloc+0x121/0x390 mm/slub.c:2758 dup_userfaultfd+0x21e/0x890 fs/userfaultfd.c:653 dup_mmap kernel/fork.c:658 [inline] dup_mm kernel/fork.c:1179 [inline] copy_mm+0xa49/0x130d kernel/fork.c:1233 copy_process.part.30+0x21d5/0x4f00 kernel/fork.c:1735 copy_process kernel/fork.c:1548 [inline] _do_fork+0x1f7/0x1050 kernel/fork.c:2027 SYSC_clone kernel/fork.c:2137 [inline] SyS_clone+0x37/0x50 kernel/fork.c:2131 do_syscall_64+0x26a/0x800 arch/x86/entry/common.c:287 return_from_SYSCALL_64+0x0/0x7a Freed by task 6495: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:524 slab_free_hook mm/slub.c:1390 [inline] slab_free_freelist_hook mm/slub.c:1412 [inline] slab_free mm/slub.c:2988 [inline] kmem_cache_free+0xb6/0x320 mm/slub.c:3010 userfaultfd_ctx_put+0x50c/0x740 fs/userfaultfd.c:165 userfaultfd_event_wait_completion+0x763/0x930 fs/userfaultfd.c:599 dup_fctx fs/userfaultfd.c:687 [inline] dup_userfaultfd_complete+0x2e0/0x480 fs/userfaultfd.c:695 dup_mmap kernel/fork.c:726 [inline] dup_mm kernel/fork.c:1179 [inline] copy_mm+0xe7c/0x130d kernel/fork.c:1233 copy_process.part.30+0x21d5/0x4f00 kernel/fork.c:1735 copy_process kernel/fork.c:1548 [inline] _do_fork+0x1f7/0x1050 kernel/fork.c:2027 SYSC_clone kernel/fork.c:2137 [inline] SyS_clone+0x37/0x50 kernel/fork.c:2131 do_syscall_64+0x26a/0x800 arch/x86/entry/common.c:287 return_from_SYSCALL_64+0x0/0x7a The buggy address belongs to the object at ffff880064944040 which belongs to the cache userfaultfd_ctx_cache of size 480 The buggy address is located 452 bytes inside of 480-byte region [ffff880064944040, ffff880064944220) The buggy address belongs to the page: page:ffffea0001925100 count:1 mapcount:0 mapping: (null) index:0xffff880064947bc0 compound_mapcount: 0 flags: 0x4fffe0000008100(slab|head) raw: 04fffe0000008100 0000000000000000 ffff880064947bc0 0000000100120001 raw: ffff88006a556d98 ffff88006a556d98 ffff88003e7c31c0 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff880064944100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff880064944180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff880064944200: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff880064944280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff880064944300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Thanks, Sasha -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak 2017-09-28 9:26 ` Sasha Levin @ 2017-09-28 10:16 ` Andrea Arcangeli 0 siblings, 0 replies; 8+ messages in thread From: Andrea Arcangeli @ 2017-09-28 10:16 UTC (permalink / raw) To: Sasha Levin Cc: linux-mm, Andrew Morton, Mike Rapoport, Dr. David Alan Gilbert, Mike Kravetz, Pavel Emelyanov, Hillf Danton, alexander.levin Hello Sasha, On Thu, Sep 28, 2017 at 02:26:42AM -0700, Sasha Levin wrote: > On Thu, Mar 2, 2017 at 9:37 AM, Andrea Arcangeli <aarcange@redhat.com> wrote: > > From: Mike Rapoport <rppt@linux.vnet.ibm.com> > > > > We have a memleak in the ->new ctx if the uffd of the parent is closed > > before the fork event is read, nothing frees the new context. > > Hey Mike, > > This seems to result in the following: Andrew just included this fix in -mm: https://lkml.org/lkml/2017/9/20/571 Thanks, Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/3] userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED 2017-03-02 17:37 [PATCH 0/3] userfaultfd v4.11 updates Andrea Arcangeli 2017-03-02 17:37 ` [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak Andrea Arcangeli @ 2017-03-02 17:37 ` Andrea Arcangeli 2017-03-03 19:48 ` Mike Rapoport 2017-03-02 17:37 ` [PATCH 3/3] userfaultfd: selftest: vm: allow to build in vm/ directory Andrea Arcangeli 2 siblings, 1 reply; 8+ messages in thread From: Andrea Arcangeli @ 2017-03-02 17:37 UTC (permalink / raw) To: linux-mm, Andrew Morton Cc: Mike Rapoport, Dr. David Alan Gilbert, Mike Kravetz, Pavel Emelyanov, Hillf Danton userfaultfd_remove() has to be execute before zapping the pagetables or UFFDIO_COPY could keep filling pages after zap_page_range returned, which would result in non zero data after a MADV_DONTNEED. However userfaultfd_remove() may have to release the mmap_sem. This was handled correctly in MADV_REMOVE, but MADV_DONTNEED accessed a potentially stale vma (the very vma passed to zap_page_range(vma, ...)). The fix consists in revalidating the vma in case userfaultfd_remove() had to release the mmap_sem. This also optimizes away an unnecessary down_read/up_read in the MADV_REMOVE case if UFFD_EVENT_FORK had to be delivered. It all remains zero runtime cost in case CONFIG_USERFAULTFD=n as userfaultfd_remove() will be defined as "true" at build time. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- fs/userfaultfd.c | 9 +++------ include/linux/userfaultfd_k.h | 7 +++---- mm/madvise.c | 44 ++++++++++++++++++++++++++++++++++++++++--- 3 files changed, 47 insertions(+), 13 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 5087a69..2104811 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -694,8 +694,7 @@ void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *vm_ctx, userfaultfd_event_wait_completion(ctx, &ewq); } -void userfaultfd_remove(struct vm_area_struct *vma, - struct vm_area_struct **prev, +bool userfaultfd_remove(struct vm_area_struct *vma, unsigned long start, unsigned long end) { struct mm_struct *mm = vma->vm_mm; @@ -704,13 +703,11 @@ void userfaultfd_remove(struct vm_area_struct *vma, ctx = vma->vm_userfaultfd_ctx.ctx; if (!ctx || !(ctx->features & UFFD_FEATURE_EVENT_REMOVE)) - return; + return true; userfaultfd_ctx_get(ctx); up_read(&mm->mmap_sem); - *prev = NULL; /* We wait for ACK w/o the mmap semaphore */ - msg_init(&ewq.msg); ewq.msg.event = UFFD_EVENT_REMOVE; @@ -719,7 +716,7 @@ void userfaultfd_remove(struct vm_area_struct *vma, userfaultfd_event_wait_completion(ctx, &ewq); - down_read(&mm->mmap_sem); + return false; } static bool has_unmap_ctx(struct userfaultfd_ctx *ctx, struct list_head *unmaps, diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index f2b79bf..48a3483 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -61,8 +61,7 @@ extern void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *, unsigned long from, unsigned long to, unsigned long len); -extern void userfaultfd_remove(struct vm_area_struct *vma, - struct vm_area_struct **prev, +extern bool userfaultfd_remove(struct vm_area_struct *vma, unsigned long start, unsigned long end); @@ -118,11 +117,11 @@ static inline void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *ctx, { } -static inline void userfaultfd_remove(struct vm_area_struct *vma, - struct vm_area_struct **prev, +static inline bool userfaultfd_remove(struct vm_area_struct *vma, unsigned long start, unsigned long end) { + return true; } static inline int userfaultfd_unmap_prep(struct vm_area_struct *vma, diff --git a/mm/madvise.c b/mm/madvise.c index dc5927c..7a2abf0 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -513,7 +513,43 @@ static long madvise_dontneed(struct vm_area_struct *vma, if (!can_madv_dontneed_vma(vma)) return -EINVAL; - userfaultfd_remove(vma, prev, start, end); + if (!userfaultfd_remove(vma, start, end)) { + *prev = NULL; /* mmap_sem has been dropped, prev is stale */ + + down_read(¤t->mm->mmap_sem); + vma = find_vma(current->mm, start); + if (!vma) + return -ENOMEM; + if (start < vma->vm_start) { + /* + * This "vma" under revalidation is the one + * with the lowest vma->vm_start where start + * is also < vma->vm_end. If start < + * vma->vm_start it means an hole materialized + * in the user address space within the + * virtual range passed to MADV_DONTNEED. + */ + return -ENOMEM; + } + if (!can_madv_dontneed_vma(vma)) + return -EINVAL; + if (end > vma->vm_end) { + /* + * Don't fail if end > vma->vm_end. If the old + * vma was splitted while the mmap_sem was + * released the effect of the concurrent + * operation may not cause MADV_DONTNEED to + * have an undefined result. There may be an + * adjacent next vma that we'll walk + * next. userfaultfd_remove() will generate an + * UFFD_EVENT_REMOVE repetition on the + * end-vma->vm_end range, but the manager can + * handle a repetition fine. + */ + end = vma->vm_end; + } + VM_WARN_ON(start >= end); + } zap_page_range(vma, start, end - start); return 0; } @@ -554,8 +590,10 @@ static long madvise_remove(struct vm_area_struct *vma, * mmap_sem. */ get_file(f); - userfaultfd_remove(vma, prev, start, end); - up_read(¤t->mm->mmap_sem); + if (userfaultfd_remove(vma, start, end)) { + /* mmap_sem was not released by userfaultfd_remove() */ + up_read(¤t->mm->mmap_sem); + } error = vfs_fallocate(f, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, end - start); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/3] userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED 2017-03-02 17:37 ` [PATCH 2/3] userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED Andrea Arcangeli @ 2017-03-03 19:48 ` Mike Rapoport 0 siblings, 0 replies; 8+ messages in thread From: Mike Rapoport @ 2017-03-03 19:48 UTC (permalink / raw) To: Andrea Arcangeli Cc: linux-mm, Andrew Morton, Dr. David Alan Gilbert, Mike Kravetz, Pavel Emelyanov, Hillf Danton > userfaultfd_remove() has to be execute before zapping the pagetables or > UFFDIO_COPY could keep filling pages after zap_page_range returned, > which would result in non zero data after a MADV_DONTNEED. > > However userfaultfd_remove() may have to release the mmap_sem. This > was handled correctly in MADV_REMOVE, but MADV_DONTNEED accessed a > potentially stale vma (the very vma passed to zap_page_range(vma, ...)). > > The fix consists in revalidating the vma in case userfaultfd_remove() > had to release the mmap_sem. > > This also optimizes away an unnecessary down_read/up_read in the > MADV_REMOVE case if UFFD_EVENT_FORK had to be delivered. > > It all remains zero runtime cost in case CONFIG_USERFAULTFD=n as > userfaultfd_remove() will be defined as "true" at build time. > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com> > --- > fs/userfaultfd.c | 9 +++------ > include/linux/userfaultfd_k.h | 7 +++---- > mm/madvise.c | 44 ++++++++++++++++++++++++++++++++++++++++--- > 3 files changed, 47 insertions(+), 13 deletions(-) > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > index 5087a69..2104811 100644 > --- a/fs/userfaultfd.c > +++ b/fs/userfaultfd.c > @@ -694,8 +694,7 @@ void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *vm_ctx, > userfaultfd_event_wait_completion(ctx, &ewq); > } > > -void userfaultfd_remove(struct vm_area_struct *vma, > - struct vm_area_struct **prev, > +bool userfaultfd_remove(struct vm_area_struct *vma, > unsigned long start, unsigned long end) > { > struct mm_struct *mm = vma->vm_mm; > @@ -704,13 +703,11 @@ void userfaultfd_remove(struct vm_area_struct *vma, > > ctx = vma->vm_userfaultfd_ctx.ctx; > if (!ctx || !(ctx->features & UFFD_FEATURE_EVENT_REMOVE)) > - return; > + return true; > > userfaultfd_ctx_get(ctx); > up_read(&mm->mmap_sem); > > - *prev = NULL; /* We wait for ACK w/o the mmap semaphore */ > - > msg_init(&ewq.msg); > > ewq.msg.event = UFFD_EVENT_REMOVE; > @@ -719,7 +716,7 @@ void userfaultfd_remove(struct vm_area_struct *vma, > > userfaultfd_event_wait_completion(ctx, &ewq); > > - down_read(&mm->mmap_sem); > + return false; > } > > static bool has_unmap_ctx(struct userfaultfd_ctx *ctx, struct list_head *unmaps, > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h > index f2b79bf..48a3483 100644 > --- a/include/linux/userfaultfd_k.h > +++ b/include/linux/userfaultfd_k.h > @@ -61,8 +61,7 @@ extern void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *, > unsigned long from, unsigned long to, > unsigned long len); > > -extern void userfaultfd_remove(struct vm_area_struct *vma, > - struct vm_area_struct **prev, > +extern bool userfaultfd_remove(struct vm_area_struct *vma, > unsigned long start, > unsigned long end); > > @@ -118,11 +117,11 @@ static inline void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *ctx, > { > } > > -static inline void userfaultfd_remove(struct vm_area_struct *vma, > - struct vm_area_struct **prev, > +static inline bool userfaultfd_remove(struct vm_area_struct *vma, > unsigned long start, > unsigned long end) > { > + return true; > } > > static inline int userfaultfd_unmap_prep(struct vm_area_struct *vma, > diff --git a/mm/madvise.c b/mm/madvise.c > index dc5927c..7a2abf0 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -513,7 +513,43 @@ static long madvise_dontneed(struct vm_area_struct *vma, > if (!can_madv_dontneed_vma(vma)) > return -EINVAL; > > - userfaultfd_remove(vma, prev, start, end); > + if (!userfaultfd_remove(vma, start, end)) { > + *prev = NULL; /* mmap_sem has been dropped, prev is stale */ > + > + down_read(¤t->mm->mmap_sem); > + vma = find_vma(current->mm, start); > + if (!vma) > + return -ENOMEM; > + if (start < vma->vm_start) { > + /* > + * This "vma" under revalidation is the one > + * with the lowest vma->vm_start where start > + * is also < vma->vm_end. If start < > + * vma->vm_start it means an hole materialized > + * in the user address space within the > + * virtual range passed to MADV_DONTNEED. > + */ > + return -ENOMEM; > + } > + if (!can_madv_dontneed_vma(vma)) > + return -EINVAL; > + if (end > vma->vm_end) { > + /* > + * Don't fail if end > vma->vm_end. If the old > + * vma was splitted while the mmap_sem was > + * released the effect of the concurrent > + * operation may not cause MADV_DONTNEED to > + * have an undefined result. There may be an > + * adjacent next vma that we'll walk > + * next. userfaultfd_remove() will generate an > + * UFFD_EVENT_REMOVE repetition on the > + * end-vma->vm_end range, but the manager can > + * handle a repetition fine. > + */ > + end = vma->vm_end; > + } > + VM_WARN_ON(start >= end); > + } > zap_page_range(vma, start, end - start); > return 0; > } > @@ -554,8 +590,10 @@ static long madvise_remove(struct vm_area_struct *vma, > * mmap_sem. > */ > get_file(f); > - userfaultfd_remove(vma, prev, start, end); > - up_read(¤t->mm->mmap_sem); > + if (userfaultfd_remove(vma, start, end)) { > + /* mmap_sem was not released by userfaultfd_remove() */ > + up_read(¤t->mm->mmap_sem); > + } > error = vfs_fallocate(f, > FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, > offset, end - start); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 3/3] userfaultfd: selftest: vm: allow to build in vm/ directory 2017-03-02 17:37 [PATCH 0/3] userfaultfd v4.11 updates Andrea Arcangeli 2017-03-02 17:37 ` [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak Andrea Arcangeli 2017-03-02 17:37 ` [PATCH 2/3] userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED Andrea Arcangeli @ 2017-03-02 17:37 ` Andrea Arcangeli 2 siblings, 0 replies; 8+ messages in thread From: Andrea Arcangeli @ 2017-03-02 17:37 UTC (permalink / raw) To: linux-mm, Andrew Morton Cc: Mike Rapoport, Dr. David Alan Gilbert, Mike Kravetz, Pavel Emelyanov, Hillf Danton linux/tools/testing/selftests/vm $ make gcc -Wall -I ../../../../usr/include compaction_test.c -lrt -o /compaction_test /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.4/../../../../x86_64-pc-linux-gnu/bin/ld: cannot open output file /compaction_test: Permission denied collect2: error: ld returned 1 exit status make: *** [../lib.mk:54: /compaction_test] Error 1 Since commit a8ba798bc8ec663cf02e80b0dd770324de9bafd9 selftests/vm build fails if run from the "selftests/vm" directory, but it works in the selftests/ directory. It's quicker to be able to do a local vm-only build after a tree wipe and this patch allows for it again. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- tools/testing/selftests/vm/Makefile | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 4cff7e7..41642ba 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -1,5 +1,9 @@ # Makefile for vm selftests +ifndef OUTPUT + OUTPUT := $(shell pwd) +endif + CFLAGS = -Wall -I ../../../../usr/include $(EXTRA_CFLAGS) LDLIBS = -lrt TEST_GEN_FILES = compaction_test -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-09-28 10:16 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-03-02 17:37 [PATCH 0/3] userfaultfd v4.11 updates Andrea Arcangeli 2017-03-02 17:37 ` [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak Andrea Arcangeli 2017-03-03 11:09 ` Mike Rapoport 2017-09-28 9:26 ` Sasha Levin 2017-09-28 10:16 ` Andrea Arcangeli 2017-03-02 17:37 ` [PATCH 2/3] userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED Andrea Arcangeli 2017-03-03 19:48 ` Mike Rapoport 2017-03-02 17:37 ` [PATCH 3/3] userfaultfd: selftest: vm: allow to build in vm/ directory Andrea Arcangeli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).