All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Rapoport <rppt@linux.vnet.ibm.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>,
	'Andrew Morton' <akpm@linux-foundation.org>,
	linux-mm@kvack.org, 'Mike Kravetz' <mike.kravetz@oracle.com>,
	"'Dr. David Alan Gilbert'" <dgilbert@redhat.com>,
	'Shaohua Li' <shli@fb.com>,
	'Pavel Emelyanov' <xemul@virtuozzo.com>
Subject: Re: [PATCH 25/33] userfaultfd: shmem: add userfaultfd hook for shared memory faults
Date: Sun, 20 Nov 2016 14:10:51 +0200	[thread overview]
Message-ID: <20161120121050.GC32009@rapoport-lnx> (raw)
In-Reply-To: <20161118003734.GC10229@redhat.com>

On Fri, Nov 18, 2016 at 01:37:34AM +0100, Andrea Arcangeli wrote:
> Hello,
> 
> I found a minor issue with the non cooperative testcase, sometime an
> userfault would trigger in between UFFD_EVENT_MADVDONTNEED and
> UFFDIO_UNREGISTER:
> 
> 		case UFFD_EVENT_MADVDONTNEED:
> 			uffd_reg.range.start = msg.arg.madv_dn.start;
> 			uffd_reg.range.len = msg.arg.madv_dn.end -
> 				msg.arg.madv_dn.start;
> 			if (ioctl(uffd, UFFDIO_UNREGISTER, &uffd_reg.range))
> 
> It always triggered at the nr == 0:
> 
> 	for (nr = 0; nr < nr_pages; nr++) {
> 		if (my_bcmp(area_dst + nr * page_size, zeropage, page_size))
> 
> The userfault still pending after UFFDIO_UNREGISTER returned, lead to
> poll() getting a UFFD_EVENT_PAGEFAULT and trying to do a UFFDIO_COPY
> into the unregistered range, which gracefully results in -EINVAL.
> 
> So this could be all handled in userland, by storing the MADV_DONTNEED
> range and calling UFFDIO_WAKE instead of UFFDIO_COPY... but I think
> it's more reliable to fix it into the kernel.
> 
> If a pending userfault happens before UFFDIO_UNREGISTER it'll just
> behave like if it happened after.
> 
> I also noticed the order of uffd notification of MADV_DONTNEED and the
> pagetable zap was wrong, we've to notify userland first so it won't
> risk to call UFFDIO_COPY while the process runs zap_page_range.
> 
> With the two patches appended below the -EINVAL error out of
> UFFDIO_COPY is gone.
> 
> From fc27d209e566d95e8ae0eb83a703aa4e02316b4c Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli <aarcange@redhat.com>
> Date: Thu, 17 Nov 2016 20:15:50 +0100
> Subject: [PATCH 1/2] userfaultfd: non-cooperative: avoid MADV_DONTNEED race
>  condition
> 
> MADV_DONTNEED must be notified to userland before the pages are
> zapped. This allows userland to immediately stop adding pages to the
> userfaultfd ranges before the pages are actually zapped or there could
> be non-zeropage leftovers as result of concurrent UFFDIO_COPY run in
> between zap_page_range and madvise_userfault_dontneed (both
> MADV_DONTNEED and UFFDIO_COPY runs under the mmap_sem for reading, so
> they can run concurrently).
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>

> ---
>  mm/madvise.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 7168bc6..4d4c7f8 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -476,8 +476,8 @@ static long madvise_dontneed(struct vm_area_struct *vma,
>  	if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
>  		return -EINVAL;
> 
> -	zap_page_range(vma, start, end - start, NULL);
>  	madvise_userfault_dontneed(vma, prev, start, end);
> +	zap_page_range(vma, start, end - start, NULL);
>  	return 0;
>  }
> 
> 
> 
> From 18e7b30cf82c927af4c0323a6caac20184a03ff4 Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli <aarcange@redhat.com>
> Date: Thu, 17 Nov 2016 20:20:40 +0100
> Subject: [PATCH 2/2] userfaultfd: non-cooperative: wake userfaults after
>  UFFDIO_UNREGISTER
> 
> Userfaults may still happen after the userfaultfd monitor thread
> received a UFFD_EVENT_MADVDONTNEED until UFFDIO_UNREGISTER is run.
> 
> Wake any pending userfault within UFFDIO_UNREGISTER protected by the
> mmap_sem for writing, so they will not be reported to userland leading
> to UFFDIO_COPY returning -EINVAL (as the range was already
> unregistered) and they will not hang permanently either.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>

> ---
>  fs/userfaultfd.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 2b75fab..42168d3 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1267,6 +1267,19 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>  			start = vma->vm_start;
>  		vma_end = min(end, vma->vm_end);
> 
> +		if (userfaultfd_missing(vma)) {
> +			/*
> +			 * Wake any concurrent pending userfault while
> +			 * we unregister, so they will not hang
> +			 * permanently and it avoids userland to call
> +			 * UFFDIO_WAKE explicitly.
> +			 */
> +			struct userfaultfd_wake_range range;
> +			range.start = start;
> +			range.len = vma_end - start;
> +			wake_userfault(vma->vm_userfaultfd_ctx.ctx, &range);
> +		}
> +
>  		new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP);
>  		prev = vma_merge(mm, prev, start, vma_end, new_flags,
>  				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-11-20 12:11 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-02 19:33 [PATCH 00/33] userfaultfd tmpfs/hugetlbfs/non-cooperative Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 01/33] userfaultfd: document _IOR/_IOW Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 02/33] userfaultfd: correct comment about UFFD_FEATURE_PAGEFAULT_FLAG_WP Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 03/33] userfaultfd: convert BUG() to WARN_ON_ONCE() Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 04/33] userfaultfd: use vma_is_anonymous Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 05/33] userfaultfd: non-cooperative: Split the find_userfault() routine Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 06/33] userfaultfd: non-cooperative: Add ability to report non-PF events from uffd descriptor Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 07/33] userfaultfd: non-cooperative: report all available features to userland Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 08/33] userfaultfd: non-cooperative: Add fork() event Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 09/33] userfaultfd: non-cooperative: Add fork() event, build warning fix Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 10/33] userfaultfd: non-cooperative: dup_userfaultfd: use mm_count instead of mm_users Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 11/33] userfaultfd: non-cooperative: Add mremap() event Andrea Arcangeli
2016-11-03  7:41   ` Hillf Danton
2016-11-03 17:52     ` Mike Rapoport
2016-11-04 15:40     ` Mike Rapoport
2016-11-02 19:33 ` [PATCH 12/33] userfaultfd: non-cooperative: Add madvise() event for MADV_DONTNEED request Andrea Arcangeli
2016-11-03  8:01   ` Hillf Danton
2016-11-03 17:24     ` Mike Rapoport
2016-11-04 16:40       ` [PATCH 12/33] userfaultfd: non-cooperative: Add madvise() event for MADV_DONTNEED requestg Andrea Arcangeli
2016-11-04 15:42     ` [PATCH 12/33] userfaultfd: non-cooperative: Add madvise() event for MADV_DONTNEED request Mike Rapoport
2016-11-02 19:33 ` [PATCH 13/33] userfaultfd: hugetlbfs: add copy_huge_page_from_user for hugetlb userfaultfd support Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 14/33] userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for " Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 15/33] userfaultfd: hugetlbfs: add __mcopy_atomic_hugetlb for huge page UFFDIO_COPY Andrea Arcangeli
2016-11-03 10:15   ` Hillf Danton
2016-11-03 17:33     ` Mike Kravetz
2016-11-03 19:14       ` Mike Kravetz
2016-11-04  6:43         ` Hillf Danton
2016-11-04 19:36         ` Andrea Arcangeli
2016-11-04 20:34           ` Mike Kravetz
2016-11-08 21:06           ` Mike Kravetz
2016-11-16 18:28             ` Andrea Arcangeli
2016-11-16 18:53               ` Mike Kravetz
2016-11-17 15:40                 ` Andrea Arcangeli
2016-11-17 19:26                   ` Mike Kravetz
2016-11-18  0:05                     ` Andrea Arcangeli
2016-11-18  5:52                       ` Mike Kravetz
2016-11-22  1:16                         ` Mike Kravetz
2016-11-23  6:38                           ` Hillf Danton
2016-12-15 19:02                             ` Andrea Arcangeli
2016-12-16  3:54                               ` Hillf Danton
2016-11-17 19:41               ` Mike Kravetz
2016-11-04 16:35       ` Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 16/33] userfaultfd: hugetlbfs: add userfaultfd hugetlb hook Andrea Arcangeli
2016-11-04  7:02   ` Hillf Danton
2016-11-02 19:33 ` [PATCH 17/33] userfaultfd: hugetlbfs: allow registration of ranges containing huge pages Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 18/33] userfaultfd: hugetlbfs: add userfaultfd_hugetlb test Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 19/33] userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 20/33] userfaultfd: introduce vma_can_userfault Andrea Arcangeli
2016-11-04  7:39   ` Hillf Danton
2016-11-02 19:33 ` [PATCH 21/33] userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 22/33] userfaultfd: shmem: introduce vma_is_shmem Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 23/33] userfaultfd: shmem: add tlbflush.h header for microblaze Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 24/33] userfaultfd: shmem: use shmem_mcopy_atomic_pte for shared memory Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 25/33] userfaultfd: shmem: add userfaultfd hook for shared memory faults Andrea Arcangeli
2016-11-04  8:59   ` Hillf Danton
2016-11-04 14:53     ` Mike Rapoport
2016-11-04 15:44     ` Mike Rapoport
2016-11-04 16:56       ` Andrea Arcangeli
2016-11-18  0:37       ` Andrea Arcangeli
2016-11-20 12:10         ` Mike Rapoport [this message]
2016-11-02 19:33 ` [PATCH 26/33] userfaultfd: shmem: allow registration of shared memory ranges Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 27/33] userfaultfd: shmem: add userfaultfd_shmem test Andrea Arcangeli
2016-11-02 19:34 ` [PATCH 28/33] userfaultfd: shmem: lock the page before adding it to pagecache Andrea Arcangeli
2016-11-02 19:34 ` [PATCH 29/33] userfaultfd: shmem: avoid leaking blocks and used blocks in UFFDIO_COPY Andrea Arcangeli
2016-11-02 19:34 ` [PATCH 30/33] userfaultfd: non-cooperative: selftest: introduce userfaultfd_open Andrea Arcangeli
2016-11-02 19:34 ` [PATCH 31/33] userfaultfd: non-cooperative: selftest: add ufd parameter to copy_page Andrea Arcangeli
2016-11-02 19:34 ` [PATCH 32/33] userfaultfd: non-cooperative: selftest: add test for FORK, MADVDONTNEED and REMAP events Andrea Arcangeli
2016-11-02 19:34 ` [PATCH 33/33] mm: mprotect: use pmd_trans_unstable instead of taking the pmd_lock Andrea Arcangeli
2016-11-02 20:07 ` [PATCH 00/33] userfaultfd tmpfs/hugetlbfs/non-cooperative Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161120121050.GC32009@rapoport-lnx \
    --to=rppt@linux.vnet.ibm.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dgilbert@redhat.com \
    --cc=hillf.zj@alibaba-inc.com \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=shli@fb.com \
    --cc=xemul@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.