linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Hillf Danton <hillf.zj@alibaba-inc.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Pavel Emelyanov <xemul@virtuozzo.com>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 4/5] userfaultfd: mcopy_atomic: return -ENOENT when no compatible VMA found
Date: Thu, 2 Feb 2017 19:02:47 +0100	[thread overview]
Message-ID: <20170202180247.GA32446@redhat.com> (raw)
In-Reply-To: <1485542673-24387-5-git-send-email-rppt@linux.vnet.ibm.com>

On Fri, Jan 27, 2017 at 08:44:32PM +0200, Mike Rapoport wrote:
> -		err = -EINVAL;
> +		err = -ENOENT;
>  		dst_vma = find_vma(dst_mm, dst_start);
>  		if (!dst_vma || !is_vm_hugetlb_page(dst_vma))
>  			goto out_unlock;
> +		/*
> +		 * Only allow __mcopy_atomic_hugetlb on userfaultfd
> +		 * registered ranges.
> +		 */
> +		if (!dst_vma->vm_userfaultfd_ctx.ctx)
> +			goto out_unlock;
>  
> +		err = -EINVAL;
>  		if (vma_hpagesize != vma_kernel_pagesize(dst_vma))
>  			goto out_unlock;

That's correct, if a new vma emerges with a different page size it
cannot have a not null dst_vma->vm_userfaultfd_ctx.ctx in the non
cooperative case.

> @@ -219,12 +226,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
>  		goto out_unlock;
>  
>  	/*
> -	 * Only allow __mcopy_atomic_hugetlb on userfaultfd registered ranges.
> -	 */
> -	if (!dst_vma->vm_userfaultfd_ctx.ctx)
> -		goto out_unlock;
> -
> -	/*

but this is buggy and it shouldn't be removed, we need this check also
if dst_vma was found not NULL.

>  	 * Ensure the dst_vma has a anon_vma.
>  	 */
>  	err = -ENOMEM;
> @@ -368,10 +369,23 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
>  	 * Make sure the vma is not shared, that the dst range is
>  	 * both valid and fully within a single existing vma.
>  	 */
> -	err = -EINVAL;
> +	err = -ENOENT;
>  	dst_vma = find_vma(dst_mm, dst_start);
>  	if (!dst_vma)
>  		goto out_unlock;
> +	/*
> +	 * Be strict and only allow __mcopy_atomic on userfaultfd
> +	 * registered ranges to prevent userland errors going
> +	 * unnoticed. As far as the VM consistency is concerned, it
> +	 * would be perfectly safe to remove this check, but there's
> +	 * no useful usage for __mcopy_atomic ouside of userfaultfd
> +	 * registered ranges. This is after all why these are ioctls
> +	 * belonging to the userfaultfd and not syscalls.
> +	 */
> +	if (!dst_vma->vm_userfaultfd_ctx.ctx)
> +		goto out_unlock;
> +
> +	err = -EINVAL;
>  	if (!vma_is_shmem(dst_vma) && dst_vma->vm_flags & VM_SHARED)
>  		goto out_unlock;
>  	if (dst_start < dst_vma->vm_start ||

This isn't enough, the -ENOENT should be returned also if the address
doesn't isn't in the range of the found vma, instead of -EINVAL. "vma"
may be a completely different vma just it happen to be way above the
fault address, and the vma previously covering the "addr" (which was
below the found "vma") was already munmapped, so you'd be returning
-EINVAL after munmap still unless the -EINVAL is moved down below.

The check on !vma_is_shmem(dst_vma) && dst_vma->vm_flags & VM_SHARED
instead can be shifted down below after setting err to -EINVAL as then
we know the vma is really the one we were looking for but it's of a
type we can't handle.


Now changing topic (aside from implementation comments above) I need
to raise a big fat warning that this change breaks the userland ABI.
There's definitely a 100% backwards compatible way to do this and it
would be to introduce a new UFFDIO_COPY2 ioctl.

However I think the above approach is ok because, userland should
always crash dump if UFFDIO_COPY returns -EINVAL to be strict, so it
means any app that is running into this corner case would currently be
getting -EINVAL and it'd crash in the first place.

For non cooperative usage, is very good that it will be allowed not to
ignore -EINVAL errors, and it will only ignore -ENOENT retvals so it
can be strict too (qemu will instead crash no matter if it gets
-ENOENT or -ENOSPC or -EINVAL).

I believe it's an acceptable ABI change with no risk to existing apps,
the main current user of userfaultfd being the cooperative usage (non
cooperative is still in -mm after all) and with David we reviewed qemu
to never run into the -ENOENT/-ENOSPC case. It'd crash already if it
would get -EINVAL (and it has to still crash with the more finegrined
-ENOENT/-ENOSPC).

So I'm positive about this ABI change as it can't break any existing
app we know of, and I'm also positive the other one in 5/5 for the
same reason (-ENOSPC) where I don't see implementation issues either.

There have been major complains about breaking userland ABI retvals in
the past, so I just want to give a big fat warning about these two
patches 4/5 and 5/5, but I personally think it's an acceptable change
as I don't see any risk of cooperative userland breaking because of it.

Thanks,
Andrea

  reply	other threads:[~2017-02-02 18:02 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-27 18:44 [PATCH v2 0/5] userfaultfd: non-cooperative: better tracking for mapping changes Mike Rapoport
2017-01-27 18:44 ` [PATCH v2 1/5] mm: call vm_munmap in munmap syscall instead of using open coded version Mike Rapoport
2017-01-27 18:44 ` [PATCH v2 2/5] userfaultfd: non-cooperative: add event for memory unmaps Mike Rapoport
2017-02-01  0:39   ` Andrew Morton
2017-02-01  6:37     ` Mike Rapoport
2017-02-02  9:15   ` Michal Hocko
2017-02-05 18:46   ` [v2,2/5] " Guenter Roeck
2017-02-06 23:52     ` Andrew Morton
2017-01-27 18:44 ` [PATCH v2 3/5] userfaultfd: non-cooperative: add event for exit() notification Mike Rapoport
2017-02-01  0:41   ` Andrew Morton
2017-02-01  6:35     ` Mike Rapoport
2017-02-01 22:27       ` Andrew Morton
2017-02-02 13:54     ` Mike Rapoport
2017-01-27 18:44 ` [PATCH v2 4/5] userfaultfd: mcopy_atomic: return -ENOENT when no compatible VMA found Mike Rapoport
2017-02-02 18:02   ` Andrea Arcangeli [this message]
2017-02-03 16:52     ` Mike Rapoport
2017-01-27 18:44 ` [PATCH v2 5/5] userfaultfd_copy: return -ENOSPC in case mm has gone Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170202180247.GA32446@redhat.com \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dgilbert@redhat.com \
    --cc=hillf.zj@alibaba-inc.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=xemul@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).