linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] gup: return -EFAULT on access_ok failure
       [not found] <1522431382-4232-1-git-send-email-mst@redhat.com>
@ 2018-04-05  1:53 ` Michael S. Tsirkin
  2018-04-05  2:40   ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2018-04-05  1:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: stable, syzbot+6304bf97ef436580fede, linux-mm,
	Kirill A. Shutemov, Andrew Morton, Huang Ying, Jonathan Corbet,
	Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Thorsten Leemhuis

On Fri, Mar 30, 2018 at 08:37:45PM +0300, Michael S. Tsirkin wrote:
> get_user_pages_fast is supposed to be a faster drop-in equivalent of
> get_user_pages. As such, callers expect it to return a negative return
> code when passed an invalid address, and never expect it to
> return 0 when passed a positive number of pages, since
> its documentation says:
> 
>  * Returns number of pages pinned. This may be fewer than the number
>  * requested. If nr_pages is 0 or negative, returns 0. If no pages
>  * were pinned, returns -errno.
> 
> Unfortunately this is not what the implementation does: it returns 0 if
> passed a kernel address, confusing callers: for example, the following
> is pretty common but does not appear to do the right thing with a kernel
> address:
> 
>         ret = get_user_pages_fast(addr, 1, writeable, &page);
>         if (ret < 0)
>                 return ret;
> 
> Change get_user_pages_fast to return -EFAULT when supplied a
> kernel address to make it match expectations.
> 
> __get_user_pages_fast does not seem to be used like this, but let's
> change __get_user_pages_fast as well for consistency and to match
> documentation.
> 
> Lightly tested.
> 
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Thorsten Leemhuis <regressions@leemhuis.info>
> Cc: stable@vger.kernel.org
> Fixes: 5b65c4677a57 ("mm, x86/mm: Fix performance regression in get_user_pages_fast()")
> Reported-by: syzbot+6304bf97ef436580fede@syzkaller.appspotmail.com
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Any feedback on this? As this fixes a bug in vhost, I'll merge
through the vhost tree unless someone objects.

> ---
>  mm/gup.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 6afae32..5642521 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1749,6 +1749,9 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
>  	unsigned long flags;
>  	int nr = 0;
>  
> +	if (nr_pages <= 0)
> +		return 0;
> +
>  	start &= PAGE_MASK;
>  	addr = start;
>  	len = (unsigned long) nr_pages << PAGE_SHIFT;
> @@ -1756,7 +1759,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
>  
>  	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
>  					(void __user *)start, len)))
> -		return 0;
> +		return -EFAULT;
>  
>  	/*
>  	 * Disable interrupts.  We use the nested form as we can already have
> @@ -1806,9 +1809,12 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>  	len = (unsigned long) nr_pages << PAGE_SHIFT;
>  	end = start + len;
>  
> +	if (nr_pages <= 0)
> +		return 0;
> +
>  	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
>  					(void __user *)start, len)))
> -		return 0;
> +		return -EFAULT;
>  
>  	if (gup_fast_permitted(start, nr_pages, write)) {
>  		local_irq_disable();
> -- 
> MST

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gup: return -EFAULT on access_ok failure
  2018-04-05  1:53 ` [PATCH] gup: return -EFAULT on access_ok failure Michael S. Tsirkin
@ 2018-04-05  2:40   ` Linus Torvalds
  2018-04-05 14:17     ` Michael S. Tsirkin
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2018-04-05  2:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Linux Kernel Mailing List, stable, syzbot+6304bf97ef436580fede,
	linux-mm, Kirill A. Shutemov, Andrew Morton, Huang Ying,
	Jonathan Corbet, Peter Zijlstra, Thomas Gleixner,
	Thorsten Leemhuis

On Wed, Apr 4, 2018 at 6:53 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> Any feedback on this? As this fixes a bug in vhost, I'll merge
> through the vhost tree unless someone objects.

NAK.

__get_user_pages_fast() returns the number of pages it gets.

It has never returned an error code, and all the other versions of it
(architecture-specific) don't either.

If you ask for one page, and get zero pages, then that's an -EFAULT.
Note that that's an EFAULT regardless of whether that zero page
happened due to kernel addresses or just lack of mapping in user
space.

The documentation is simply wrong if it says anything else. Fix the
docs, and fix the users.

The correct use has always been to check the number of pages returned.

Just looking around, returning an error number looks like it could
seriously confuse some things. You have things like the kvm code that
does the *right* thing:

        unsigned long ... npinned ...

        npinned = get_user_pages_fast(uaddr, npages, write ?
FOLL_WRITE : 0, pages);
        if (npinned != npages) {
     ...

err:
        if (npinned > 0)
                release_pages(pages, npinned);

and the above code clearly depends on the actual behavior, not on the
documentation.

Any changes in this area would need some *extreme* care, exactly
because of code like the above that clearly depends on the existing
semantics.

In fact, the documentation really seems to be just buggy. The actual
get_user_pages() function itself is expressly being careful *not* to
return an error code, it even has a comment to the effect ("Have to be
a bit careful with return values").

So the "If no pages were pinned, returns -errno" comment is just bogus.

                  Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gup: return -EFAULT on access_ok failure
  2018-04-05  2:40   ` Linus Torvalds
@ 2018-04-05 14:17     ` Michael S. Tsirkin
  2018-04-05 15:40       ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2018-04-05 14:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, stable, syzbot+6304bf97ef436580fede,
	linux-mm, Kirill A. Shutemov, Andrew Morton, Huang Ying,
	Jonathan Corbet, Peter Zijlstra, Thomas Gleixner,
	Thorsten Leemhuis

On Wed, Apr 04, 2018 at 07:40:36PM -0700, Linus Torvalds wrote:
> On Wed, Apr 4, 2018 at 6:53 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > Any feedback on this? As this fixes a bug in vhost, I'll merge
> > through the vhost tree unless someone objects.
> 
> NAK.
> 
> __get_user_pages_fast() returns the number of pages it gets.
> 
> It has never returned an error code, and all the other versions of it
> (architecture-specific) don't either.

Thanks Linus. I can change the docs and all the callers.


I wonder however whether all the following should be changed then:

static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,

...

                        if (!vma || check_vma_flags(vma, gup_flags))
                                return i ? : -EFAULT;

is this a bug in __get_user_pages?


Another example:

                                ret = get_gate_page(mm, start & PAGE_MASK,
                                                gup_flags, &vma,
                                                pages ? &pages[i] : NULL);
                                if (ret)
                                        return i ? : ret;

and ret is -EFAULT on error.


Another example:
                        switch (ret) {
                        case 0:
                                goto retry;
                        case -EFAULT:
                        case -ENOMEM:
                        case -EHWPOISON:
                                return i ? i : ret;
                        case -EBUSY:
                                return i;
                        case -ENOENT:
                                goto next_page;
                        }

it looks like this will return -EFAULT/-ENOMEM/-EHWPOISON
if i is 0.


> If you ask for one page, and get zero pages, then that's an -EFAULT.
> Note that that's an EFAULT regardless of whether that zero page
> happened due to kernel addresses or just lack of mapping in user
> space.
> 
> The documentation is simply wrong if it says anything else. Fix the
> docs, and fix the users.
> 
> The correct use has always been to check the number of pages returned.
> 
> Just looking around, returning an error number looks like it could
> seriously confuse some things.
>
> You have things like the kvm code that
> does the *right* thing:
> 
>         unsigned long ... npinned ...
> 
>         npinned = get_user_pages_fast(uaddr, npages, write ?
> FOLL_WRITE : 0, pages);
>         if (npinned != npages) {
>      ...
> 
> err:
>         if (npinned > 0)
>                 release_pages(pages, npinned);
> 
> and the above code clearly depends on the actual behavior, not on the
> documentation.

This seems to work fine with my patch since it only changes the
case where npinned == 0.

> Any changes in this area would need some *extreme* care, exactly
> because of code like the above that clearly depends on the existing
> semantics.
> 
> In fact, the documentation really seems to be just buggy. The actual
> get_user_pages() function itself is expressly being careful *not* to
> return an error code, it even has a comment to the effect ("Have to be
> a bit careful with return values").
> 
> So the "If no pages were pinned, returns -errno" comment is just bogus.
> 
>                   Linus

I'd like to change the doc then, but it seems that I'll have to change
the implementation in that case too.

-- 
MST

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gup: return -EFAULT on access_ok failure
  2018-04-05 14:17     ` Michael S. Tsirkin
@ 2018-04-05 15:40       ` Linus Torvalds
  2018-04-05 18:28         ` Michael S. Tsirkin
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2018-04-05 15:40 UTC (permalink / raw)
  To: Michael S. Tsirkin, Al Viro
  Cc: Linux Kernel Mailing List, stable, syzbot+6304bf97ef436580fede,
	linux-mm, Kirill A. Shutemov, Andrew Morton, Huang Ying,
	Jonathan Corbet, Peter Zijlstra, Thomas Gleixner,
	Thorsten Leemhuis

On Thu, Apr 5, 2018 at 7:17 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> I wonder however whether all the following should be changed then:
>
> static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>
> ...
>
>                         if (!vma || check_vma_flags(vma, gup_flags))
>                                 return i ? : -EFAULT;
>
> is this a bug in __get_user_pages?

Note the difference between "get_user_pages()", and "get_user_pages_fast()".

It's the *fast* versions that just return the number of pages pinned.

The non-fast ones will return an error code for various cases.

Why?

The non-fast cases actually *have* various error cases. They can block
and get interrupted etc.

The fast cases are basically "just get me the pages, dammit, and if
you can't get some page, stop".

At least that's one excuse for the difference in behavior.

The real excuse is probably just "that's how it worked" - the fast
case just walked the page tables and that was it.

                 Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gup: return -EFAULT on access_ok failure
  2018-04-05 15:40       ` Linus Torvalds
@ 2018-04-05 18:28         ` Michael S. Tsirkin
  2018-04-05 18:43           ` Linus Torvalds
  2018-04-06 11:35           ` Alan Cox
  0 siblings, 2 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2018-04-05 18:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Linux Kernel Mailing List, stable,
	syzbot+6304bf97ef436580fede, linux-mm, Kirill A. Shutemov,
	Andrew Morton, Huang Ying, Jonathan Corbet, Peter Zijlstra,
	Thomas Gleixner, Thorsten Leemhuis

On Thu, Apr 05, 2018 at 08:40:05AM -0700, Linus Torvalds wrote:
> On Thu, Apr 5, 2018 at 7:17 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > I wonder however whether all the following should be changed then:
> >
> > static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
> >
> > ...
> >
> >                         if (!vma || check_vma_flags(vma, gup_flags))
> >                                 return i ? : -EFAULT;
> >
> > is this a bug in __get_user_pages?
> 
> Note the difference between "get_user_pages()", and "get_user_pages_fast()".
> 
> It's the *fast* versions that just return the number of pages pinned.
> 
> The non-fast ones will return an error code for various cases.
> 
> Why?
> 
> The non-fast cases actually *have* various error cases. They can block
> and get interrupted etc.
> 
> The fast cases are basically "just get me the pages, dammit, and if
> you can't get some page, stop".
> 
> At least that's one excuse for the difference in behavior.
> 
> The real excuse is probably just "that's how it worked" - the fast
> case just walked the page tables and that was it.
> 
>                  Linus

I see, thanks for the clarification Linus.

to repeat what you are saying IIUC __get_user_pages_fast returns 0 if it can't
pin any pages and that is by design.  Returning 0 on error isn't usual I think
so I guess this behaviour should we well documented.

That part of my patch was wrong and should be replaced with a doc
update.

What about get_user_pages_fast though? That's the other part of the
patch. Right now get_user_pages_fast does:

                ret = get_user_pages_unlocked(start, nr_pages - nr, pages,
                                write ? FOLL_WRITE : 0);

                /* Have to be a bit careful with return values */
                if (nr > 0) {
                        if (ret < 0)
                                ret = nr;
                        else
                                ret += nr;
                }

so an error on the 1st page gets propagated to the caller,
and that get_user_pages_unlocked eventually calls __get_user_pages
so it does return an error sometimes.

Would it be correct to apply the second part of the patch then
(pasted below for reference) or should get_user_pages_fast
and all its callers be changed to return 0 on error instead?

@@ -1806,9 +1809,12 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 	len = (unsigned long) nr_pages << PAGE_SHIFT;
 	end = start + len;
 
+	if (nr_pages <= 0)
+		return 0;
+
 	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
 					(void __user *)start, len)))
-		return 0;
+		return -EFAULT;
 
 	if (gup_fast_permitted(start, nr_pages, write)) {
 		local_irq_disable();

-- 
MST

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gup: return -EFAULT on access_ok failure
  2018-04-05 18:28         ` Michael S. Tsirkin
@ 2018-04-05 18:43           ` Linus Torvalds
  2018-04-05 19:34             ` Michael S. Tsirkin
  2018-04-05 21:08             ` Michael S. Tsirkin
  2018-04-06 11:35           ` Alan Cox
  1 sibling, 2 replies; 10+ messages in thread
From: Linus Torvalds @ 2018-04-05 18:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Al Viro, Linux Kernel Mailing List, stable,
	syzbot+6304bf97ef436580fede, linux-mm, Kirill A. Shutemov,
	Andrew Morton, Huang Ying, Jonathan Corbet, Peter Zijlstra,
	Thomas Gleixner, Thorsten Leemhuis

On Thu, Apr 5, 2018 at 11:28 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> to repeat what you are saying IIUC __get_user_pages_fast returns 0 if it can't
> pin any pages and that is by design.  Returning 0 on error isn't usual I think
> so I guess this behaviour should we well documented.

Arguably it happens elsewhere too, and not just in the kernel.
"read()" at past the end of a file is not an error, you'll just get 0
for EOF.

So it's not really "returning 0 on error".

It really is simply returning the number of pages it got. End of
story. That number of pages can be smaller than the requested number
of pages, and _that_ is due to some error, but note how it can return
"5" on error too - you asked for 10 pages, but the error happened in
the middle!

So the right way to check for error is to bverify that you get the
number of pages that you asked for. If you don't, something bad
happened.

Of course, many users don't actually care about "I didn't get
everything". They only care about "did I get _something_". Then that 0
ends up being the error case, but note how it depends on the caller.

> What about get_user_pages_fast though?

We do seem to special-case the first page there. I'm not sure it's a
good idea. But like the __get_user_pages_fast(), we seem to have users
that know about the particular semantics and depend on it.

It's all ugly, I agree.

End result: we can't just change semantics of either of them.

At least not without going through every single user and checking that
they are ok with it.

Which I guess I could be ok with. Maybe changing the semantics of
__get_user_pages_fast() is acceptable, if you just change it
*everywhere* (which includes not just he users, but also the couple of
architecture-specific versions of that same function that we have.

                    Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gup: return -EFAULT on access_ok failure
  2018-04-05 18:43           ` Linus Torvalds
@ 2018-04-05 19:34             ` Michael S. Tsirkin
  2018-04-05 19:39               ` Chris Wilson
  2018-04-05 21:08             ` Michael S. Tsirkin
  1 sibling, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2018-04-05 19:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Linux Kernel Mailing List, stable,
	syzbot+6304bf97ef436580fede, linux-mm, Kirill A. Shutemov,
	Andrew Morton, Huang Ying, Jonathan Corbet, Peter Zijlstra,
	Thomas Gleixner, Thorsten Leemhuis, Chris Wilson, Tvrtko Ursulin,
	Gong, Zhipeng, Akash Goel, Volkin, Bradley D, Daniel Vetter

On Thu, Apr 05, 2018 at 11:43:27AM -0700, Linus Torvalds wrote:
> On Thu, Apr 5, 2018 at 11:28 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > to repeat what you are saying IIUC __get_user_pages_fast returns 0 if it can't
> > pin any pages and that is by design.  Returning 0 on error isn't usual I think
> > so I guess this behaviour should we well documented.
> 
> Arguably it happens elsewhere too, and not just in the kernel.
> "read()" at past the end of a file is not an error, you'll just get 0
> for EOF.
> 
> So it's not really "returning 0 on error".
> 
> It really is simply returning the number of pages it got. End of
> story. That number of pages can be smaller than the requested number
> of pages, and _that_ is due to some error, but note how it can return
> "5" on error too - you asked for 10 pages, but the error happened in
> the middle!
> 
> So the right way to check for error is to bverify that you get the
> number of pages that you asked for. If you don't, something bad
> happened.
> 
> Of course, many users don't actually care about "I didn't get
> everything". They only care about "did I get _something_". Then that 0
> ends up being the error case, but note how it depends on the caller.
> 
> > What about get_user_pages_fast though?
> 
> We do seem to special-case the first page there. I'm not sure it's a
> good idea. But like the __get_user_pages_fast(), we seem to have users
> that know about the particular semantics and depend on it.
> 
> It's all ugly, I agree.
> 
> End result: we can't just change semantics of either of them.
> 
> At least not without going through every single user and checking that
> they are ok with it.
> 
> Which I guess I could be ok with. Maybe changing the semantics of
> __get_user_pages_fast() is acceptable, if you just change it
> *everywhere* (which includes not just he users, but also the couple of
> architecture-specific versions of that same function that we have.
> 
>                     Linus

OK I hope I understood what you are saying here.

At least drivers/gpu/drm/i915/i915_gem_userptr.c seems to
get it wrong:

        pinned = __get_user_pages_fast(obj->userptr.ptr,

        if (pinned < 0) {
                pages = ERR_PTR(pinned);
                pinned = 0;
        } else if (pinned < num_pages) {
                pages = __i915_gem_userptr_get_pages_schedule(obj);
                active = pages == ERR_PTR(-EAGAIN);
        } else {
                pages = __i915_gem_userptr_alloc_pages(obj, pvec, num_pages);
                active = !IS_ERR(pages);
        }

The <0 path is never taken.

Cc maintainers - should that driver be changed to use
get_user_pages_fast?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gup: return -EFAULT on access_ok failure
  2018-04-05 19:34             ` Michael S. Tsirkin
@ 2018-04-05 19:39               ` Chris Wilson
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Wilson @ 2018-04-05 19:39 UTC (permalink / raw)
  To: Michael S. Tsirkin, Linus Torvalds
  Cc: Al Viro, Linux Kernel Mailing List, stable,
	syzbot+6304bf97ef436580fede, linux-mm, Kirill A. Shutemov,
	Andrew Morton, Huang Ying, Jonathan Corbet, Peter Zijlstra,
	Thomas Gleixner, Thorsten Leemhuis, Tvrtko Ursulin, Gong,
	Zhipeng, Akash Goel, Volkin, Bradley D, Daniel Vetter

Quoting Michael S. Tsirkin (2018-04-05 20:34:08)
> On Thu, Apr 05, 2018 at 11:43:27AM -0700, Linus Torvalds wrote:
> > On Thu, Apr 5, 2018 at 11:28 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > to repeat what you are saying IIUC __get_user_pages_fast returns 0 if it can't
> > > pin any pages and that is by design.  Returning 0 on error isn't usual I think
> > > so I guess this behaviour should we well documented.
> > 
> > Arguably it happens elsewhere too, and not just in the kernel.
> > "read()" at past the end of a file is not an error, you'll just get 0
> > for EOF.
> > 
> > So it's not really "returning 0 on error".
> > 
> > It really is simply returning the number of pages it got. End of
> > story. That number of pages can be smaller than the requested number
> > of pages, and _that_ is due to some error, but note how it can return
> > "5" on error too - you asked for 10 pages, but the error happened in
> > the middle!
> > 
> > So the right way to check for error is to bverify that you get the
> > number of pages that you asked for. If you don't, something bad
> > happened.
> > 
> > Of course, many users don't actually care about "I didn't get
> > everything". They only care about "did I get _something_". Then that 0
> > ends up being the error case, but note how it depends on the caller.
> > 
> > > What about get_user_pages_fast though?
> > 
> > We do seem to special-case the first page there. I'm not sure it's a
> > good idea. But like the __get_user_pages_fast(), we seem to have users
> > that know about the particular semantics and depend on it.
> > 
> > It's all ugly, I agree.
> > 
> > End result: we can't just change semantics of either of them.
> > 
> > At least not without going through every single user and checking that
> > they are ok with it.
> > 
> > Which I guess I could be ok with. Maybe changing the semantics of
> > __get_user_pages_fast() is acceptable, if you just change it
> > *everywhere* (which includes not just he users, but also the couple of
> > architecture-specific versions of that same function that we have.
> > 
> >                     Linus
> 
> OK I hope I understood what you are saying here.
> 
> At least drivers/gpu/drm/i915/i915_gem_userptr.c seems to
> get it wrong:
> 
>         pinned = __get_user_pages_fast(obj->userptr.ptr,
> 
>         if (pinned < 0) {
>                 pages = ERR_PTR(pinned);
>                 pinned = 0;
>         } else if (pinned < num_pages) {
>                 pages = __i915_gem_userptr_get_pages_schedule(obj);
>                 active = pages == ERR_PTR(-EAGAIN);
>         } else {
>                 pages = __i915_gem_userptr_alloc_pages(obj, pvec, num_pages);
>                 active = !IS_ERR(pages);
>         }
> 
> The <0 path is never taken.

Please note that it only recently lost other paths that set an error
beforehand. Not exactly wrong since the short return is expected and
handled.

> Cc maintainers - should that driver be changed to use
> get_user_pages_fast?

It's not allowed to fault. __gup_fast has that comment, gup_fast does
not.
-Chris

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gup: return -EFAULT on access_ok failure
  2018-04-05 18:43           ` Linus Torvalds
  2018-04-05 19:34             ` Michael S. Tsirkin
@ 2018-04-05 21:08             ` Michael S. Tsirkin
  1 sibling, 0 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2018-04-05 21:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Linux Kernel Mailing List, stable,
	syzbot+6304bf97ef436580fede, linux-mm, Kirill A. Shutemov,
	Andrew Morton, Huang Ying, Jonathan Corbet, Peter Zijlstra,
	Thomas Gleixner, Thorsten Leemhuis

On Thu, Apr 05, 2018 at 11:43:27AM -0700, Linus Torvalds wrote:
> On Thu, Apr 5, 2018 at 11:28 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > to repeat what you are saying IIUC __get_user_pages_fast returns 0 if it can't
> > pin any pages and that is by design.  Returning 0 on error isn't usual I think
> > so I guess this behaviour should we well documented.
> 
> Arguably it happens elsewhere too, and not just in the kernel.
> "read()" at past the end of a file is not an error, you'll just get 0
> for EOF.
> 
> So it's not really "returning 0 on error".
> 
> It really is simply returning the number of pages it got. End of
> story. That number of pages can be smaller than the requested number
> of pages, and _that_ is due to some error, but note how it can return
> "5" on error too - you asked for 10 pages, but the error happened in
> the middle!
> 
> So the right way to check for error is to bverify that you get the
> number of pages that you asked for. If you don't, something bad
> happened.
> 
> Of course, many users don't actually care about "I didn't get
> everything". They only care about "did I get _something_". Then that 0
> ends up being the error case, but note how it depends on the caller.
> 
> > What about get_user_pages_fast though?
> 
> We do seem to special-case the first page there. I'm not sure it's a
> good idea. But like the __get_user_pages_fast(), we seem to have users
> that know about the particular semantics and depend on it.
> 
> It's all ugly, I agree.
> 
> End result: we can't just change semantics of either of them.
> 
> At least not without going through every single user and checking that
> they are ok with it.
> 
> Which I guess I could be ok with. Maybe changing the semantics of
> __get_user_pages_fast() is acceptable, if you just change it
> *everywhere* (which includes not just he users, but also the couple of
> architecture-specific versions of that same function that we have.
> 
>                     Linus

For now I sent a patchset
1. documenting current behaviour for __get_user_pages_fast.
2. fixing get_user_pages_fast for consistency.

-- 
MST

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] gup: return -EFAULT on access_ok failure
  2018-04-05 18:28         ` Michael S. Tsirkin
  2018-04-05 18:43           ` Linus Torvalds
@ 2018-04-06 11:35           ` Alan Cox
  1 sibling, 0 replies; 10+ messages in thread
From: Alan Cox @ 2018-04-06 11:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Linus Torvalds, Al Viro, Linux Kernel Mailing List, stable,
	syzbot+6304bf97ef436580fede, linux-mm, Kirill A. Shutemov,
	Andrew Morton, Huang Ying, Jonathan Corbet, Peter Zijlstra,
	Thomas Gleixner, Thorsten Leemhuis

> so an error on the 1st page gets propagated to the caller,
> and that get_user_pages_unlocked eventually calls __get_user_pages
> so it does return an error sometimes.
> 
> Would it be correct to apply the second part of the patch then
> (pasted below for reference) or should get_user_pages_fast
> and all its callers be changed to return 0 on error instead?

0 isn't an error. As SuS sees it (ie from the userspace end of the pile)

returning the number you asked for means it worked

returning a smaller number means it worked partially and that much was
consumed (or in some cases more and the rest if so was lost - depends
what you are reading/writing)

returning 0 means you read nothing as you were at the end of file

returning an error code means it broke, or you should try again
(EAGAIN/EWOULDBLOCK)

The ugly bit there is the try-again semantics needs to exactly match the
attached poll() behaviour or you get busy loops.

Alan

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-04-06 11:36 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1522431382-4232-1-git-send-email-mst@redhat.com>
2018-04-05  1:53 ` [PATCH] gup: return -EFAULT on access_ok failure Michael S. Tsirkin
2018-04-05  2:40   ` Linus Torvalds
2018-04-05 14:17     ` Michael S. Tsirkin
2018-04-05 15:40       ` Linus Torvalds
2018-04-05 18:28         ` Michael S. Tsirkin
2018-04-05 18:43           ` Linus Torvalds
2018-04-05 19:34             ` Michael S. Tsirkin
2018-04-05 19:39               ` Chris Wilson
2018-04-05 21:08             ` Michael S. Tsirkin
2018-04-06 11:35           ` Alan Cox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).