linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] mm/get_user_pages_fast fixes, cleanups
@ 2018-04-05 21:03 Michael S. Tsirkin
  2018-04-05 21:03 ` [PATCH v2 1/3] mm/gup_benchmark: handle gup failures Michael S. Tsirkin
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2018-04-05 21:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kirill A . Shutemov, Andrew Morton, Huang Ying, Jonathan Corbet,
	Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Thorsten Leemhuis

Turns out get_user_pages_fast and __get_user_pages_fast return different
values on error when given a single page: __get_user_pages_fast returns
0.  get_user_pages_fast returns either 0 or an error.

Callers of get_user_pages_fast expect an error so fix it up to return an
error consistently.

Stress the difference between get_user_pages_fast and __get_user_pages_fast
to make sure callers aren't confused.

Changes from v1:
	limit code changes to get_user_pages_fast.
	document __get_user_pages_fast
	fix a bug in caller

Michael S. Tsirkin (3):
  mm/gup_benchmark: handle gup failures
  gup: return -EFAULT on access_ok failure
  mm/gup: document return value

 arch/mips/mm/gup.c  | 2 ++
 arch/s390/mm/gup.c  | 2 ++
 arch/sh/mm/gup.c    | 2 ++
 arch/sparc/mm/gup.c | 4 ++++
 mm/gup.c            | 9 +++++++--
 mm/gup_benchmark.c  | 5 +++--
 mm/util.c           | 6 ++++--
 7 files changed, 24 insertions(+), 6 deletions(-)

-- 
MST

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/3] mm/gup_benchmark: handle gup failures
  2018-04-05 21:03 [PATCH v2 0/3] mm/get_user_pages_fast fixes, cleanups Michael S. Tsirkin
@ 2018-04-05 21:03 ` Michael S. Tsirkin
  2018-04-07 20:08   ` Linus Torvalds
  2018-04-05 21:03 ` [PATCH v2 2/3] gup: return -EFAULT on access_ok failure Michael S. Tsirkin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2018-04-05 21:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kirill A . Shutemov, Andrew Morton, Huang Ying, Jonathan Corbet,
	Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Thorsten Leemhuis, stable, linux-mm

__gup_benchmark_ioctl does not handle the case where
get_user_pages_fast fails:

- a negative return code will cause a buffer overrun
- returning with partial success will cause use of
  uninitialized memory.

Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 mm/gup_benchmark.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
index 5c8e2ab..d743035 100644
--- a/mm/gup_benchmark.c
+++ b/mm/gup_benchmark.c
@@ -23,7 +23,7 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
 	struct page **pages;
 
 	nr_pages = gup->size / PAGE_SIZE;
-	pages = kvmalloc(sizeof(void *) * nr_pages, GFP_KERNEL);
+	pages = kvzalloc(sizeof(void *) * nr_pages, GFP_KERNEL);
 	if (!pages)
 		return -ENOMEM;
 
@@ -41,7 +41,8 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
 		}
 
 		nr = get_user_pages_fast(addr, nr, gup->flags & 1, pages + i);
-		i += nr;
+		if (nr > 0)
+			i += nr;
 	}
 	end_time = ktime_get();
 
-- 
MST

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/3] gup: return -EFAULT on access_ok failure
  2018-04-05 21:03 [PATCH v2 0/3] mm/get_user_pages_fast fixes, cleanups Michael S. Tsirkin
  2018-04-05 21:03 ` [PATCH v2 1/3] mm/gup_benchmark: handle gup failures Michael S. Tsirkin
@ 2018-04-05 21:03 ` Michael S. Tsirkin
  2018-04-05 21:03 ` [PATCH v2 3/3] mm/gup: document return value Michael S. Tsirkin
  2018-04-06 20:08 ` [PATCH v2 0/3] mm/get_user_pages_fast fixes, cleanups Andrew Morton
  3 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2018-04-05 21:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kirill A . Shutemov, Andrew Morton, Huang Ying, Jonathan Corbet,
	Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Thorsten Leemhuis, stable, syzbot+6304bf97ef436580fede, linux-mm

get_user_pages_fast is supposed to be a faster drop-in equivalent of
get_user_pages. As such, callers expect it to return a negative return
code when passed an invalid address, and never expect it to
return 0 when passed a positive number of pages, since
its documentation says:

 * Returns number of pages pinned. This may be fewer than the number
 * requested. If nr_pages is 0 or negative, returns 0. If no pages
 * were pinned, returns -errno.

When get_user_pages_fast fall back on get_user_pages this is exactly
what happens.  Unfortunately the implementation is inconsistent: it returns 0 if
passed a kernel address, confusing callers: for example, the following
is pretty common but does not appear to do the right thing with a kernel
address:

        ret = get_user_pages_fast(addr, 1, writeable, &page);
        if (ret < 0)
                return ret;

Change get_user_pages_fast to return -EFAULT when supplied a
kernel address to make it match expectations.

All caller have been audited for consistency with the documented
semantics.

Lightly tested.

Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: stable@vger.kernel.org
Fixes: 5b65c4677a57 ("mm, x86/mm: Fix performance regression in get_user_pages_fast()")
Reported-by: syzbot+6304bf97ef436580fede@syzkaller.appspotmail.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 mm/gup.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index 6afae32..8f3a064 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1806,9 +1806,12 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 	len = (unsigned long) nr_pages << PAGE_SHIFT;
 	end = start + len;
 
+	if (nr_pages <= 0)
+		return 0;
+
 	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
 					(void __user *)start, len)))
-		return 0;
+		return -EFAULT;
 
 	if (gup_fast_permitted(start, nr_pages, write)) {
 		local_irq_disable();
-- 
MST

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 3/3] mm/gup: document return value
  2018-04-05 21:03 [PATCH v2 0/3] mm/get_user_pages_fast fixes, cleanups Michael S. Tsirkin
  2018-04-05 21:03 ` [PATCH v2 1/3] mm/gup_benchmark: handle gup failures Michael S. Tsirkin
  2018-04-05 21:03 ` [PATCH v2 2/3] gup: return -EFAULT on access_ok failure Michael S. Tsirkin
@ 2018-04-05 21:03 ` Michael S. Tsirkin
  2018-04-06 20:08 ` [PATCH v2 0/3] mm/get_user_pages_fast fixes, cleanups Andrew Morton
  3 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2018-04-05 21:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kirill A . Shutemov, Andrew Morton, Huang Ying, Jonathan Corbet,
	Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Thorsten Leemhuis, Ralf Baechle, James Hogan, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	linux-mips, linux-s390, linux-sh, sparclinux, linux-mm

__get_user_pages_fast handles errors differently from
get_user_pages_fast: the former always returns the number of pages
pinned, the later might return a negative error code.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/mips/mm/gup.c  | 2 ++
 arch/s390/mm/gup.c  | 2 ++
 arch/sh/mm/gup.c    | 2 ++
 arch/sparc/mm/gup.c | 4 ++++
 mm/gup.c            | 4 +++-
 mm/util.c           | 6 ++++--
 6 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c
index 1e4658e..5a4875ca 100644
--- a/arch/mips/mm/gup.c
+++ b/arch/mips/mm/gup.c
@@ -178,6 +178,8 @@ static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
 /*
  * Like get_user_pages_fast() except its IRQ-safe in that it won't fall
  * back to the regular GUP.
+ * Note a difference with get_user_pages_fast: this always returns the
+ * number of pages pinned, 0 if no pages were pinned.
  */
 int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			  struct page **pages)
diff --git a/arch/s390/mm/gup.c b/arch/s390/mm/gup.c
index 05c8abd..2809d11 100644
--- a/arch/s390/mm/gup.c
+++ b/arch/s390/mm/gup.c
@@ -220,6 +220,8 @@ static inline int gup_p4d_range(pgd_t *pgdp, pgd_t pgd, unsigned long addr,
 /*
  * Like get_user_pages_fast() except its IRQ-safe in that it won't fall
  * back to the regular GUP.
+ * Note a difference with get_user_pages_fast: this always returns the
+ * number of pages pinned, 0 if no pages were pinned.
  */
 int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			  struct page **pages)
diff --git a/arch/sh/mm/gup.c b/arch/sh/mm/gup.c
index 8045b5b..56c86ca 100644
--- a/arch/sh/mm/gup.c
+++ b/arch/sh/mm/gup.c
@@ -160,6 +160,8 @@ static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
 /*
  * Like get_user_pages_fast() except its IRQ-safe in that it won't fall
  * back to the regular GUP.
+ * Note a difference with get_user_pages_fast: this always returns the
+ * number of pages pinned, 0 if no pages were pinned.
  */
 int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			  struct page **pages)
diff --git a/arch/sparc/mm/gup.c b/arch/sparc/mm/gup.c
index 5335ba3..ca3eb69 100644
--- a/arch/sparc/mm/gup.c
+++ b/arch/sparc/mm/gup.c
@@ -192,6 +192,10 @@ static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
 	return 1;
 }
 
+/*
+ * Note a difference with get_user_pages_fast: this always returns the
+ * number of pages pinned, 0 if no pages were pinned.
+ */
 int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			  struct page **pages)
 {
diff --git a/mm/gup.c b/mm/gup.c
index 8f3a064..5cb5bb1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1740,7 +1740,9 @@ bool gup_fast_permitted(unsigned long start, int nr_pages, int write)
 
 /*
  * Like get_user_pages_fast() except it's IRQ-safe in that it won't fall back to
- * the regular GUP. It will only return non-negative values.
+ * the regular GUP.
+ * Note a difference with get_user_pages_fast: this always returns the
+ * number of pages pinned, 0 if no pages were pinned.
  */
 int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			  struct page **pages)
diff --git a/mm/util.c b/mm/util.c
index c125050..db2f005 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -297,8 +297,10 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 /*
  * Like get_user_pages_fast() except its IRQ-safe in that it won't fall
  * back to the regular GUP.
- * If the architecture not support this function, simply return with no
- * page pinned
+ * Note a difference with get_user_pages_fast: this always returns the
+ * number of pages pinned, 0 if no pages were pinned.
+ * If the architecture does not support this function, simply return with no
+ * pages pinned.
  */
 int __weak __get_user_pages_fast(unsigned long start,
 				 int nr_pages, int write, struct page **pages)
-- 
MST

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/3] mm/get_user_pages_fast fixes, cleanups
  2018-04-05 21:03 [PATCH v2 0/3] mm/get_user_pages_fast fixes, cleanups Michael S. Tsirkin
                   ` (2 preceding siblings ...)
  2018-04-05 21:03 ` [PATCH v2 3/3] mm/gup: document return value Michael S. Tsirkin
@ 2018-04-06 20:08 ` Andrew Morton
  2018-04-08  3:08   ` Michael S. Tsirkin
  3 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2018-04-06 20:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Kirill A . Shutemov, Huang Ying, Jonathan Corbet,
	Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Thorsten Leemhuis

On Fri, 6 Apr 2018 00:03:48 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote:

> Turns out get_user_pages_fast and __get_user_pages_fast return different
> values on error when given a single page: __get_user_pages_fast returns
> 0.  get_user_pages_fast returns either 0 or an error.
> 
> Callers of get_user_pages_fast expect an error so fix it up to return an
> error consistently.
> 
> Stress the difference between get_user_pages_fast and __get_user_pages_fast
> to make sure callers aren't confused.
> 

A term which is missing from all these changelogs is "vhost" :(

This patchset fixes a user-affecting bug, does it not?  If so, please
fully describe that bug so that we can decide which kernel version(s)
need the patchset.

And yes, this return value asymmetry is sad.  Did you scope out what
would be needed to fix up the callers so we can avoid this?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/3] mm/gup_benchmark: handle gup failures
  2018-04-05 21:03 ` [PATCH v2 1/3] mm/gup_benchmark: handle gup failures Michael S. Tsirkin
@ 2018-04-07 20:08   ` Linus Torvalds
  2018-04-08  3:12     ` Michael S. Tsirkin
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2018-04-07 20:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Linux Kernel Mailing List, Kirill A . Shutemov, Andrew Morton,
	Huang Ying, Jonathan Corbet, Peter Zijlstra, Thomas Gleixner,
	Thorsten Leemhuis, stable, linux-mm

On Thu, Apr 5, 2018 at 2:03 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>
>                 nr = get_user_pages_fast(addr, nr, gup->flags & 1, pages + i);
> -               i += nr;
> +               if (nr > 0)
> +                       i += nr;

Can we just make this robust while at it, and just make it

        if (nr <= 0)
                break;

instead? Then it doesn't care about zero vs negative error, and
wouldn't get stuck in an endless loop if it got zero.

             Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/3] mm/get_user_pages_fast fixes, cleanups
  2018-04-06 20:08 ` [PATCH v2 0/3] mm/get_user_pages_fast fixes, cleanups Andrew Morton
@ 2018-04-08  3:08   ` Michael S. Tsirkin
  0 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2018-04-08  3:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Kirill A . Shutemov, Huang Ying, Jonathan Corbet,
	Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Thorsten Leemhuis

On Fri, Apr 06, 2018 at 01:08:02PM -0700, Andrew Morton wrote:
> On Fri, 6 Apr 2018 00:03:48 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > Turns out get_user_pages_fast and __get_user_pages_fast return different
> > values on error when given a single page: __get_user_pages_fast returns
> > 0.  get_user_pages_fast returns either 0 or an error.
> > 
> > Callers of get_user_pages_fast expect an error so fix it up to return an
> > error consistently.
> > 
> > Stress the difference between get_user_pages_fast and __get_user_pages_fast
> > to make sure callers aren't confused.
> > 
> 
> A term which is missing from all these changelogs is "vhost" :(

vhost has a BUG_ON for unexpected handling so it catches the bug, but
it's not the only site affected.

> This patchset fixes a user-affecting bug, does it not?  If so, please
> fully describe that bug so that we can decide which kernel version(s)
> need the patchset.

OK, I'll try to write up something.

> And yes, this return value asymmetry is sad.  Did you scope out what
> would be needed to fix up the callers so we can avoid this?

Yes - there is a very small number of callers for __get_user_pages_fast
so it's easy to teach them all to treat any value <=0 as 0.
There seems to be some opposition to changing the API,
I'd like to look into this after the bugfix patchset is merged.

-- 
MST

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/3] mm/gup_benchmark: handle gup failures
  2018-04-07 20:08   ` Linus Torvalds
@ 2018-04-08  3:12     ` Michael S. Tsirkin
  2018-04-10  0:38       ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2018-04-08  3:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Kirill A . Shutemov, Andrew Morton,
	Huang Ying, Jonathan Corbet, Peter Zijlstra, Thomas Gleixner,
	Thorsten Leemhuis, stable, linux-mm

On Sat, Apr 07, 2018 at 01:08:43PM -0700, Linus Torvalds wrote:
> On Thu, Apr 5, 2018 at 2:03 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> >                 nr = get_user_pages_fast(addr, nr, gup->flags & 1, pages + i);
> > -               i += nr;
> > +               if (nr > 0)
> > +                       i += nr;
> 
> Can we just make this robust while at it, and just make it
> 
>         if (nr <= 0)
>                 break;
> 
> instead? Then it doesn't care about zero vs negative error, and
> wouldn't get stuck in an endless loop if it got zero.
> 
>              Linus

I don't mind though it alredy breaks out on the next cycle:

                if (nr != gup->nr_pages_per_call)
                        break;

the only issue is i getting corrupted when nr < 0;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/3] mm/gup_benchmark: handle gup failures
  2018-04-08  3:12     ` Michael S. Tsirkin
@ 2018-04-10  0:38       ` Andrew Morton
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2018-04-10  0:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Linus Torvalds, Linux Kernel Mailing List, Kirill A . Shutemov,
	Huang Ying, Jonathan Corbet, Peter Zijlstra, Thomas Gleixner,
	Thorsten Leemhuis, stable, linux-mm

On Sun, 8 Apr 2018 06:12:13 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Sat, Apr 07, 2018 at 01:08:43PM -0700, Linus Torvalds wrote:
> > On Thu, Apr 5, 2018 at 2:03 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > >                 nr = get_user_pages_fast(addr, nr, gup->flags & 1, pages + i);
> > > -               i += nr;
> > > +               if (nr > 0)
> > > +                       i += nr;
> > 
> > Can we just make this robust while at it, and just make it
> > 
> >         if (nr <= 0)
> >                 break;
> > 
> > instead? Then it doesn't care about zero vs negative error, and
> > wouldn't get stuck in an endless loop if it got zero.
> > 
> >              Linus
> 
> I don't mind though it alredy breaks out on the next cycle:
> 
>                 if (nr != gup->nr_pages_per_call)
>                         break;
> 
> the only issue is i getting corrupted when nr < 0;
> 

It does help readability to have the thing bail out as soon as we see
something go bad.  This?

--- a/mm/gup_benchmark.c~mm-gup_benchmark-handle-gup-failures-fix
+++ a/mm/gup_benchmark.c
@@ -41,8 +41,9 @@ static int __gup_benchmark_ioctl(unsigne
 		}
 
 		nr = get_user_pages_fast(addr, nr, gup->flags & 1, pages + i);
-		if (nr > 0)
-			i += nr;
+		if (nr <= 0)
+			break;
+		i += nr;
 	}
 	end_time = ktime_get();
 
_

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-04-10  0:38 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-05 21:03 [PATCH v2 0/3] mm/get_user_pages_fast fixes, cleanups Michael S. Tsirkin
2018-04-05 21:03 ` [PATCH v2 1/3] mm/gup_benchmark: handle gup failures Michael S. Tsirkin
2018-04-07 20:08   ` Linus Torvalds
2018-04-08  3:12     ` Michael S. Tsirkin
2018-04-10  0:38       ` Andrew Morton
2018-04-05 21:03 ` [PATCH v2 2/3] gup: return -EFAULT on access_ok failure Michael S. Tsirkin
2018-04-05 21:03 ` [PATCH v2 3/3] mm/gup: document return value Michael S. Tsirkin
2018-04-06 20:08 ` [PATCH v2 0/3] mm/get_user_pages_fast fixes, cleanups Andrew Morton
2018-04-08  3:08   ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).