* [PATCH 0/5] leverage FAULT_FOLL_ALLOW_RETRY in get_user_pages try#2
From: Andrea Arcangeli @ 2015-01-13 16:37 UTC
  To: Andrew Morton, linux-kernel, linux-mm
  Cc: Kirill A. Shutemov, Michel Lespinasse, Andrew Jones,
	Hugh Dickins, Mel Gorman, Andres Lagar-Cavilla, Minchan Kim,
	KOSAKI Motohiro, "Dr. David Alan Gilbert",
	Peter Feiner, Peter Zijlstra, Benjamin Herrenschmidt,
	James Bottomley, David Miller, Steve Capper, Johannes Weiner

The last submission didn't make it into -mm for 3.19-rc; no problem,
so here is a retry after a rebase on upstream (possibly too early for
3.20-rc, but I don't expect breakage in this area after -rc4).

FAULT_FLAG_ALLOW_RETRY allows the page fault handler to drop the
mmap_sem (held for reading) while waiting, for example, for I/O
completion, which reduces contention with writers. The problem is
that right now practically no get_user_pages call uses
FAULT_FLAG_ALLOW_RETRY, so we're not leveraging that nifty feature.
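
To make the protocol concrete, here is a minimal sketch (not part of
this series; the function name is illustrative and error handling is
simplified) of how a 3.19-era caller cooperates with
FAULT_FLAG_ALLOW_RETRY around handle_mm_fault:

    /*
     * If FAULT_FLAG_ALLOW_RETRY is set, handle_mm_fault() may drop
     * mmap_sem (e.g. while waiting for I/O) and return VM_FAULT_RETRY.
     * The caller must then retake the lock, revalidate the vma (it may
     * have been unmapped meanwhile) and retry once with FAULT_FLAG_TRIED.
     */
    static int fault_in_one_address(struct mm_struct *mm, unsigned long address)
    {
            struct vm_area_struct *vma;
            int ret;

            down_read(&mm->mmap_sem);
            vma = find_vma(mm, address);
            if (!vma || vma->vm_start > address) {
                    up_read(&mm->mmap_sem);
                    return -EFAULT;
            }
            ret = handle_mm_fault(mm, vma, address, FAULT_FLAG_ALLOW_RETRY);
            if (ret & VM_FAULT_RETRY) {
                    /* mmap_sem was released inside handle_mm_fault() */
                    down_read(&mm->mmap_sem);
                    vma = find_vma(mm, address);
                    if (!vma || vma->vm_start > address) {
                            up_read(&mm->mmap_sem);
                            return -EFAULT;
                    }
                    ret = handle_mm_fault(mm, vma, address, FAULT_FLAG_TRIED);
            }
            up_read(&mm->mmap_sem);
            return (ret & VM_FAULT_ERROR) ? -EFAULT : 0;
    }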

Andres fixed it for the KVM page fault. However get_user_pages_fast
remains uncovered, and 99% of the other get_user_pages callers aren't
using it either (the only exception being the FOLL_NOWAIT case in
KVM, which is truly nonblocking and in fact never releases the
mmap_sem).

So this patchset extends the optimization Andres did in the KVM page
fault to the whole kernel. It makes the most important places
(including gup_fast) use FAULT_FLAG_ALLOW_RETRY to reduce the
mmap_sem hold times during I/O.

The few places that remain uncovered are drivers like v4l and other
exceptions that tend to work on their own memory rather than on
arbitrary user memory (unlike, for example, O_DIRECT, which uses
gup_fast and is fully covered by this patchset).

A follow-up patch should probably also add a printk_once warning to
get_user_pages, which should become obsolete and be phased out
eventually. The "vmas" parameter of get_user_pages makes it
fundamentally incompatible with FAULT_FLAG_ALLOW_RETRY (the vmas
array becomes meaningless the moment the mmap_sem is released).

While this is just an optimization for now, it becomes an absolute
requirement for the userfaultfd feature
(http://lwn.net/Articles/615086/).

userfaultfd allows the page fault to block, and in order to do so it
needs to drop the mmap_sem first. So this patchset also ensures that,
for all memory where userfaultfd could be registered by KVM, the very
first fault (no matter whether it is a regular page fault or a
get_user_pages) always has FAULT_FLAG_ALLOW_RETRY set. The
userfaultfd then blocks and is woken only once the pagetable is
already mapped. The second fault attempt after the wakeup doesn't
need FAULT_FLAG_ALLOW_RETRY, so it's OK to retry without it.

Thanks,
Andrea

Andrea Arcangeli (5):
  mm: gup: add get_user_pages_locked and get_user_pages_unlocked
  mm: gup: add __get_user_pages_unlocked to customize gup_flags
  mm: gup: use get_user_pages_unlocked within get_user_pages_fast
  mm: gup: use get_user_pages_unlocked
  mm: gup: kvm use get_user_pages_unlocked

 arch/mips/mm/gup.c                 |   8 +-
 arch/s390/mm/gup.c                 |   6 +-
 arch/sh/mm/gup.c                   |   6 +-
 arch/sparc/mm/gup.c                |   6 +-
 arch/x86/mm/gup.c                  |   7 +-
 drivers/media/pci/ivtv/ivtv-udma.c |   6 +-
 drivers/scsi/st.c                  |   7 +-
 drivers/video/fbdev/pvr2fb.c       |   6 +-
 include/linux/kvm_host.h           |  11 --
 include/linux/mm.h                 |  11 ++
 mm/gup.c                           | 203 ++++++++++++++++++++++++++++++++++---
 mm/nommu.c                         |  33 ++++++
 mm/process_vm_access.c             |   7 +-
 mm/util.c                          |  10 +-
 net/ceph/pagevec.c                 |   6 +-
 virt/kvm/async_pf.c                |   2 +-
 virt/kvm/kvm_main.c                |  50 +--------
 17 files changed, 261 insertions(+), 124 deletions(-)


* [PATCH 1/5] mm: gup: add get_user_pages_locked and get_user_pages_unlocked
From: Andrea Arcangeli @ 2015-01-13 16:37 UTC
  To: Andrew Morton, linux-kernel, linux-mm
  Cc: Kirill A. Shutemov, Michel Lespinasse, Andrew Jones,
	Hugh Dickins, Mel Gorman, Andres Lagar-Cavilla, Minchan Kim,
	KOSAKI Motohiro, "Dr. David Alan Gilbert",
	Peter Feiner, Peter Zijlstra, Benjamin Herrenschmidt,
	James Bottomley, David Miller, Steve Capper, Johannes Weiner

We can leverage the VM_FAULT_RETRY functionality in the page fault
paths better by using either get_user_pages_locked or
get_user_pages_unlocked.

The former allows conversion of get_user_pages invocations; the
converted call sites pass a "&locked" parameter so the caller can
tell whether the mmap_sem was dropped during the call. For example,
from:

    down_read(&mm->mmap_sem);
    do_something()
    get_user_pages(tsk, mm, ..., pages, NULL);
    up_read(&mm->mmap_sem);

to:

    int locked = 1;
    down_read(&mm->mmap_sem);
    do_something()
    get_user_pages_locked(tsk, mm, ..., pages, &locked);
    if (locked)
        up_read(&mm->mmap_sem);

The latter is suitable only as a drop-in replacement for calls of the form:

    down_read(&mm->mmap_sem);
    get_user_pages(tsk, mm, ..., pages, NULL);
    up_read(&mm->mmap_sem);

into:

    get_user_pages_unlocked(tsk, mm, ..., pages);

Here tsk, mm, the intermediate "..." parameters and "pages" can take
any value, as before. Only the last parameter of get_user_pages
(vmas) must be NULL for get_user_pages_locked|unlocked to be usable
(the original unlocked form wouldn't have been safe anyway if vmas
wasn't NULL; for the locked form we just make that explicit by
dropping the parameter).

If vmas is not NULL, these two methods cannot be used.
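
As a worked example (hypothetical caller, illustrative names), the
locked protocol looks like this end to end, including what the caller
must assume when the lock was dropped:

    static long pin_user_buffer(struct task_struct *tsk, struct mm_struct *mm,
                                unsigned long start, unsigned long nr_pages,
                                struct page **pages)
    {
            int locked = 1;
            long ret;

            down_read(&mm->mmap_sem);
            /* ... other work that needs mmap_sem held for reading ... */
            ret = get_user_pages_locked(tsk, mm, start, nr_pages,
                                        1 /* write */, 0 /* force */,
                                        pages, &locked);
            /*
             * If locked == 0 the mmap_sem was dropped (and not retaken):
             * anything computed under the lock before the call, such as
             * a cached vma pointer, is stale and must not be used.
             */
            if (locked)
                    up_read(&mm->mmap_sem);
            return ret;
    }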

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com>
Reviewed-by: Peter Feiner <pfeiner@google.com>
---
 include/linux/mm.h |   7 +++
 mm/gup.c           | 177 +++++++++++++++++++++++++++++++++++++++++++++++++----
 mm/nommu.c         |  23 +++++++
 3 files changed, 196 insertions(+), 11 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80fc92a..2ea38d8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1234,6 +1234,13 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		    unsigned long start, unsigned long nr_pages,
 		    int write, int force, struct page **pages,
 		    struct vm_area_struct **vmas);
+long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
+		    unsigned long start, unsigned long nr_pages,
+		    int write, int force, struct page **pages,
+		    int *locked);
+long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+		    unsigned long start, unsigned long nr_pages,
+		    int write, int force, struct page **pages);
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			struct page **pages);
 struct kvec;
diff --git a/mm/gup.c b/mm/gup.c
index a900759..c2bd9b3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -584,6 +584,165 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 	return 0;
 }
 
+static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
+						struct mm_struct *mm,
+						unsigned long start,
+						unsigned long nr_pages,
+						int write, int force,
+						struct page **pages,
+						struct vm_area_struct **vmas,
+						int *locked, bool notify_drop)
+{
+	int flags = FOLL_TOUCH;
+	long ret, pages_done;
+	bool lock_dropped;
+
+	if (locked) {
+		/* if VM_FAULT_RETRY can be returned, vmas become invalid */
+		BUG_ON(vmas);
+		/* check caller initialized locked */
+		BUG_ON(*locked != 1);
+	}
+
+	if (pages)
+		flags |= FOLL_GET;
+	if (write)
+		flags |= FOLL_WRITE;
+	if (force)
+		flags |= FOLL_FORCE;
+
+	pages_done = 0;
+	lock_dropped = false;
+	for (;;) {
+		ret = __get_user_pages(tsk, mm, start, nr_pages, flags, pages,
+				       vmas, locked);
+		if (!locked)
+			/* VM_FAULT_RETRY couldn't trigger, bypass */
+			return ret;
+
+		/* VM_FAULT_RETRY cannot return errors */
+		if (!*locked) {
+			BUG_ON(ret < 0);
+			BUG_ON(ret >= nr_pages);
+		}
+
+		if (!pages)
+			/* If it's a prefault don't insist harder */
+			return ret;
+
+		if (ret > 0) {
+			nr_pages -= ret;
+			pages_done += ret;
+			if (!nr_pages)
+				break;
+		}
+		if (*locked) {
+			/* VM_FAULT_RETRY didn't trigger */
+			if (!pages_done)
+				pages_done = ret;
+			break;
+		}
+		/* VM_FAULT_RETRY triggered, so seek to the faulting offset */
+		pages += ret;
+		start += ret << PAGE_SHIFT;
+
+		/*
+		 * Repeat on the address that fired VM_FAULT_RETRY
+		 * without FAULT_FLAG_ALLOW_RETRY but with
+		 * FAULT_FLAG_TRIED.
+		 */
+		*locked = 1;
+		lock_dropped = true;
+		down_read(&mm->mmap_sem);
+		ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED,
+				       pages, NULL, NULL);
+		if (ret != 1) {
+			BUG_ON(ret > 1);
+			if (!pages_done)
+				pages_done = ret;
+			break;
+		}
+		nr_pages--;
+		pages_done++;
+		if (!nr_pages)
+			break;
+		pages++;
+		start += PAGE_SIZE;
+	}
+	if (notify_drop && lock_dropped && *locked) {
+		/*
+		 * We must let the caller know we temporarily dropped the lock
+		 * and so the critical section protected by it was lost.
+		 */
+		up_read(&mm->mmap_sem);
+		*locked = 0;
+	}
+	return pages_done;
+}
+
+/*
+ * We can leverage the VM_FAULT_RETRY functionality in the page fault
+ * paths better by using either get_user_pages_locked() or
+ * get_user_pages_unlocked().
+ *
+ * get_user_pages_locked() is suitable to replace the form:
+ *
+ *      down_read(&mm->mmap_sem);
+ *      do_something()
+ *      get_user_pages(tsk, mm, ..., pages, NULL);
+ *      up_read(&mm->mmap_sem);
+ *
+ *  to:
+ *
+ *      int locked = 1;
+ *      down_read(&mm->mmap_sem);
+ *      do_something()
+ *      get_user_pages_locked(tsk, mm, ..., pages, &locked);
+ *      if (locked)
+ *          up_read(&mm->mmap_sem);
+ */
+long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
+			   unsigned long start, unsigned long nr_pages,
+			   int write, int force, struct page **pages,
+			   int *locked)
+{
+	return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
+				       pages, NULL, locked, true);
+}
+EXPORT_SYMBOL(get_user_pages_locked);
+
+/*
+ * get_user_pages_unlocked() is suitable to replace the form:
+ *
+ *      down_read(&mm->mmap_sem);
+ *      get_user_pages(tsk, mm, ..., pages, NULL);
+ *      up_read(&mm->mmap_sem);
+ *
+ *  with:
+ *
+ *      get_user_pages_unlocked(tsk, mm, ..., pages);
+ *
+ * It is functionally equivalent to get_user_pages_fast so
+ * get_user_pages_fast should be used instead, if the two parameters
+ * "tsk" and "mm" are respectively equal to current and current->mm,
+ * or if "force" shall be set to 1 (get_user_pages_fast misses the
+ * "force" parameter).
+ */
+long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+			     unsigned long start, unsigned long nr_pages,
+			     int write, int force, struct page **pages)
+{
+	long ret;
+	int locked = 1;
+	down_read(&mm->mmap_sem);
+	ret = __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
+				      pages, NULL, &locked, false);
+	if (locked)
+		up_read(&mm->mmap_sem);
+	return ret;
+}
+EXPORT_SYMBOL(get_user_pages_unlocked);
+
 /*
  * get_user_pages() - pin user pages in memory
  * @tsk:	the task_struct to use for page fault accounting, or
@@ -633,22 +792,18 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
  * use the correct cache flushing APIs.
  *
  * See also get_user_pages_fast, for performance critical applications.
+ *
+ * get_user_pages should be phased out in favor of
+ * get_user_pages_locked|unlocked or get_user_pages_fast. Nothing
+ * should use get_user_pages because it cannot pass
+ * FAULT_FLAG_ALLOW_RETRY to handle_mm_fault.
  */
 long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		unsigned long start, unsigned long nr_pages, int write,
 		int force, struct page **pages, struct vm_area_struct **vmas)
 {
-	int flags = FOLL_TOUCH;
-
-	if (pages)
-		flags |= FOLL_GET;
-	if (write)
-		flags |= FOLL_WRITE;
-	if (force)
-		flags |= FOLL_FORCE;
-
-	return __get_user_pages(tsk, mm, start, nr_pages, flags, pages, vmas,
-				NULL);
+	return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
+				       pages, vmas, NULL, false);
 }
 EXPORT_SYMBOL(get_user_pages);
 
diff --git a/mm/nommu.c b/mm/nommu.c
index b51eadf..d6d0a6ac 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -213,6 +213,29 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 }
 EXPORT_SYMBOL(get_user_pages);
 
+long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
+			   unsigned long start, unsigned long nr_pages,
+			   int write, int force, struct page **pages,
+			   int *locked)
+{
+	return get_user_pages(tsk, mm, start, nr_pages, write, force,
+			      pages, NULL);
+}
+EXPORT_SYMBOL(get_user_pages_locked);
+
+long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+			     unsigned long start, unsigned long nr_pages,
+			     int write, int force, struct page **pages)
+{
+	long ret;
+	down_read(&mm->mmap_sem);
+	ret = get_user_pages(tsk, mm, start, nr_pages, write, force,
+			     pages, NULL);
+	up_read(&mm->mmap_sem);
+	return ret;
+}
+EXPORT_SYMBOL(get_user_pages_unlocked);
+
 /**
  * follow_pfn - look up PFN at a user virtual address
  * @vma: memory mapping


* [PATCH 2/5] mm: gup: add __get_user_pages_unlocked to customize gup_flags
From: Andrea Arcangeli @ 2015-01-13 16:37 UTC
  To: Andrew Morton, linux-kernel, linux-mm
  Cc: Kirill A. Shutemov, Michel Lespinasse, Andrew Jones,
	Hugh Dickins, Mel Gorman, Andres Lagar-Cavilla, Minchan Kim,
	KOSAKI Motohiro, "Dr. David Alan Gilbert",
	Peter Feiner, Peter Zijlstra, Benjamin Herrenschmidt,
	James Bottomley, David Miller, Steve Capper, Johannes Weiner

Some callers (like KVM) may want to set gup_flags such as
FOLL_HWPOISON to get a proper -EHWPOISON retval instead of -EFAULT,
so they can take more appropriate action if get_user_pages runs into
a memory failure.
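
A hedged sketch of such a caller (hypothetical function, illustrative
names); note that, unlike get_user_pages_unlocked, FOLL_TOUCH is not
implied here and must be passed explicitly if wanted:

    static long pin_one_page(unsigned long addr, int write, struct page **page)
    {
            long npages;

            npages = __get_user_pages_unlocked(current, current->mm, addr, 1,
                                               write, 0 /* force */, page,
                                               FOLL_TOUCH | FOLL_HWPOISON);
            if (npages == -EHWPOISON)
                    return npages;  /* hardware memory failure, not -EFAULT */
            return npages == 1 ? 1 : -EFAULT;
    }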

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 include/linux/mm.h |  4 ++++
 mm/gup.c           | 44 ++++++++++++++++++++++++++++++++------------
 mm/nommu.c         | 16 +++++++++++++---
 3 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2ea38d8..c4d3102 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1238,6 +1238,10 @@ long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
 		    unsigned long start, unsigned long nr_pages,
 		    int write, int force, struct page **pages,
 		    int *locked);
+long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+			       unsigned long start, unsigned long nr_pages,
+			       int write, int force, struct page **pages,
+			       unsigned int gup_flags);
 long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
 		    unsigned long start, unsigned long nr_pages,
 		    int write, int force, struct page **pages);
diff --git a/mm/gup.c b/mm/gup.c
index c2bd9b3..d616811 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -591,9 +591,9 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
 						int write, int force,
 						struct page **pages,
 						struct vm_area_struct **vmas,
-						int *locked, bool notify_drop)
+						int *locked, bool notify_drop,
+						unsigned int flags)
 {
-	int flags = FOLL_TOUCH;
 	long ret, pages_done;
 	bool lock_dropped;
 
@@ -707,11 +707,37 @@ long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
 			   int *locked)
 {
 	return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
-				       pages, NULL, locked, true);
+				       pages, NULL, locked, true, FOLL_TOUCH);
 }
 EXPORT_SYMBOL(get_user_pages_locked);
 
 /*
+ * Same as get_user_pages_unlocked(...., FOLL_TOUCH) but it allows to
+ * pass additional gup_flags as last parameter (like FOLL_HWPOISON).
+ *
+ * NOTE: here FOLL_TOUCH is not set implicitly and must be set by the
+ * caller if required (just like with __get_user_pages). "FOLL_GET",
+ * "FOLL_WRITE" and "FOLL_FORCE" are set implicitly as needed
+ * according to the parameters "pages", "write", "force"
+ * respectively.
+ */
+__always_inline long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+					       unsigned long start, unsigned long nr_pages,
+					       int write, int force, struct page **pages,
+					       unsigned int gup_flags)
+{
+	long ret;
+	int locked = 1;
+	down_read(&mm->mmap_sem);
+	ret = __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
+				      pages, NULL, &locked, false, gup_flags);
+	if (locked)
+		up_read(&mm->mmap_sem);
+	return ret;
+}
+EXPORT_SYMBOL(__get_user_pages_unlocked);
+
+/*
  * get_user_pages_unlocked() is suitable to replace the form:
  *
  *      down_read(&mm->mmap_sem);
@@ -732,14 +758,8 @@ long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
 			     unsigned long start, unsigned long nr_pages,
 			     int write, int force, struct page **pages)
 {
-	long ret;
-	int locked = 1;
-	down_read(&mm->mmap_sem);
-	ret = __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
-				      pages, NULL, &locked, false);
-	if (locked)
-		up_read(&mm->mmap_sem);
-	return ret;
+	return __get_user_pages_unlocked(tsk, mm, start, nr_pages, write,
+					 force, pages, FOLL_TOUCH);
 }
 EXPORT_SYMBOL(get_user_pages_unlocked);
 
@@ -803,7 +823,7 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		int force, struct page **pages, struct vm_area_struct **vmas)
 {
 	return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
-				       pages, vmas, NULL, false);
+				       pages, vmas, NULL, false, FOLL_TOUCH);
 }
 EXPORT_SYMBOL(get_user_pages);
 
diff --git a/mm/nommu.c b/mm/nommu.c
index d6d0a6ac..05f42f0 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -223,9 +223,10 @@ long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
 }
 EXPORT_SYMBOL(get_user_pages_locked);
 
-long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
-			     unsigned long start, unsigned long nr_pages,
-			     int write, int force, struct page **pages)
+long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+			       unsigned long start, unsigned long nr_pages,
+			       int write, int force, struct page **pages,
+			       unsigned int gup_flags)
 {
 	long ret;
 	down_read(&mm->mmap_sem);
@@ -234,6 +235,15 @@ long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
 	up_read(&mm->mmap_sem);
 	return ret;
 }
+EXPORT_SYMBOL(__get_user_pages_unlocked);
+
+long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+			     unsigned long start, unsigned long nr_pages,
+			     int write, int force, struct page **pages)
+{
+	return __get_user_pages_unlocked(tsk, mm, start, nr_pages, write,
+					 force, pages, 0);
+}
 EXPORT_SYMBOL(get_user_pages_unlocked);
 
 /**


* [PATCH 3/5] mm: gup: use get_user_pages_unlocked within get_user_pages_fast
From: Andrea Arcangeli @ 2015-01-13 16:37 UTC
  To: Andrew Morton, linux-kernel, linux-mm
  Cc: Kirill A. Shutemov, Michel Lespinasse, Andrew Jones,
	Hugh Dickins, Mel Gorman, Andres Lagar-Cavilla, Minchan Kim,
	KOSAKI Motohiro, "Dr. David Alan Gilbert",
	Peter Feiner, Peter Zijlstra, Benjamin Herrenschmidt,
	James Bottomley, David Miller, Steve Capper, Johannes Weiner

This allows the get_user_pages_fast slow path to release the mmap_sem
before blocking.
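
The conversion pattern is the same in every architecture; condensed,
it looks like the sketch below (illustrative, mirroring the hunks
that follow), where the only subtlety is preserving the pages the
lockless fast path already pinned:

    /* after the irq-disabled walk pinned 'nr' pages */
    ret = get_user_pages_unlocked(current, mm, start + (nr << PAGE_SHIFT),
                                  nr_pages - nr, write, 0, pages + nr);
    /* have to be a bit careful with return values: don't lose the
     * pages already pinned by the fast path if the slow path failed */
    if (nr > 0)
            ret = (ret < 0) ? nr : ret + nr;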

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 arch/mips/mm/gup.c  |  8 +++-----
 arch/s390/mm/gup.c  |  6 ++----
 arch/sh/mm/gup.c    |  6 ++----
 arch/sparc/mm/gup.c |  6 ++----
 arch/x86/mm/gup.c   |  7 +++----
 mm/gup.c            |  6 ++----
 mm/util.c           | 10 ++--------
 7 files changed, 16 insertions(+), 33 deletions(-)

diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c
index 70795a6..349995d 100644
--- a/arch/mips/mm/gup.c
+++ b/arch/mips/mm/gup.c
@@ -301,11 +301,9 @@ slow_irqon:
 	start += nr << PAGE_SHIFT;
 	pages += nr;
 
-	down_read(&mm->mmap_sem);
-	ret = get_user_pages(current, mm, start,
-				(end - start) >> PAGE_SHIFT,
-				write, 0, pages, NULL);
-	up_read(&mm->mmap_sem);
+	ret = get_user_pages_unlocked(current, mm, start,
+				      (end - start) >> PAGE_SHIFT,
+				      write, 0, pages);
 
 	/* Have to be a bit careful with return values */
 	if (nr > 0) {
diff --git a/arch/s390/mm/gup.c b/arch/s390/mm/gup.c
index 639fce46..5c586c7 100644
--- a/arch/s390/mm/gup.c
+++ b/arch/s390/mm/gup.c
@@ -235,10 +235,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 	/* Try to get the remaining pages with get_user_pages */
 	start += nr << PAGE_SHIFT;
 	pages += nr;
-	down_read(&mm->mmap_sem);
-	ret = get_user_pages(current, mm, start,
-			     nr_pages - nr, write, 0, pages, NULL);
-	up_read(&mm->mmap_sem);
+	ret = get_user_pages_unlocked(current, mm, start,
+			     nr_pages - nr, write, 0, pages);
 	/* Have to be a bit careful with return values */
 	if (nr > 0)
 		ret = (ret < 0) ? nr : ret + nr;
diff --git a/arch/sh/mm/gup.c b/arch/sh/mm/gup.c
index 37458f3..e15f52a 100644
--- a/arch/sh/mm/gup.c
+++ b/arch/sh/mm/gup.c
@@ -257,10 +257,8 @@ slow_irqon:
 		start += nr << PAGE_SHIFT;
 		pages += nr;
 
-		down_read(&mm->mmap_sem);
-		ret = get_user_pages(current, mm, start,
-			(end - start) >> PAGE_SHIFT, write, 0, pages, NULL);
-		up_read(&mm->mmap_sem);
+		ret = get_user_pages_unlocked(current, mm, start,
+			(end - start) >> PAGE_SHIFT, write, 0, pages);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
diff --git a/arch/sparc/mm/gup.c b/arch/sparc/mm/gup.c
index ae6ce38..2e5c4fc 100644
--- a/arch/sparc/mm/gup.c
+++ b/arch/sparc/mm/gup.c
@@ -249,10 +249,8 @@ slow:
 		start += nr << PAGE_SHIFT;
 		pages += nr;
 
-		down_read(&mm->mmap_sem);
-		ret = get_user_pages(current, mm, start,
-			(end - start) >> PAGE_SHIFT, write, 0, pages, NULL);
-		up_read(&mm->mmap_sem);
+		ret = get_user_pages_unlocked(current, mm, start,
+			(end - start) >> PAGE_SHIFT, write, 0, pages);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index d754782..4548dd8 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -388,10 +388,9 @@ slow_irqon:
 		start += nr << PAGE_SHIFT;
 		pages += nr;
 
-		down_read(&mm->mmap_sem);
-		ret = get_user_pages(current, mm, start,
-			(end - start) >> PAGE_SHIFT, write, 0, pages, NULL);
-		up_read(&mm->mmap_sem);
+		ret = get_user_pages_unlocked(current, mm, start,
+					      (end - start) >> PAGE_SHIFT,
+					      write, 0, pages);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
diff --git a/mm/gup.c b/mm/gup.c
index d616811..da200a1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1252,10 +1252,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 		start += nr << PAGE_SHIFT;
 		pages += nr;
 
-		down_read(&mm->mmap_sem);
-		ret = get_user_pages(current, mm, start,
-				     nr_pages - nr, write, 0, pages, NULL);
-		up_read(&mm->mmap_sem);
+		ret = get_user_pages_unlocked(current, mm, start,
+					      nr_pages - nr, write, 0, pages);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
diff --git a/mm/util.c b/mm/util.c
index fec39d4..f3ef639 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -240,14 +240,8 @@ int __weak get_user_pages_fast(unsigned long start,
 				int nr_pages, int write, struct page **pages)
 {
 	struct mm_struct *mm = current->mm;
-	int ret;
-
-	down_read(&mm->mmap_sem);
-	ret = get_user_pages(current, mm, start, nr_pages,
-					write, 0, pages, NULL);
-	up_read(&mm->mmap_sem);
-
-	return ret;
+	return get_user_pages_unlocked(current, mm, start, nr_pages,
+				       write, 0, pages);
 }
 EXPORT_SYMBOL_GPL(get_user_pages_fast);
 


* [PATCH 4/5] mm: gup: use get_user_pages_unlocked
From: Andrea Arcangeli @ 2015-01-13 16:37 UTC
  To: Andrew Morton, linux-kernel, linux-mm
  Cc: Kirill A. Shutemov, Michel Lespinasse, Andrew Jones,
	Hugh Dickins, Mel Gorman, Andres Lagar-Cavilla, Minchan Kim,
	KOSAKI Motohiro, "Dr. David Alan Gilbert",
	Peter Feiner, Peter Zijlstra, Benjamin Herrenschmidt,
	James Bottomley, David Miller, Steve Capper, Johannes Weiner

This allows those get_user_pages calls to pass FAULT_FLAG_ALLOW_RETRY
to the page fault handler in order to release the mmap_sem during I/O.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 drivers/media/pci/ivtv/ivtv-udma.c | 6 ++----
 drivers/scsi/st.c                  | 7 ++-----
 drivers/video/fbdev/pvr2fb.c       | 6 ++----
 mm/process_vm_access.c             | 7 ++-----
 net/ceph/pagevec.c                 | 6 ++----
 5 files changed, 10 insertions(+), 22 deletions(-)

diff --git a/drivers/media/pci/ivtv/ivtv-udma.c b/drivers/media/pci/ivtv/ivtv-udma.c
index bee2329..24152ac 100644
--- a/drivers/media/pci/ivtv/ivtv-udma.c
+++ b/drivers/media/pci/ivtv/ivtv-udma.c
@@ -124,10 +124,8 @@ int ivtv_udma_setup(struct ivtv *itv, unsigned long ivtv_dest_addr,
 	}
 
 	/* Get user pages for DMA Xfer */
-	down_read(&current->mm->mmap_sem);
-	err = get_user_pages(current, current->mm,
-			user_dma.uaddr, user_dma.page_count, 0, 1, dma->map, NULL);
-	up_read(&current->mm->mmap_sem);
+	err = get_user_pages_unlocked(current, current->mm,
+			user_dma.uaddr, user_dma.page_count, 0, 1, dma->map);
 
 	if (user_dma.page_count != err) {
 		IVTV_DEBUG_WARN("failed to map user pages, returned %d instead of %d\n",
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index 128d3b5..9a1c342 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -4551,18 +4551,15 @@ static int sgl_map_user_pages(struct st_buffer *STbp,
 		return -ENOMEM;
 
         /* Try to fault in all of the necessary pages */
-	down_read(&current->mm->mmap_sem);
         /* rw==READ means read from drive, write into memory area */
-	res = get_user_pages(
+	res = get_user_pages_unlocked(
 		current,
 		current->mm,
 		uaddr,
 		nr_pages,
 		rw == READ,
 		0, /* don't force */
-		pages,
-		NULL);
-	up_read(&current->mm->mmap_sem);
+		pages);
 
 	/* Errors and no page mapped should return here */
 	if (res < nr_pages)
diff --git a/drivers/video/fbdev/pvr2fb.c b/drivers/video/fbdev/pvr2fb.c
index 7c74f58..0e24eb9 100644
--- a/drivers/video/fbdev/pvr2fb.c
+++ b/drivers/video/fbdev/pvr2fb.c
@@ -686,10 +686,8 @@ static ssize_t pvr2fb_write(struct fb_info *info, const char *buf,
 	if (!pages)
 		return -ENOMEM;
 
-	down_read(&current->mm->mmap_sem);
-	ret = get_user_pages(current, current->mm, (unsigned long)buf,
-			     nr_pages, WRITE, 0, pages, NULL);
-	up_read(&current->mm->mmap_sem);
+	ret = get_user_pages_unlocked(current, current->mm, (unsigned long)buf,
+				      nr_pages, WRITE, 0, pages);
 
 	if (ret < nr_pages) {
 		nr_pages = ret;
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 5077afc..b159769 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -99,11 +99,8 @@ static int process_vm_rw_single_vec(unsigned long addr,
 		size_t bytes;
 
 		/* Get the pages we're interested in */
-		down_read(&mm->mmap_sem);
-		pages = get_user_pages(task, mm, pa, pages,
-				      vm_write, 0, process_pages, NULL);
-		up_read(&mm->mmap_sem);
-
+		pages = get_user_pages_unlocked(task, mm, pa, pages,
+						vm_write, 0, process_pages);
 		if (pages <= 0)
 			return -EFAULT;
 
diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
index 5550130..096d914 100644
--- a/net/ceph/pagevec.c
+++ b/net/ceph/pagevec.c
@@ -23,17 +23,15 @@ struct page **ceph_get_direct_page_vector(const void __user *data,
 	if (!pages)
 		return ERR_PTR(-ENOMEM);
 
-	down_read(&current->mm->mmap_sem);
 	while (got < num_pages) {
-		rc = get_user_pages(current, current->mm,
+		rc = get_user_pages_unlocked(current, current->mm,
 		    (unsigned long)data + ((unsigned long)got * PAGE_SIZE),
-		    num_pages - got, write_page, 0, pages + got, NULL);
+		    num_pages - got, write_page, 0, pages + got);
 		if (rc < 0)
 			break;
 		BUG_ON(rc == 0);
 		got += rc;
 	}
-	up_read(&current->mm->mmap_sem);
 	if (rc < 0)
 		goto fail;
 	return pages;


* [PATCH 5/5] mm: gup: kvm use get_user_pages_unlocked
  2015-01-13 16:37 ` Andrea Arcangeli
@ 2015-01-13 16:37   ` Andrea Arcangeli
  0 siblings, 0 replies; 26+ messages in thread
From: Andrea Arcangeli @ 2015-01-13 16:37 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm
  Cc: Kirill A. Shutemov, Michel Lespinasse, Andrew Jones,
	Hugh Dickins, Mel Gorman, Andres Lagar-Cavilla, Minchan Kim,
	KOSAKI Motohiro, "Dr. David Alan Gilbert",
	Peter Feiner, Peter Zijlstra, Benjamin Herrenschmidt,
	James Bottomley, David Miller, Steve Capper, Johannes Weiner

Use the more generic get_user_pages_unlocked, which has the additional
benefit of passing FAULT_FLAG_ALLOW_RETRY at the very first page fault,
so the first fault in an unmapped area can always block indefinitely
because it is allowed to release the mmap_sem.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 include/linux/kvm_host.h | 11 -----------
 virt/kvm/async_pf.c      |  2 +-
 virt/kvm/kvm_main.c      | 50 ++++--------------------------------------------
 3 files changed, 5 insertions(+), 58 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 26f1060..d189ee0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -200,17 +200,6 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
 
-/*
- * Carry out a gup that requires IO. Allow the mm to relinquish the mmap
- * semaphore if the filemap/swap has to wait on a page lock. pagep == NULL
- * controls whether we retry the gup one more time to completion in that case.
- * Typically this is called after a FAULT_FLAG_RETRY_NOWAIT in the main tdp
- * handler.
- */
-int kvm_get_user_page_io(struct task_struct *tsk, struct mm_struct *mm,
-			 unsigned long addr, bool write_fault,
-			 struct page **pagep);
-
 enum {
 	OUTSIDE_GUEST_MODE,
 	IN_GUEST_MODE,
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 5ff7f7f..44660ae 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -80,7 +80,7 @@ static void async_pf_execute(struct work_struct *work)
 
 	might_sleep();
 
-	kvm_get_user_page_io(NULL, mm, addr, 1, NULL);
+	get_user_pages_unlocked(NULL, mm, addr, 1, 1, 0, NULL);
 	kvm_async_page_present_sync(vcpu, apf);
 
 	spin_lock(&vcpu->async_pf.lock);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1cc6e2e..458b9b1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1128,43 +1128,6 @@ static int get_user_page_nowait(struct task_struct *tsk, struct mm_struct *mm,
 	return __get_user_pages(tsk, mm, start, 1, flags, page, NULL, NULL);
 }
 
-int kvm_get_user_page_io(struct task_struct *tsk, struct mm_struct *mm,
-			 unsigned long addr, bool write_fault,
-			 struct page **pagep)
-{
-	int npages;
-	int locked = 1;
-	int flags = FOLL_TOUCH | FOLL_HWPOISON |
-		    (pagep ? FOLL_GET : 0) |
-		    (write_fault ? FOLL_WRITE : 0);
-
-	/*
-	 * If retrying the fault, we get here *not* having allowed the filemap
-	 * to wait on the page lock. We should now allow waiting on the IO with
-	 * the mmap semaphore released.
-	 */
-	down_read(&mm->mmap_sem);
-	npages = __get_user_pages(tsk, mm, addr, 1, flags, pagep, NULL,
-				  &locked);
-	if (!locked) {
-		VM_BUG_ON(npages);
-
-		if (!pagep)
-			return 0;
-
-		/*
-		 * The previous call has now waited on the IO. Now we can
-		 * retry and complete. Pass TRIED to ensure we do not re
-		 * schedule async IO (see e.g. filemap_fault).
-		 */
-		down_read(&mm->mmap_sem);
-		npages = __get_user_pages(tsk, mm, addr, 1, flags | FOLL_TRIED,
-					  pagep, NULL, NULL);
-	}
-	up_read(&mm->mmap_sem);
-	return npages;
-}
-
 static inline int check_user_page_hwpoison(unsigned long addr)
 {
 	int rc, flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_WRITE;
@@ -1227,15 +1190,10 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 		npages = get_user_page_nowait(current, current->mm,
 					      addr, write_fault, page);
 		up_read(&current->mm->mmap_sem);
-	} else {
-		/*
-		 * By now we have tried gup_fast, and possibly async_pf, and we
-		 * are certainly not atomic. Time to retry the gup, allowing
-		 * mmap semaphore to be relinquished in the case of IO.
-		 */
-		npages = kvm_get_user_page_io(current, current->mm, addr,
-					      write_fault, page);
-	}
+	} else
+		npages = __get_user_pages_unlocked(current, current->mm, addr, 1,
+						   write_fault, 0, page,
+						   FOLL_TOUCH|FOLL_HWPOISON);
 	if (npages != 1)
 		return npages;
 

^ permalink raw reply related	[flat|nested] 26+ messages in thread
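
For readers who want the mechanics rather than the diff: the retry
protocol behind get_user_pages_unlocked is essentially what the deleted
kvm_get_user_page_io above open-coded. A minimal single-page sketch,
reconstructed from that deleted code (the flag choices are simplified
for illustration; the real mm/gup.c helper from patch 1 handles
arbitrary page counts and differs in detail):

	static long gup_unlocked_sketch(struct task_struct *tsk,
					struct mm_struct *mm,
					unsigned long start,
					struct page **pages)
	{
		int locked = 1;
		long npages;

		down_read(&mm->mmap_sem);
		/* the fault may drop mmap_sem and clear "locked" */
		npages = __get_user_pages(tsk, mm, start, 1,
					  FOLL_TOUCH | FOLL_GET,
					  pages, NULL, &locked);
		if (!locked) {
			/* mmap_sem was released to wait on I/O: retry once */
			down_read(&mm->mmap_sem);
			npages = __get_user_pages(tsk, mm, start, 1,
						  FOLL_TOUCH | FOLL_GET |
						  FOLL_TRIED,
						  pages, NULL, NULL);
		}
		up_read(&mm->mmap_sem);
		return npages;
	}

FOLL_TRIED on the second attempt tells filemap_fault not to schedule
the async I/O again, since the first attempt already waited on it.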

* Re: [PATCH 5/5] mm: gup: kvm use get_user_pages_unlocked
  2015-01-13 16:37   ` Andrea Arcangeli
@ 2015-01-13 19:21     ` Andres Lagar-Cavilla
  0 siblings, 0 replies; 26+ messages in thread
From: Andres Lagar-Cavilla @ 2015-01-13 19:21 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-kernel, linux-mm, Kirill A. Shutemov,
	Michel Lespinasse, Andrew Jones, Hugh Dickins, Mel Gorman,
	Minchan Kim, KOSAKI Motohiro, "Dr. David Alan Gilbert",
	Peter Feiner, Peter Zijlstra, Benjamin Herrenschmidt,
	James Bottomley, David Miller, Steve Capper, Johannes Weiner

On Tue, Jan 13, 2015 at 8:37 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> Use the more generic get_user_pages_unlocked, which has the additional
> benefit of passing FAULT_FLAG_ALLOW_RETRY at the very first page fault,
> so the first fault in an unmapped area can always block indefinitely
> because it is allowed to release the mmap_sem.
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com>

Thanks for the cleanup.

> ---
>  include/linux/kvm_host.h | 11 -----------
>  virt/kvm/async_pf.c      |  2 +-
>  virt/kvm/kvm_main.c      | 50 ++++--------------------------------------------
>  3 files changed, 5 insertions(+), 58 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 26f1060..d189ee0 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -200,17 +200,6 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
>  int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
>  #endif
>
> -/*
> - * Carry out a gup that requires IO. Allow the mm to relinquish the mmap
> - * semaphore if the filemap/swap has to wait on a page lock. pagep == NULL
> - * controls whether we retry the gup one more time to completion in that case.
> - * Typically this is called after a FAULT_FLAG_RETRY_NOWAIT in the main tdp
> - * handler.
> - */
> -int kvm_get_user_page_io(struct task_struct *tsk, struct mm_struct *mm,
> -                        unsigned long addr, bool write_fault,
> -                        struct page **pagep);
> -
>  enum {
>         OUTSIDE_GUEST_MODE,
>         IN_GUEST_MODE,
> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
> index 5ff7f7f..44660ae 100644
> --- a/virt/kvm/async_pf.c
> +++ b/virt/kvm/async_pf.c
> @@ -80,7 +80,7 @@ static void async_pf_execute(struct work_struct *work)
>
>         might_sleep();
>
> -       kvm_get_user_page_io(NULL, mm, addr, 1, NULL);
> +       get_user_pages_unlocked(NULL, mm, addr, 1, 1, 0, NULL);
>         kvm_async_page_present_sync(vcpu, apf);
>
>         spin_lock(&vcpu->async_pf.lock);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 1cc6e2e..458b9b1 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1128,43 +1128,6 @@ static int get_user_page_nowait(struct task_struct *tsk, struct mm_struct *mm,
>         return __get_user_pages(tsk, mm, start, 1, flags, page, NULL, NULL);
>  }
>
> -int kvm_get_user_page_io(struct task_struct *tsk, struct mm_struct *mm,
> -                        unsigned long addr, bool write_fault,
> -                        struct page **pagep)
> -{
> -       int npages;
> -       int locked = 1;
> -       int flags = FOLL_TOUCH | FOLL_HWPOISON |
> -                   (pagep ? FOLL_GET : 0) |
> -                   (write_fault ? FOLL_WRITE : 0);
> -
> -       /*
> -        * If retrying the fault, we get here *not* having allowed the filemap
> -        * to wait on the page lock. We should now allow waiting on the IO with
> -        * the mmap semaphore released.
> -        */
> -       down_read(&mm->mmap_sem);
> -       npages = __get_user_pages(tsk, mm, addr, 1, flags, pagep, NULL,
> -                                 &locked);
> -       if (!locked) {
> -               VM_BUG_ON(npages);
> -
> -               if (!pagep)
> -                       return 0;
> -
> -               /*
> -                * The previous call has now waited on the IO. Now we can
> -                * retry and complete. Pass TRIED to ensure we do not re
> -                * schedule async IO (see e.g. filemap_fault).
> -                */
> -               down_read(&mm->mmap_sem);
> -               npages = __get_user_pages(tsk, mm, addr, 1, flags | FOLL_TRIED,
> -                                         pagep, NULL, NULL);
> -       }
> -       up_read(&mm->mmap_sem);
> -       return npages;
> -}
> -
>  static inline int check_user_page_hwpoison(unsigned long addr)
>  {
>         int rc, flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_WRITE;
> @@ -1227,15 +1190,10 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
>                 npages = get_user_page_nowait(current, current->mm,
>                                               addr, write_fault, page);
>                 up_read(&current->mm->mmap_sem);
> -       } else {
> -               /*
> -                * By now we have tried gup_fast, and possibly async_pf, and we
> -                * are certainly not atomic. Time to retry the gup, allowing
> -                * mmap semaphore to be relinquished in the case of IO.
> -                */
> -               npages = kvm_get_user_page_io(current, current->mm, addr,
> -                                             write_fault, page);
> -       }
> +       } else
> +               npages = __get_user_pages_unlocked(current, current->mm, addr, 1,
> +                                                  write_fault, 0, page,
> +                                                  FOLL_TOUCH|FOLL_HWPOISON);
>         if (npages != 1)
>                 return npages;
>



-- 
Andres Lagar-Cavilla | Google Kernel Team | andreslc@google.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/5] leverage FAULT_FOLL_ALLOW_RETRY in get_user_pages try#2
  2015-01-13 16:37 ` Andrea Arcangeli
@ 2015-01-13 19:26   ` Andres Lagar-Cavilla
  0 siblings, 0 replies; 26+ messages in thread
From: Andres Lagar-Cavilla @ 2015-01-13 19:26 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-kernel, linux-mm, Kirill A. Shutemov,
	Michel Lespinasse, Andrew Jones, Hugh Dickins, Mel Gorman,
	Minchan Kim, KOSAKI Motohiro, "Dr. David Alan Gilbert",
	Peter Feiner, Peter Zijlstra, Benjamin Herrenschmidt,
	James Bottomley, David Miller, Steve Capper, Johannes Weiner

On Tue, Jan 13, 2015 at 8:37 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> Last submit didn't go into -mm/3.19-rc, no prob but here I retry
> (possibly too early for 3.20-rc but I don't expect breakages in this
> area post -rc4) after a rebase on upstream.

The series looks good to me; I didn't spot any material changes from last time.

I would fold patches 3 & 4 together: it's the same idea behind both,
just different parts of the tree.

You could also argue for folding 1 & 2 together: the "damage" is done
in 1, and 2 as a standalone results in an (IMHO) unnecessarily large diff.

Thanks,
Andres


------ 8< snipping the rest ---------------

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/5] mm: gup: add get_user_pages_locked and get_user_pages_unlocked
  2015-01-13 16:37   ` Andrea Arcangeli
@ 2015-01-16 12:47     ` Kirill A. Shutemov
  0 siblings, 0 replies; 26+ messages in thread
From: Kirill A. Shutemov @ 2015-01-16 12:47 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-kernel, linux-mm, Michel Lespinasse,
	Andrew Jones, Hugh Dickins, Mel Gorman, Andres Lagar-Cavilla,
	Minchan Kim, KOSAKI Motohiro, "Dr. David Alan Gilbert",
	Peter Feiner, Peter Zijlstra, Benjamin Herrenschmidt,
	James Bottomley, David Miller, Steve Capper, Johannes Weiner

On Tue, Jan 13, 2015 at 05:37:50PM +0100, Andrea Arcangeli wrote:
> We can leverage the VM_FAULT_RETRY functionality in the page fault
> paths better by using either get_user_pages_locked or
> get_user_pages_unlocked.
> 
> The former allows conversion of get_user_pages invocations that will
> have to pass a "&locked" parameter to know if the mmap_sem was dropped
> during the call. Example from:
> 
>     down_read(&mm->mmap_sem);
>     do_something()
>     get_user_pages(tsk, mm, ..., pages, NULL);
>     up_read(&mm->mmap_sem);
> 
> to:
> 
>     int locked = 1;
>     down_read(&mm->mmap_sem);
>     do_something()
>     get_user_pages_locked(tsk, mm, ..., pages, &locked);
>     if (locked)
>         up_read(&mm->mmap_sem);
> 
> The latter is suitable only as a drop-in replacement of the form:
> 
>     down_read(&mm->mmap_sem);
>     get_user_pages(tsk, mm, ..., pages, NULL);
>     up_read(&mm->mmap_sem);
> 
> into:
> 
>     get_user_pages_unlocked(tsk, mm, ..., pages);
> 
> Where tsk, mm, the intermediate "..." parameters and "pages" can be any
> value as before. Just the last parameter of get_user_pages (vmas) must
> be NULL for get_user_pages_locked|unlocked to be usable (the latter
> original form wouldn't have been safe anyway if vmas wasn't NULL; for
> the former we just make it explicit by dropping the parameter).
> 
> If vmas is not NULL these two methods cannot be used.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com>
> Reviewed-by: Peter Feiner <pfeiner@google.com>

Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 26+ messages in thread
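
As a concrete illustration of the "locked" conversion being reviewed
here, a caller might end up looking like the sketch below. The function
name is invented for the example; the get_user_pages_locked signature
follows patch 1:

	static long my_pin_user_buffer(struct task_struct *tsk,
				       struct mm_struct *mm,
				       unsigned long start,
				       unsigned long nr_pages,
				       struct page **pages)
	{
		int locked = 1;
		long ret;

		down_read(&mm->mmap_sem);
		/* work that genuinely needs mmap_sem held goes here */
		ret = get_user_pages_locked(tsk, mm, start, nr_pages,
					    1 /* write */, 0 /* force */,
					    pages, &locked);
		/*
		 * The fault may have dropped mmap_sem thanks to
		 * FAULT_FLAG_ALLOW_RETRY; "locked" says whether we
		 * still hold it.
		 */
		if (locked)
			up_read(&mm->mmap_sem);
		return ret;
	}

The unlocked variant removes even the explicit down_read/up_read pair,
as the patch 4 conversions earlier in the thread show.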

* Re: [PATCH 2/5] mm: gup: add __get_user_pages_unlocked to customize gup_flags
  2015-01-13 16:37   ` Andrea Arcangeli
@ 2015-01-16 12:51     ` Kirill A. Shutemov
  0 siblings, 0 replies; 26+ messages in thread
From: Kirill A. Shutemov @ 2015-01-16 12:51 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-kernel, linux-mm, Michel Lespinasse,
	Andrew Jones, Hugh Dickins, Mel Gorman, Andres Lagar-Cavilla,
	Minchan Kim, KOSAKI Motohiro, "Dr. David Alan Gilbert",
	Peter Feiner, Peter Zijlstra, Benjamin Herrenschmidt,
	James Bottomley, David Miller, Steve Capper, Johannes Weiner

On Tue, Jan 13, 2015 at 05:37:51PM +0100, Andrea Arcangeli wrote:
> Some callers (like KVM) may want to set gup_flags like
> FOLL_HWPOISON to get a proper -EHWPOISON retval instead of -EFAULT to
> take a more appropriate action if get_user_pages runs into a memory
> failure.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 26+ messages in thread
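
Patch 5 is the intended consumer of this flexibility. Isolated from
the KVM context, the calling convention looks like the sketch below;
the wrapper name is invented, while the call itself mirrors the
hva_to_pfn_slow hunk in patch 5:

	static long gup_one_page_hwpoison(unsigned long addr,
					  bool write_fault,
					  struct page **page)
	{
		/*
		 * FOLL_HWPOISON makes gup return -EHWPOISON instead of
		 * -EFAULT when the target page is hwpoisoned.
		 */
		return __get_user_pages_unlocked(current, current->mm,
						 addr, 1, write_fault,
						 0 /* force */, page,
						 FOLL_TOUCH | FOLL_HWPOISON);
	}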

* Re: [PATCH 3/5] mm: gup: use get_user_pages_unlocked within get_user_pages_fast
  2015-01-13 16:37   ` Andrea Arcangeli
@ 2015-01-16 12:55     ` Kirill A. Shutemov
  0 siblings, 0 replies; 26+ messages in thread
From: Kirill A. Shutemov @ 2015-01-16 12:55 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-kernel, linux-mm, Michel Lespinasse,
	Andrew Jones, Hugh Dickins, Mel Gorman, Andres Lagar-Cavilla,
	Minchan Kim, KOSAKI Motohiro, "Dr. David Alan Gilbert",
	Peter Feiner, Peter Zijlstra, Benjamin Herrenschmidt,
	James Bottomley, David Miller, Steve Capper, Johannes Weiner

On Tue, Jan 13, 2015 at 05:37:52PM +0100, Andrea Arcangeli wrote:
> This allows the get_user_pages_fast slow path to release the mmap_sem
> before blocking.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 26+ messages in thread
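
The shape of the converted get_user_pages_fast slow path, sketched
from the commit message and the arch diffstat (variable names are
illustrative, not copied from any particular arch/*/mm/gup.c):

	int gup_fast_sketch(unsigned long start, int nr_pages, int write,
			    struct page **pages)
	{
		int nr = 0;	/* filled by the lockless fast path */
		int ret;

		/* ... lockless walk grabs pages[0..nr-1], may stop early ... */

		if (nr == nr_pages)
			return nr;

		/* slow path: the fault can now drop mmap_sem during I/O */
		ret = get_user_pages_unlocked(current, current->mm,
					      start + (nr << PAGE_SHIFT),
					      nr_pages - nr, write, 0,
					      pages + nr);
		if (ret < 0)
			return nr ? nr : ret;
		return nr + ret;
	}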

* Re: [PATCH 4/5] mm: gup: use get_user_pages_unlocked
  2015-01-13 16:37   ` Andrea Arcangeli
@ 2015-01-16 12:58     ` Kirill A. Shutemov
  0 siblings, 0 replies; 26+ messages in thread
From: Kirill A. Shutemov @ 2015-01-16 12:58 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-kernel, linux-mm, Michel Lespinasse,
	Andrew Jones, Hugh Dickins, Mel Gorman, Andres Lagar-Cavilla,
	Minchan Kim, KOSAKI Motohiro, "Dr. David Alan Gilbert",
	Peter Feiner, Peter Zijlstra, Benjamin Herrenschmidt,
	James Bottomley, David Miller, Steve Capper, Johannes Weiner

On Tue, Jan 13, 2015 at 05:37:53PM +0100, Andrea Arcangeli wrote:
> This allows those get_user_pages calls to pass FAULT_FLAG_ALLOW_RETRY
> to the page fault in order to release the mmap_sem during the I/O.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] mm: gup: kvm use get_user_pages_unlocked
  2015-01-13 16:37   ` Andrea Arcangeli
@ 2015-01-16 13:03     ` Kirill A. Shutemov
  0 siblings, 0 replies; 26+ messages in thread
From: Kirill A. Shutemov @ 2015-01-16 13:03 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-kernel, linux-mm, Michel Lespinasse,
	Andrew Jones, Hugh Dickins, Mel Gorman, Andres Lagar-Cavilla,
	Minchan Kim, KOSAKI Motohiro, "Dr. David Alan Gilbert",
	Peter Feiner, Peter Zijlstra, Benjamin Herrenschmidt,
	James Bottomley, David Miller, Steve Capper, Johannes Weiner

On Tue, Jan 13, 2015 at 05:37:54PM +0100, Andrea Arcangeli wrote:
> Use the more generic get_user_pages_unlocked, which has the additional
> benefit of passing FAULT_FLAG_ALLOW_RETRY at the very first page fault,
> so the first fault in an unmapped area can always block indefinitely
> because it is allowed to release the mmap_sem.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 26+ messages in thread
