linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it
@ 2019-03-17 18:34 ira.weiny
  2019-03-17 18:34 ` [RESEND 1/7] mm/gup: Replace get_user_pages_longterm() with FOLL_LONGTERM ira.weiny
                   ` (7 more replies)
  0 siblings, 8 replies; 29+ messages in thread
From: ira.weiny @ 2019-03-17 18:34 UTC (permalink / raw)
  To: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan
  Cc: Ira Weiny, linux-mm, linux-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, linux-rdma, netdev

From: Ira Weiny <ira.weiny@intel.com>

Resending after rebasing to the latest mm tree.

HFI1, qib, and mthca, use get_user_pages_fast() due to it performance
advantages.  These pages can be held for a significant time.  But
get_user_pages_fast() does not protect against mapping FS DAX pages.

Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
retains the performance while also adding the FS DAX checks.  XDP has also
shown interest in using this functionality.[1]

In addition we change get_user_pages() to use the new FOLL_LONGTERM flag and
remove the specialized get_user_pages_longterm call.

[1] https://lkml.org/lkml/2019/2/11/1789

Ira Weiny (7):
  mm/gup: Replace get_user_pages_longterm() with FOLL_LONGTERM
  mm/gup: Change write parameter to flags in fast walk
  mm/gup: Change GUP fast to use flags rather than a write 'bool'
  mm/gup: Add FOLL_LONGTERM capability to GUP fast
  IB/hfi1: Use the new FOLL_LONGTERM flag to get_user_pages_fast()
  IB/qib: Use the new FOLL_LONGTERM flag to get_user_pages_fast()
  IB/mthca: Use the new FOLL_LONGTERM flag to get_user_pages_fast()

 arch/mips/mm/gup.c                          |  11 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c         |   4 +-
 arch/powerpc/kvm/e500_mmu.c                 |   2 +-
 arch/powerpc/mm/mmu_context_iommu.c         |   3 +-
 arch/s390/kvm/interrupt.c                   |   2 +-
 arch/s390/mm/gup.c                          |  12 +-
 arch/sh/mm/gup.c                            |  11 +-
 arch/sparc/mm/gup.c                         |   9 +-
 arch/x86/kvm/paging_tmpl.h                  |   2 +-
 arch/x86/kvm/svm.c                          |   2 +-
 drivers/fpga/dfl-afu-dma-region.c           |   2 +-
 drivers/gpu/drm/via/via_dmablit.c           |   3 +-
 drivers/infiniband/core/umem.c              |   5 +-
 drivers/infiniband/hw/hfi1/user_pages.c     |   5 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c |   3 +-
 drivers/infiniband/hw/qib/qib_user_pages.c  |   8 +-
 drivers/infiniband/hw/qib/qib_user_sdma.c   |   2 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c    |   9 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c   |   6 +-
 drivers/misc/genwqe/card_utils.c            |   2 +-
 drivers/misc/vmw_vmci/vmci_host.c           |   2 +-
 drivers/misc/vmw_vmci/vmci_queue_pair.c     |   6 +-
 drivers/platform/goldfish/goldfish_pipe.c   |   3 +-
 drivers/rapidio/devices/rio_mport_cdev.c    |   4 +-
 drivers/sbus/char/oradax.c                  |   2 +-
 drivers/scsi/st.c                           |   3 +-
 drivers/staging/gasket/gasket_page_table.c  |   4 +-
 drivers/tee/tee_shm.c                       |   2 +-
 drivers/vfio/vfio_iommu_spapr_tce.c         |   3 +-
 drivers/vfio/vfio_iommu_type1.c             |   3 +-
 drivers/vhost/vhost.c                       |   2 +-
 drivers/video/fbdev/pvr2fb.c                |   2 +-
 drivers/virt/fsl_hypervisor.c               |   2 +-
 drivers/xen/gntdev.c                        |   2 +-
 fs/io_uring.c                               |   5 +-
 fs/orangefs/orangefs-bufmap.c               |   2 +-
 include/linux/mm.h                          |  18 +-
 kernel/futex.c                              |   2 +-
 lib/iov_iter.c                              |   7 +-
 mm/gup.c                                    | 258 ++++++++++++--------
 mm/gup_benchmark.c                          |   5 +-
 mm/util.c                                   |   8 +-
 net/ceph/pagevec.c                          |   2 +-
 net/rds/info.c                              |   2 +-
 net/rds/rdma.c                              |   3 +-
 net/xdp/xdp_umem.c                          |   4 +-
 46 files changed, 262 insertions(+), 197 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RESEND 1/7] mm/gup: Replace get_user_pages_longterm() with FOLL_LONGTERM
  2019-03-17 18:34 [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it ira.weiny
@ 2019-03-17 18:34 ` ira.weiny
  2019-03-22 21:24   ` Dan Williams
  2019-03-17 18:34 ` [RESEND 2/7] mm/gup: Change write parameter to flags in fast walk ira.weiny
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: ira.weiny @ 2019-03-17 18:34 UTC (permalink / raw)
  To: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan
  Cc: Ira Weiny, Aneesh Kumar K . V, Michal Hocko, linux-mm,
	linux-kernel, linux-mips, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-rdma, netdev

From: Ira Weiny <ira.weiny@intel.com>

Rather than have a separate get_user_pages_longterm() call,
introduce FOLL_LONGTERM and change the longterm callers to use
it.

This patch does not change any functionality.

FOLL_LONGTERM can only be supported with get_user_pages() as it
requires vmas to determine if DAX is in use.

CC: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V1:
	Rebased on 5.1 merge
	Adjusted for changes introduced by CONFIG_CMA
	Convert new users of GUP longterm
		io_uring.c
		xdp_umem.c

 arch/powerpc/mm/mmu_context_iommu.c        |   3 +-
 drivers/infiniband/core/umem.c             |   5 +-
 drivers/infiniband/hw/qib/qib_user_pages.c |   8 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c   |   9 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c  |   6 +-
 drivers/vfio/vfio_iommu_type1.c            |   3 +-
 fs/io_uring.c                              |   5 +-
 include/linux/mm.h                         |  14 +-
 mm/gup.c                                   | 171 ++++++++++++---------
 mm/gup_benchmark.c                         |   5 +-
 net/xdp/xdp_umem.c                         |   4 +-
 11 files changed, 129 insertions(+), 104 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index e7a9c4f6bfca..2bd48998765e 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -148,7 +148,8 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
 	}
 
 	down_read(&mm->mmap_sem);
-	ret = get_user_pages_longterm(ua, entries, FOLL_WRITE, mem->hpages, NULL);
+	ret = get_user_pages(ua, entries, FOLL_WRITE | FOLL_LONGTERM,
+			     mem->hpages, NULL);
 	up_read(&mm->mmap_sem);
 	if (ret != entries) {
 		/* free the reference taken */
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index fe5551562dbc..31191f098e73 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -189,10 +189,11 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
 
 	while (npages) {
 		down_read(&mm->mmap_sem);
-		ret = get_user_pages_longterm(cur_base,
+		ret = get_user_pages(cur_base,
 				     min_t(unsigned long, npages,
 					   PAGE_SIZE / sizeof (struct page *)),
-				     gup_flags, page_list, vma_list);
+				     gup_flags | FOLL_LONGTERM,
+				     page_list, vma_list);
 		if (ret < 0) {
 			up_read(&mm->mmap_sem);
 			goto umem_release;
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index 123ca8f64f75..f712fb7fa82f 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -114,10 +114,10 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
 
 	down_read(&current->mm->mmap_sem);
 	for (got = 0; got < num_pages; got += ret) {
-		ret = get_user_pages_longterm(start_page + got * PAGE_SIZE,
-					      num_pages - got,
-					      FOLL_WRITE | FOLL_FORCE,
-					      p + got, NULL);
+		ret = get_user_pages(start_page + got * PAGE_SIZE,
+				     num_pages - got,
+				     FOLL_LONGTERM | FOLL_WRITE | FOLL_FORCE,
+				     p + got, NULL);
 		if (ret < 0) {
 			up_read(&current->mm->mmap_sem);
 			goto bail_release;
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 06862a6af185..1d9a182ac163 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -143,10 +143,11 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
 	ret = 0;
 
 	while (npages) {
-		ret = get_user_pages_longterm(cur_base,
-					min_t(unsigned long, npages,
-					PAGE_SIZE / sizeof(struct page *)),
-					gup_flags, page_list, NULL);
+		ret = get_user_pages(cur_base,
+				     min_t(unsigned long, npages,
+				     PAGE_SIZE / sizeof(struct page *)),
+				     gup_flags | FOLL_LONGTERM,
+				     page_list, NULL);
 
 		if (ret < 0)
 			goto out;
diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 08929c087e27..870a2a526e0b 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -186,12 +186,12 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
 	dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
 		data, size, dma->nr_pages);
 
-	err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
-			     flags, dma->pages, NULL);
+	err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
+			     flags | FOLL_LONGTERM, dma->pages, NULL);
 
 	if (err != dma->nr_pages) {
 		dma->nr_pages = (err >= 0) ? err : 0;
-		dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err,
+		dprintk(1, "get_user_pages: err=%d [%d]\n", err,
 			dma->nr_pages);
 		return err < 0 ? err : -EINVAL;
 	}
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 73652e21efec..1500bd0bb6da 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -351,7 +351,8 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 
 	down_read(&mm->mmap_sem);
 	if (mm == current->mm) {
-		ret = get_user_pages_longterm(vaddr, 1, flags, page, vmas);
+		ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
+				     vmas);
 	} else {
 		ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
 					    vmas, NULL);
diff --git a/fs/io_uring.c b/fs/io_uring.c
index e2bd77da5e21..8d61b81373bd 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2450,8 +2450,9 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
 
 		ret = 0;
 		down_read(&current->mm->mmap_sem);
-		pret = get_user_pages_longterm(ubuf, nr_pages, FOLL_WRITE,
-						pages, vmas);
+		pret = get_user_pages(ubuf, nr_pages,
+				      FOLL_WRITE | FOLL_LONGTERM,
+				      pages, vmas);
 		if (pret == nr_pages) {
 			/* don't support file backed memory */
 			for (j = 0; j < nr_pages; j++) {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2d483dbdffc0..6831077d126c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1531,19 +1531,6 @@ long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 		    struct page **pages, unsigned int gup_flags);
 
-#if defined(CONFIG_FS_DAX) || defined(CONFIG_CMA)
-long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
-			    unsigned int gup_flags, struct page **pages,
-			    struct vm_area_struct **vmas);
-#else
-static inline long get_user_pages_longterm(unsigned long start,
-		unsigned long nr_pages, unsigned int gup_flags,
-		struct page **pages, struct vm_area_struct **vmas)
-{
-	return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
-}
-#endif /* CONFIG_FS_DAX */
-
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			struct page **pages);
 
@@ -2609,6 +2596,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_REMOTE	0x2000	/* we are working on non-current tsk/mm */
 #define FOLL_COW	0x4000	/* internal GUP flag */
 #define FOLL_ANON	0x8000	/* don't do file mappings */
+#define FOLL_LONGTERM	0x10000	/* mapping is intended for a long term pin */
 
 static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
 {
diff --git a/mm/gup.c b/mm/gup.c
index f84e22685aaa..8cb4cff067bc 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1112,26 +1112,7 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
 }
 EXPORT_SYMBOL(get_user_pages_remote);
 
-/*
- * This is the same as get_user_pages_remote(), just with a
- * less-flexible calling convention where we assume that the task
- * and mm being operated on are the current task's and don't allow
- * passing of a locked parameter.  We also obviously don't pass
- * FOLL_REMOTE in here.
- */
-long get_user_pages(unsigned long start, unsigned long nr_pages,
-		unsigned int gup_flags, struct page **pages,
-		struct vm_area_struct **vmas)
-{
-	return __get_user_pages_locked(current, current->mm, start, nr_pages,
-				       pages, vmas, NULL,
-				       gup_flags | FOLL_TOUCH);
-}
-EXPORT_SYMBOL(get_user_pages);
-
 #if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
-
-#ifdef CONFIG_FS_DAX
 static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
 {
 	long i;
@@ -1150,12 +1131,6 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
 	}
 	return false;
 }
-#else
-static inline bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
-{
-	return false;
-}
-#endif
 
 #ifdef CONFIG_CMA
 static struct page *new_non_cma_page(struct page *page, unsigned long private)
@@ -1209,10 +1184,13 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
 	return __alloc_pages_node(nid, gfp_mask, 0);
 }
 
-static long check_and_migrate_cma_pages(unsigned long start, long nr_pages,
-					unsigned int gup_flags,
+static long check_and_migrate_cma_pages(struct task_struct *tsk,
+					struct mm_struct *mm,
+					unsigned long start,
+					unsigned long nr_pages,
 					struct page **pages,
-					struct vm_area_struct **vmas)
+					struct vm_area_struct **vmas,
+					unsigned int gup_flags)
 {
 	long i;
 	bool drain_allow = true;
@@ -1268,10 +1246,14 @@ static long check_and_migrate_cma_pages(unsigned long start, long nr_pages,
 				putback_movable_pages(&cma_page_list);
 		}
 		/*
-		 * We did migrate all the pages, Try to get the page references again
-		 * migrating any new CMA pages which we failed to isolate earlier.
+		 * We did migrate all the pages, Try to get the page references
+		 * again migrating any new CMA pages which we failed to isolate
+		 * earlier.
 		 */
-		nr_pages = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+		nr_pages = __get_user_pages_locked(tsk, mm, start, nr_pages,
+						   pages, vmas, NULL,
+						   gup_flags);
+
 		if ((nr_pages > 0) && migrate_allow) {
 			drain_allow = true;
 			goto check_again;
@@ -1281,66 +1263,115 @@ static long check_and_migrate_cma_pages(unsigned long start, long nr_pages,
 	return nr_pages;
 }
 #else
-static inline long check_and_migrate_cma_pages(unsigned long start, long nr_pages,
-					       unsigned int gup_flags,
-					       struct page **pages,
-					       struct vm_area_struct **vmas)
+static long check_and_migrate_cma_pages(struct task_struct *tsk,
+					struct mm_struct *mm,
+					unsigned long start,
+					unsigned long nr_pages,
+					struct page **pages,
+					struct vm_area_struct **vmas,
+					unsigned int gup_flags)
 {
 	return nr_pages;
 }
 #endif
 
 /*
- * This is the same as get_user_pages() in that it assumes we are
- * operating on the current task's mm, but it goes further to validate
- * that the vmas associated with the address range are suitable for
- * longterm elevated page reference counts. For example, filesystem-dax
- * mappings are subject to the lifetime enforced by the filesystem and
- * we need guarantees that longterm users like RDMA and V4L2 only
- * establish mappings that have a kernel enforced revocation mechanism.
+ * __gup_longterm_locked() is a wrapper for __get_uer_pages_locked which
+ * allows us to process the FOLL_LONGTERM flag if present.
+ *
+ * FOLL_LONGTERM Checks for either DAX VMAs or PPC CMA regions and either fails
+ * the pin or attempts to migrate the page as appropriate.
+ *
+ * In the filesystem-dax case mappings are subject to the lifetime enforced by
+ * the filesystem and we need guarantees that longterm users like RDMA and V4L2
+ * only establish mappings that have a kernel enforced revocation mechanism.
+ *
+ * In the CMA case pages can't be pinned in a CMA region as this would
+ * unnecessarily fragment that region.  So CMA attempts to migrate the page
+ * before pinning.
  *
  * "longterm" == userspace controlled elevated page count lifetime.
  * Contrast this to iov_iter_get_pages() usages which are transient.
  */
-long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
-			     unsigned int gup_flags, struct page **pages,
-			     struct vm_area_struct **vmas_arg)
+static __always_inline long __gup_longterm_locked(struct task_struct *tsk,
+						  struct mm_struct *mm,
+						  unsigned long start,
+						  unsigned long nr_pages,
+						  struct page **pages,
+						  struct vm_area_struct **vmas,
+						  unsigned int gup_flags)
 {
-	struct vm_area_struct **vmas = vmas_arg;
-	unsigned long flags;
+	struct vm_area_struct **vmas_tmp = vmas;
+	unsigned long flags = 0;
 	long rc, i;
 
-	if (!pages)
-		return -EINVAL;
-
-	if (!vmas) {
-		vmas = kcalloc(nr_pages, sizeof(struct vm_area_struct *),
-			       GFP_KERNEL);
-		if (!vmas)
-			return -ENOMEM;
+	if (gup_flags & FOLL_LONGTERM) {
+		if (!pages)
+			return -EINVAL;
+
+		if (!vmas_tmp) {
+			vmas_tmp = kcalloc(nr_pages,
+					   sizeof(struct vm_area_struct *),
+					   GFP_KERNEL);
+			if (!vmas_tmp)
+				return -ENOMEM;
+		}
+		flags = memalloc_nocma_save();
 	}
 
-	flags = memalloc_nocma_save();
-	rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
-	memalloc_nocma_restore(flags);
-	if (rc < 0)
-		goto out;
+	rc = __get_user_pages_locked(tsk, mm, start, nr_pages, pages,
+				     vmas_tmp, NULL, gup_flags);
 
-	if (check_dax_vmas(vmas, rc)) {
-		for (i = 0; i < rc; i++)
-			put_page(pages[i]);
-		rc = -EOPNOTSUPP;
-		goto out;
+	if (gup_flags & FOLL_LONGTERM) {
+		memalloc_nocma_restore(flags);
+		if (rc < 0)
+			goto out;
+
+		if (check_dax_vmas(vmas_tmp, rc)) {
+			for (i = 0; i < rc; i++)
+				put_page(pages[i]);
+			rc = -EOPNOTSUPP;
+			goto out;
+		}
+
+		rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages,
+						 vmas_tmp, gup_flags);
 	}
 
-	rc = check_and_migrate_cma_pages(start, rc, gup_flags, pages, vmas);
 out:
-	if (vmas != vmas_arg)
-		kfree(vmas);
+	if (vmas_tmp != vmas)
+		kfree(vmas_tmp);
 	return rc;
 }
-EXPORT_SYMBOL(get_user_pages_longterm);
-#endif /* CONFIG_FS_DAX */
+#else /* !CONFIG_FS_DAX && !CONFIG_CMA */
+static __always_inline long __gup_longterm_locked(struct task_struct *tsk,
+						  struct mm_struct *mm,
+						  unsigned long start,
+						  unsigned long nr_pages,
+						  struct page **pages,
+						  struct vm_area_struct **vmas,
+						  unsigned int flags)
+{
+	return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
+				       NULL, flags);
+}
+#endif /* CONFIG_FS_DAX || CONFIG_CMA */
+
+/*
+ * This is the same as get_user_pages_remote(), just with a
+ * less-flexible calling convention where we assume that the task
+ * and mm being operated on are the current task's and don't allow
+ * passing of a locked parameter.  We also obviously don't pass
+ * FOLL_REMOTE in here.
+ */
+long get_user_pages(unsigned long start, unsigned long nr_pages,
+		unsigned int gup_flags, struct page **pages,
+		struct vm_area_struct **vmas)
+{
+	return __gup_longterm_locked(current, current->mm, start, nr_pages,
+				     pages, vmas, gup_flags | FOLL_TOUCH);
+}
+EXPORT_SYMBOL(get_user_pages);
 
 /**
  * populate_vma_page_range() -  populate a range of pages in the vma.
diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
index 6c0279e70cc4..7dd602d7f8db 100644
--- a/mm/gup_benchmark.c
+++ b/mm/gup_benchmark.c
@@ -54,8 +54,9 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
 						 pages + i);
 			break;
 		case GUP_LONGTERM_BENCHMARK:
-			nr = get_user_pages_longterm(addr, nr, gup->flags & 1,
-						     pages + i, NULL);
+			nr = get_user_pages(addr, nr,
+					    (gup->flags & 1) | FOLL_LONGTERM,
+					    pages + i, NULL);
 			break;
 		case GUP_BENCHMARK:
 			nr = get_user_pages(addr, nr, gup->flags & 1, pages + i,
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 77520eacee8f..ab489454a63e 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -267,8 +267,8 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem)
 		return -ENOMEM;
 
 	down_read(&current->mm->mmap_sem);
-	npgs = get_user_pages_longterm(umem->address, umem->npgs,
-				       gup_flags, &umem->pgs[0], NULL);
+	npgs = get_user_pages(umem->address, umem->npgs,
+			      gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL);
 	up_read(&current->mm->mmap_sem);
 
 	if (npgs != umem->npgs) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RESEND 2/7] mm/gup: Change write parameter to flags in fast walk
  2019-03-17 18:34 [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it ira.weiny
  2019-03-17 18:34 ` [RESEND 1/7] mm/gup: Replace get_user_pages_longterm() with FOLL_LONGTERM ira.weiny
@ 2019-03-17 18:34 ` ira.weiny
  2019-03-22 21:30   ` Dan Williams
  2019-03-17 18:34 ` [RESEND 3/7] mm/gup: Change GUP fast to use flags rather than a write 'bool' ira.weiny
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: ira.weiny @ 2019-03-17 18:34 UTC (permalink / raw)
  To: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan
  Cc: Ira Weiny, linux-mm, linux-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, linux-rdma, netdev

From: Ira Weiny <ira.weiny@intel.com>

In order to support more options in the GUP fast walk, change
the write parameter to flags throughout the call stack.

This patch does not change functionality and passes FOLL_WRITE
where write was previously used.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 mm/gup.c | 52 ++++++++++++++++++++++++++--------------------------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 8cb4cff067bc..2b21eeaf8cc8 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1578,7 +1578,7 @@ static void undo_dev_pagemap(int *nr, int nr_start, struct page **pages)
 
 #ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
 static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
-			 int write, struct page **pages, int *nr)
+			 unsigned int flags, struct page **pages, int *nr)
 {
 	struct dev_pagemap *pgmap = NULL;
 	int nr_start = *nr, ret = 0;
@@ -1596,7 +1596,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 		if (pte_protnone(pte))
 			goto pte_unmap;
 
-		if (!pte_access_permitted(pte, write))
+		if (!pte_access_permitted(pte, flags & FOLL_WRITE))
 			goto pte_unmap;
 
 		if (pte_devmap(pte)) {
@@ -1648,7 +1648,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
  * useful to have gup_huge_pmd even if we can't operate on ptes.
  */
 static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
-			 int write, struct page **pages, int *nr)
+			 unsigned int flags, struct page **pages, int *nr)
 {
 	return 0;
 }
@@ -1731,12 +1731,12 @@ static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, unsigned long addr,
 #endif
 
 static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
-		unsigned long end, int write, struct page **pages, int *nr)
+		unsigned long end, unsigned int flags, struct page **pages, int *nr)
 {
 	struct page *head, *page;
 	int refs;
 
-	if (!pmd_access_permitted(orig, write))
+	if (!pmd_access_permitted(orig, flags & FOLL_WRITE))
 		return 0;
 
 	if (pmd_devmap(orig))
@@ -1769,12 +1769,12 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 }
 
 static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
-		unsigned long end, int write, struct page **pages, int *nr)
+		unsigned long end, unsigned int flags, struct page **pages, int *nr)
 {
 	struct page *head, *page;
 	int refs;
 
-	if (!pud_access_permitted(orig, write))
+	if (!pud_access_permitted(orig, flags & FOLL_WRITE))
 		return 0;
 
 	if (pud_devmap(orig))
@@ -1807,13 +1807,13 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 }
 
 static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
-			unsigned long end, int write,
+			unsigned long end, unsigned int flags,
 			struct page **pages, int *nr)
 {
 	int refs;
 	struct page *head, *page;
 
-	if (!pgd_access_permitted(orig, write))
+	if (!pgd_access_permitted(orig, flags & FOLL_WRITE))
 		return 0;
 
 	BUILD_BUG_ON(pgd_devmap(orig));
@@ -1844,7 +1844,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 }
 
 static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
-		int write, struct page **pages, int *nr)
+		unsigned int flags, struct page **pages, int *nr)
 {
 	unsigned long next;
 	pmd_t *pmdp;
@@ -1867,7 +1867,7 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
 			if (pmd_protnone(pmd))
 				return 0;
 
-			if (!gup_huge_pmd(pmd, pmdp, addr, next, write,
+			if (!gup_huge_pmd(pmd, pmdp, addr, next, flags,
 				pages, nr))
 				return 0;
 
@@ -1877,9 +1877,9 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
 			 * pmd format and THP pmd format
 			 */
 			if (!gup_huge_pd(__hugepd(pmd_val(pmd)), addr,
-					 PMD_SHIFT, next, write, pages, nr))
+					 PMD_SHIFT, next, flags, pages, nr))
 				return 0;
-		} else if (!gup_pte_range(pmd, addr, next, write, pages, nr))
+		} else if (!gup_pte_range(pmd, addr, next, flags, pages, nr))
 			return 0;
 	} while (pmdp++, addr = next, addr != end);
 
@@ -1887,7 +1887,7 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
 }
 
 static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
-			 int write, struct page **pages, int *nr)
+			 unsigned int flags, struct page **pages, int *nr)
 {
 	unsigned long next;
 	pud_t *pudp;
@@ -1900,14 +1900,14 @@ static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
 		if (pud_none(pud))
 			return 0;
 		if (unlikely(pud_huge(pud))) {
-			if (!gup_huge_pud(pud, pudp, addr, next, write,
+			if (!gup_huge_pud(pud, pudp, addr, next, flags,
 					  pages, nr))
 				return 0;
 		} else if (unlikely(is_hugepd(__hugepd(pud_val(pud))))) {
 			if (!gup_huge_pd(__hugepd(pud_val(pud)), addr,
-					 PUD_SHIFT, next, write, pages, nr))
+					 PUD_SHIFT, next, flags, pages, nr))
 				return 0;
-		} else if (!gup_pmd_range(pud, addr, next, write, pages, nr))
+		} else if (!gup_pmd_range(pud, addr, next, flags, pages, nr))
 			return 0;
 	} while (pudp++, addr = next, addr != end);
 
@@ -1915,7 +1915,7 @@ static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
 }
 
 static int gup_p4d_range(pgd_t pgd, unsigned long addr, unsigned long end,
-			 int write, struct page **pages, int *nr)
+			 unsigned int flags, struct page **pages, int *nr)
 {
 	unsigned long next;
 	p4d_t *p4dp;
@@ -1930,9 +1930,9 @@ static int gup_p4d_range(pgd_t pgd, unsigned long addr, unsigned long end,
 		BUILD_BUG_ON(p4d_huge(p4d));
 		if (unlikely(is_hugepd(__hugepd(p4d_val(p4d))))) {
 			if (!gup_huge_pd(__hugepd(p4d_val(p4d)), addr,
-					 P4D_SHIFT, next, write, pages, nr))
+					 P4D_SHIFT, next, flags, pages, nr))
 				return 0;
-		} else if (!gup_pud_range(p4d, addr, next, write, pages, nr))
+		} else if (!gup_pud_range(p4d, addr, next, flags, pages, nr))
 			return 0;
 	} while (p4dp++, addr = next, addr != end);
 
@@ -1940,7 +1940,7 @@ static int gup_p4d_range(pgd_t pgd, unsigned long addr, unsigned long end,
 }
 
 static void gup_pgd_range(unsigned long addr, unsigned long end,
-		int write, struct page **pages, int *nr)
+		unsigned int flags, struct page **pages, int *nr)
 {
 	unsigned long next;
 	pgd_t *pgdp;
@@ -1953,14 +1953,14 @@ static void gup_pgd_range(unsigned long addr, unsigned long end,
 		if (pgd_none(pgd))
 			return;
 		if (unlikely(pgd_huge(pgd))) {
-			if (!gup_huge_pgd(pgd, pgdp, addr, next, write,
+			if (!gup_huge_pgd(pgd, pgdp, addr, next, flags,
 					  pages, nr))
 				return;
 		} else if (unlikely(is_hugepd(__hugepd(pgd_val(pgd))))) {
 			if (!gup_huge_pd(__hugepd(pgd_val(pgd)), addr,
-					 PGDIR_SHIFT, next, write, pages, nr))
+					 PGDIR_SHIFT, next, flags, pages, nr))
 				return;
-		} else if (!gup_p4d_range(pgd, addr, next, write, pages, nr))
+		} else if (!gup_p4d_range(pgd, addr, next, flags, pages, nr))
 			return;
 	} while (pgdp++, addr = next, addr != end);
 }
@@ -2014,7 +2014,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 
 	if (gup_fast_permitted(start, nr_pages)) {
 		local_irq_save(flags);
-		gup_pgd_range(start, end, write, pages, &nr);
+		gup_pgd_range(start, end, write ? FOLL_WRITE : 0, pages, &nr);
 		local_irq_restore(flags);
 	}
 
@@ -2056,7 +2056,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 
 	if (gup_fast_permitted(start, nr_pages)) {
 		local_irq_disable();
-		gup_pgd_range(addr, end, write, pages, &nr);
+		gup_pgd_range(addr, end, write ? FOLL_WRITE : 0, pages, &nr);
 		local_irq_enable();
 		ret = nr;
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RESEND 3/7] mm/gup: Change GUP fast to use flags rather than a write 'bool'
  2019-03-17 18:34 [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it ira.weiny
  2019-03-17 18:34 ` [RESEND 1/7] mm/gup: Replace get_user_pages_longterm() with FOLL_LONGTERM ira.weiny
  2019-03-17 18:34 ` [RESEND 2/7] mm/gup: Change write parameter to flags in fast walk ira.weiny
@ 2019-03-17 18:34 ` ira.weiny
  2019-03-22 22:05   ` Dan Williams
  2019-03-17 18:34 ` [RESEND 4/7] mm/gup: Add FOLL_LONGTERM capability to GUP fast ira.weiny
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: ira.weiny @ 2019-03-17 18:34 UTC (permalink / raw)
  To: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan
  Cc: Ira Weiny, linux-mm, linux-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, linux-rdma, netdev

From: Ira Weiny <ira.weiny@intel.com>

To facilitate additional options to get_user_pages_fast() change the
singular write parameter to be gup_flags.

This patch does not change any functionality.  New functionality will
follow in subsequent patches.

Some of the get_user_pages_fast() call sites were unchanged because they
already passed FOLL_WRITE or 0 for the write parameter.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V1:
	Rebase to current merge tree
	arch/powerpc/mm/mmu_context_iommu.c no longer calls gup_fast
		The gup_longterm was converted in patch 1

 arch/mips/mm/gup.c                         | 11 ++++++-----
 arch/powerpc/kvm/book3s_64_mmu_hv.c        |  4 ++--
 arch/powerpc/kvm/e500_mmu.c                |  2 +-
 arch/s390/kvm/interrupt.c                  |  2 +-
 arch/s390/mm/gup.c                         | 12 ++++++------
 arch/sh/mm/gup.c                           | 11 ++++++-----
 arch/sparc/mm/gup.c                        |  9 +++++----
 arch/x86/kvm/paging_tmpl.h                 |  2 +-
 arch/x86/kvm/svm.c                         |  2 +-
 drivers/fpga/dfl-afu-dma-region.c          |  2 +-
 drivers/gpu/drm/via/via_dmablit.c          |  3 ++-
 drivers/infiniband/hw/hfi1/user_pages.c    |  3 ++-
 drivers/misc/genwqe/card_utils.c           |  2 +-
 drivers/misc/vmw_vmci/vmci_host.c          |  2 +-
 drivers/misc/vmw_vmci/vmci_queue_pair.c    |  6 ++++--
 drivers/platform/goldfish/goldfish_pipe.c  |  3 ++-
 drivers/rapidio/devices/rio_mport_cdev.c   |  4 +++-
 drivers/sbus/char/oradax.c                 |  2 +-
 drivers/scsi/st.c                          |  3 ++-
 drivers/staging/gasket/gasket_page_table.c |  4 ++--
 drivers/tee/tee_shm.c                      |  2 +-
 drivers/vfio/vfio_iommu_spapr_tce.c        |  3 ++-
 drivers/vhost/vhost.c                      |  2 +-
 drivers/video/fbdev/pvr2fb.c               |  2 +-
 drivers/virt/fsl_hypervisor.c              |  2 +-
 drivers/xen/gntdev.c                       |  2 +-
 fs/orangefs/orangefs-bufmap.c              |  2 +-
 include/linux/mm.h                         |  4 ++--
 kernel/futex.c                             |  2 +-
 lib/iov_iter.c                             |  7 +++++--
 mm/gup.c                                   | 10 +++++-----
 mm/util.c                                  |  8 ++++----
 net/ceph/pagevec.c                         |  2 +-
 net/rds/info.c                             |  2 +-
 net/rds/rdma.c                             |  3 ++-
 35 files changed, 79 insertions(+), 63 deletions(-)

diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c
index 0d14e0d8eacf..4c2b4483683c 100644
--- a/arch/mips/mm/gup.c
+++ b/arch/mips/mm/gup.c
@@ -235,7 +235,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
  * get_user_pages_fast() - pin user pages in memory
  * @start:	starting user address
  * @nr_pages:	number of pages from start to pin
- * @write:	whether pages will be written to
+ * @gup_flags:	flags modifying pin behaviour
  * @pages:	array that receives pointers to the pages pinned.
  *		Should be at least nr_pages long.
  *
@@ -247,8 +247,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
  * requested. If nr_pages is 0 or negative, returns 0. If no pages
  * were pinned, returns -errno.
  */
-int get_user_pages_fast(unsigned long start, int nr_pages, int write,
-			struct page **pages)
+int get_user_pages_fast(unsigned long start, int nr_pages,
+			unsigned int gup_flags, struct page **pages)
 {
 	struct mm_struct *mm = current->mm;
 	unsigned long addr, len, end;
@@ -273,7 +273,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 		next = pgd_addr_end(addr, end);
 		if (pgd_none(pgd))
 			goto slow;
-		if (!gup_pud_range(pgd, addr, next, write, pages, &nr))
+		if (!gup_pud_range(pgd, addr, next, gup_flags & FOLL_WRITE,
+				   pages, &nr))
 			goto slow;
 	} while (pgdp++, addr = next, addr != end);
 	local_irq_enable();
@@ -289,7 +290,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 	pages += nr;
 
 	ret = get_user_pages_unlocked(start, (end - start) >> PAGE_SHIFT,
-				      pages, write ? FOLL_WRITE : 0);
+				      pages, gup_flags);
 
 	/* Have to be a bit careful with return values */
 	if (nr > 0) {
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index be7bc070eae5..ab3d484c5e2e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -600,7 +600,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	/* If writing != 0, then the HPTE must allow writing, if we get here */
 	write_ok = writing;
 	hva = gfn_to_hva_memslot(memslot, gfn);
-	npages = get_user_pages_fast(hva, 1, writing, pages);
+	npages = get_user_pages_fast(hva, 1, writing ? FOLL_WRITE : 0, pages);
 	if (npages < 1) {
 		/* Check if it's an I/O mapping */
 		down_read(&current->mm->mmap_sem);
@@ -1193,7 +1193,7 @@ void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long gpa,
 	if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
 		goto err;
 	hva = gfn_to_hva_memslot(memslot, gfn);
-	npages = get_user_pages_fast(hva, 1, 1, pages);
+	npages = get_user_pages_fast(hva, 1, FOLL_WRITE, pages);
 	if (npages < 1)
 		goto err;
 	page = pages[0];
diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
index 24296f4cadc6..e0af53fd78c5 100644
--- a/arch/powerpc/kvm/e500_mmu.c
+++ b/arch/powerpc/kvm/e500_mmu.c
@@ -783,7 +783,7 @@ int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu *vcpu,
 	if (!pages)
 		return -ENOMEM;
 
-	ret = get_user_pages_fast(cfg->array, num_pages, 1, pages);
+	ret = get_user_pages_fast(cfg->array, num_pages, FOLL_WRITE, pages);
 	if (ret < 0)
 		goto free_pages;
 
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 82162867f378..95ab979541be 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2376,7 +2376,7 @@ static int kvm_s390_adapter_map(struct kvm *kvm, unsigned int id, __u64 addr)
 		ret = -EFAULT;
 		goto out;
 	}
-	ret = get_user_pages_fast(map->addr, 1, 1, &map->page);
+	ret = get_user_pages_fast(map->addr, 1, FOLL_WRITE, &map->page);
 	if (ret < 0)
 		goto out;
 	BUG_ON(ret != 1);
diff --git a/arch/s390/mm/gup.c b/arch/s390/mm/gup.c
index 2809d11c7a28..0a6faf3d9960 100644
--- a/arch/s390/mm/gup.c
+++ b/arch/s390/mm/gup.c
@@ -265,7 +265,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
  * get_user_pages_fast() - pin user pages in memory
  * @start:	starting user address
  * @nr_pages:	number of pages from start to pin
- * @write:	whether pages will be written to
+ * @gup_flags:	flags modifying pin behaviour
  * @pages:	array that receives pointers to the pages pinned.
  *		Should be at least nr_pages long.
  *
@@ -277,22 +277,22 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
  * requested. If nr_pages is 0 or negative, returns 0. If no pages
  * were pinned, returns -errno.
  */
-int get_user_pages_fast(unsigned long start, int nr_pages, int write,
-			struct page **pages)
+int get_user_pages_fast(unsigned long start, int nr_pages,
+			unsigned int gup_flags, struct page **pages)
 {
 	int nr, ret;
 
 	might_sleep();
 	start &= PAGE_MASK;
-	nr = __get_user_pages_fast(start, nr_pages, write, pages);
+	nr = __get_user_pages_fast(start, nr_pages, gup_flags & FOLL_WRITE,
+				   pages);
 	if (nr == nr_pages)
 		return nr;
 
 	/* Try to get the remaining pages with get_user_pages */
 	start += nr << PAGE_SHIFT;
 	pages += nr;
-	ret = get_user_pages_unlocked(start, nr_pages - nr, pages,
-				      write ? FOLL_WRITE : 0);
+	ret = get_user_pages_unlocked(start, nr_pages - nr, pages, gup_flags);
 	/* Have to be a bit careful with return values */
 	if (nr > 0)
 		ret = (ret < 0) ? nr : ret + nr;
diff --git a/arch/sh/mm/gup.c b/arch/sh/mm/gup.c
index 3e27f6d1f1ec..277c882f7489 100644
--- a/arch/sh/mm/gup.c
+++ b/arch/sh/mm/gup.c
@@ -204,7 +204,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
  * get_user_pages_fast() - pin user pages in memory
  * @start:	starting user address
  * @nr_pages:	number of pages from start to pin
- * @write:	whether pages will be written to
+ * @gup_flags:	flags modifying pin behaviour
  * @pages:	array that receives pointers to the pages pinned.
  *		Should be at least nr_pages long.
  *
@@ -216,8 +216,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
  * requested. If nr_pages is 0 or negative, returns 0. If no pages
  * were pinned, returns -errno.
  */
-int get_user_pages_fast(unsigned long start, int nr_pages, int write,
-			struct page **pages)
+int get_user_pages_fast(unsigned long start, int nr_pages,
+			unsigned int gup_flags, struct page **pages)
 {
 	struct mm_struct *mm = current->mm;
 	unsigned long addr, len, end;
@@ -241,7 +241,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 		next = pgd_addr_end(addr, end);
 		if (pgd_none(pgd))
 			goto slow;
-		if (!gup_pud_range(pgd, addr, next, write, pages, &nr))
+		if (!gup_pud_range(pgd, addr, next, gup_flags & FOLL_WRITE,
+				   pages, &nr))
 			goto slow;
 	} while (pgdp++, addr = next, addr != end);
 	local_irq_enable();
@@ -261,7 +262,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 
 		ret = get_user_pages_unlocked(start,
 			(end - start) >> PAGE_SHIFT, pages,
-			write ? FOLL_WRITE : 0);
+			gup_flags);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
diff --git a/arch/sparc/mm/gup.c b/arch/sparc/mm/gup.c
index aee6dba83d0e..1e770a517d4a 100644
--- a/arch/sparc/mm/gup.c
+++ b/arch/sparc/mm/gup.c
@@ -245,8 +245,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 	return nr;
 }
 
-int get_user_pages_fast(unsigned long start, int nr_pages, int write,
-			struct page **pages)
+int get_user_pages_fast(unsigned long start, int nr_pages,
+			unsigned int gup_flags, struct page **pages)
 {
 	struct mm_struct *mm = current->mm;
 	unsigned long addr, len, end;
@@ -303,7 +303,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 		next = pgd_addr_end(addr, end);
 		if (pgd_none(pgd))
 			goto slow;
-		if (!gup_pud_range(pgd, addr, next, write, pages, &nr))
+		if (!gup_pud_range(pgd, addr, next, gup_flags & FOLL_WRITE,
+				   pages, &nr))
 			goto slow;
 	} while (pgdp++, addr = next, addr != end);
 
@@ -324,7 +325,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 
 		ret = get_user_pages_unlocked(start,
 			(end - start) >> PAGE_SHIFT, pages,
-			write ? FOLL_WRITE : 0);
+			gup_flags);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 6bdca39829bc..08715034e315 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -140,7 +140,7 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	pt_element_t *table;
 	struct page *page;
 
-	npages = get_user_pages_fast((unsigned long)ptep_user, 1, 1, &page);
+	npages = get_user_pages_fast((unsigned long)ptep_user, 1, FOLL_WRITE, &page);
 	/* Check if the user is doing something meaningless. */
 	if (unlikely(npages != 1))
 		return -EFAULT;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index b5b128a0a051..9d8c8d108d55 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1804,7 +1804,7 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
 		return NULL;
 
 	/* Pin the user virtual address. */
-	npinned = get_user_pages_fast(uaddr, npages, write ? FOLL_WRITE : 0, pages);
+	npinned = get_user_pages_fast(uaddr, npages, FOLL_WRITE, pages);
 	if (npinned != npages) {
 		pr_err("SEV: Failure locking %lu pages.\n", npages);
 		goto err;
diff --git a/drivers/fpga/dfl-afu-dma-region.c b/drivers/fpga/dfl-afu-dma-region.c
index e18a786fc943..c438722bf4e1 100644
--- a/drivers/fpga/dfl-afu-dma-region.c
+++ b/drivers/fpga/dfl-afu-dma-region.c
@@ -102,7 +102,7 @@ static int afu_dma_pin_pages(struct dfl_feature_platform_data *pdata,
 		goto unlock_vm;
 	}
 
-	pinned = get_user_pages_fast(region->user_addr, npages, 1,
+	pinned = get_user_pages_fast(region->user_addr, npages, FOLL_WRITE,
 				     region->pages);
 	if (pinned < 0) {
 		ret = pinned;
diff --git a/drivers/gpu/drm/via/via_dmablit.c b/drivers/gpu/drm/via/via_dmablit.c
index 8bf3a7c23ed3..062067438f1d 100644
--- a/drivers/gpu/drm/via/via_dmablit.c
+++ b/drivers/gpu/drm/via/via_dmablit.c
@@ -243,7 +243,8 @@ via_lock_all_dma_pages(drm_via_sg_info_t *vsg,  drm_via_dmablit_t *xfer)
 	if (NULL == vsg->pages)
 		return -ENOMEM;
 	ret = get_user_pages_fast((unsigned long)xfer->mem_addr,
-			vsg->num_pages, vsg->direction == DMA_FROM_DEVICE,
+			vsg->num_pages,
+			vsg->direction == DMA_FROM_DEVICE ? FOLL_WRITE : 0,
 			vsg->pages);
 	if (ret != vsg->num_pages) {
 		if (ret < 0)
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
index 24b592c6522e..78ccacaf97d0 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -105,7 +105,8 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np
 {
 	int ret;
 
-	ret = get_user_pages_fast(vaddr, npages, writable, pages);
+	ret = get_user_pages_fast(vaddr, npages, writable ? FOLL_WRITE : 0,
+				  pages);
 	if (ret < 0)
 		return ret;
 
diff --git a/drivers/misc/genwqe/card_utils.c b/drivers/misc/genwqe/card_utils.c
index 25265fd0fd6e..89cff9d1012b 100644
--- a/drivers/misc/genwqe/card_utils.c
+++ b/drivers/misc/genwqe/card_utils.c
@@ -603,7 +603,7 @@ int genwqe_user_vmap(struct genwqe_dev *cd, struct dma_mapping *m, void *uaddr,
 	/* pin user pages in memory */
 	rc = get_user_pages_fast(data & PAGE_MASK, /* page aligned addr */
 				 m->nr_pages,
-				 m->write,		/* readable/writable */
+				 m->write ? FOLL_WRITE : 0,	/* readable/writable */
 				 m->page_list);	/* ptrs to pages */
 	if (rc < 0)
 		goto fail_get_user_pages;
diff --git a/drivers/misc/vmw_vmci/vmci_host.c b/drivers/misc/vmw_vmci/vmci_host.c
index 997f92543dd4..422d08da3244 100644
--- a/drivers/misc/vmw_vmci/vmci_host.c
+++ b/drivers/misc/vmw_vmci/vmci_host.c
@@ -242,7 +242,7 @@ static int vmci_host_setup_notify(struct vmci_ctx *context,
 	/*
 	 * Lock physical page backing a given user VA.
 	 */
-	retval = get_user_pages_fast(uva, 1, 1, &context->notify_page);
+	retval = get_user_pages_fast(uva, 1, FOLL_WRITE, &context->notify_page);
 	if (retval != 1) {
 		context->notify_page = NULL;
 		return VMCI_ERROR_GENERIC;
diff --git a/drivers/misc/vmw_vmci/vmci_queue_pair.c b/drivers/misc/vmw_vmci/vmci_queue_pair.c
index f5f1aac9d163..1174735f003d 100644
--- a/drivers/misc/vmw_vmci/vmci_queue_pair.c
+++ b/drivers/misc/vmw_vmci/vmci_queue_pair.c
@@ -659,7 +659,8 @@ static int qp_host_get_user_memory(u64 produce_uva,
 	int err = VMCI_SUCCESS;
 
 	retval = get_user_pages_fast((uintptr_t) produce_uva,
-				     produce_q->kernel_if->num_pages, 1,
+				     produce_q->kernel_if->num_pages,
+				     FOLL_WRITE,
 				     produce_q->kernel_if->u.h.header_page);
 	if (retval < (int)produce_q->kernel_if->num_pages) {
 		pr_debug("get_user_pages_fast(produce) failed (retval=%d)",
@@ -671,7 +672,8 @@ static int qp_host_get_user_memory(u64 produce_uva,
 	}
 
 	retval = get_user_pages_fast((uintptr_t) consume_uva,
-				     consume_q->kernel_if->num_pages, 1,
+				     consume_q->kernel_if->num_pages,
+				     FOLL_WRITE,
 				     consume_q->kernel_if->u.h.header_page);
 	if (retval < (int)consume_q->kernel_if->num_pages) {
 		pr_debug("get_user_pages_fast(consume) failed (retval=%d)",
diff --git a/drivers/platform/goldfish/goldfish_pipe.c b/drivers/platform/goldfish/goldfish_pipe.c
index 321bc673c417..cef0133aa47a 100644
--- a/drivers/platform/goldfish/goldfish_pipe.c
+++ b/drivers/platform/goldfish/goldfish_pipe.c
@@ -274,7 +274,8 @@ static int pin_user_pages(unsigned long first_page,
 		*iter_last_page_size = last_page_size;
 	}
 
-	ret = get_user_pages_fast(first_page, requested_pages, !is_write,
+	ret = get_user_pages_fast(first_page, requested_pages,
+				  !is_write ? FOLL_WRITE : 0,
 				  pages);
 	if (ret <= 0)
 		return -EFAULT;
diff --git a/drivers/rapidio/devices/rio_mport_cdev.c b/drivers/rapidio/devices/rio_mport_cdev.c
index 1e1f42e210a0..4a4a75fa26d5 100644
--- a/drivers/rapidio/devices/rio_mport_cdev.c
+++ b/drivers/rapidio/devices/rio_mport_cdev.c
@@ -868,7 +868,9 @@ rio_dma_transfer(struct file *filp, u32 transfer_mode,
 
 		pinned = get_user_pages_fast(
 				(unsigned long)xfer->loc_addr & PAGE_MASK,
-				nr_pages, dir == DMA_FROM_DEVICE, page_list);
+				nr_pages,
+				dir == DMA_FROM_DEVICE ? FOLL_WRITE : 0,
+				page_list);
 
 		if (pinned != nr_pages) {
 			if (pinned < 0) {
diff --git a/drivers/sbus/char/oradax.c b/drivers/sbus/char/oradax.c
index 6516bc3cb58b..790aa148670d 100644
--- a/drivers/sbus/char/oradax.c
+++ b/drivers/sbus/char/oradax.c
@@ -437,7 +437,7 @@ static int dax_lock_page(void *va, struct page **p)
 
 	dax_dbg("uva %p", va);
 
-	ret = get_user_pages_fast((unsigned long)va, 1, 1, p);
+	ret = get_user_pages_fast((unsigned long)va, 1, FOLL_WRITE, p);
 	if (ret == 1) {
 		dax_dbg("locked page %p, for VA %p", *p, va);
 		return 0;
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index 19c022e66d63..3c6a18ad9a87 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -4922,7 +4922,8 @@ static int sgl_map_user_pages(struct st_buffer *STbp,
 
         /* Try to fault in all of the necessary pages */
         /* rw==READ means read from drive, write into memory area */
-	res = get_user_pages_fast(uaddr, nr_pages, rw == READ, pages);
+	res = get_user_pages_fast(uaddr, nr_pages, rw == READ ? FOLL_WRITE : 0,
+				  pages);
 
 	/* Errors and no page mapped should return here */
 	if (res < nr_pages)
diff --git a/drivers/staging/gasket/gasket_page_table.c b/drivers/staging/gasket/gasket_page_table.c
index 26755d9ca41d..f67fdf1d3817 100644
--- a/drivers/staging/gasket/gasket_page_table.c
+++ b/drivers/staging/gasket/gasket_page_table.c
@@ -486,8 +486,8 @@ static int gasket_perform_mapping(struct gasket_page_table *pg_tbl,
 			ptes[i].dma_addr = pg_tbl->coherent_pages[0].paddr +
 					   off + i * PAGE_SIZE;
 		} else {
-			ret = get_user_pages_fast(page_addr - offset, 1, 1,
-						  &page);
+			ret = get_user_pages_fast(page_addr - offset, 1,
+						  FOLL_WRITE, &page);
 
 			if (ret <= 0) {
 				dev_err(pg_tbl->device,
diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
index 0b9ab1d0dd45..49fd7312e2aa 100644
--- a/drivers/tee/tee_shm.c
+++ b/drivers/tee/tee_shm.c
@@ -273,7 +273,7 @@ struct tee_shm *tee_shm_register(struct tee_context *ctx, unsigned long addr,
 		goto err;
 	}
 
-	rc = get_user_pages_fast(start, num_pages, 1, shm->pages);
+	rc = get_user_pages_fast(start, num_pages, FOLL_WRITE, shm->pages);
 	if (rc > 0)
 		shm->num_pages = rc;
 	if (rc != num_pages) {
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 8dbb270998f4..089c59c50d97 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -532,7 +532,8 @@ static int tce_iommu_use_page(unsigned long tce, unsigned long *hpa)
 	enum dma_data_direction direction = iommu_tce_direction(tce);
 
 	if (get_user_pages_fast(tce & PAGE_MASK, 1,
-			direction != DMA_TO_DEVICE, &page) != 1)
+			direction != DMA_TO_DEVICE ? FOLL_WRITE : 0,
+			&page) != 1)
 		return -EFAULT;
 
 	*hpa = __pa((unsigned long) page_address(page));
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a2e5dc7716e2..2da5969bccb9 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1700,7 +1700,7 @@ static int set_bit_to_user(int nr, void __user *addr)
 	int bit = nr + (log % PAGE_SIZE) * 8;
 	int r;
 
-	r = get_user_pages_fast(log, 1, 1, &page);
+	r = get_user_pages_fast(log, 1, FOLL_WRITE, &page);
 	if (r < 0)
 		return r;
 	BUG_ON(r != 1);
diff --git a/drivers/video/fbdev/pvr2fb.c b/drivers/video/fbdev/pvr2fb.c
index 8a53d1de611d..41390c8e0f67 100644
--- a/drivers/video/fbdev/pvr2fb.c
+++ b/drivers/video/fbdev/pvr2fb.c
@@ -686,7 +686,7 @@ static ssize_t pvr2fb_write(struct fb_info *info, const char *buf,
 	if (!pages)
 		return -ENOMEM;
 
-	ret = get_user_pages_fast((unsigned long)buf, nr_pages, true, pages);
+	ret = get_user_pages_fast((unsigned long)buf, nr_pages, FOLL_WRITE, pages);
 	if (ret < nr_pages) {
 		nr_pages = ret;
 		ret = -EINVAL;
diff --git a/drivers/virt/fsl_hypervisor.c b/drivers/virt/fsl_hypervisor.c
index 8ba726e600e9..6446bcab4185 100644
--- a/drivers/virt/fsl_hypervisor.c
+++ b/drivers/virt/fsl_hypervisor.c
@@ -244,7 +244,7 @@ static long ioctl_memcpy(struct fsl_hv_ioctl_memcpy __user *p)
 
 	/* Get the physical addresses of the source buffer */
 	num_pinned = get_user_pages_fast(param.local_vaddr - lb_offset,
-		num_pages, param.source != -1, pages);
+		num_pages, param.source != -1 ? FOLL_WRITE : 0, pages);
 
 	if (num_pinned != num_pages) {
 		/* get_user_pages() failed */
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 7cf9c51318aa..02bc815982d4 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -852,7 +852,7 @@ static int gntdev_get_page(struct gntdev_copy_batch *batch, void __user *virt,
 	unsigned long xen_pfn;
 	int ret;
 
-	ret = get_user_pages_fast(addr, 1, writeable, &page);
+	ret = get_user_pages_fast(addr, 1, writeable ? FOLL_WRITE : 0, &page);
 	if (ret < 0)
 		return ret;
 
diff --git a/fs/orangefs/orangefs-bufmap.c b/fs/orangefs/orangefs-bufmap.c
index 443bcd8c3c19..5a7c4fda682f 100644
--- a/fs/orangefs/orangefs-bufmap.c
+++ b/fs/orangefs/orangefs-bufmap.c
@@ -269,7 +269,7 @@ orangefs_bufmap_map(struct orangefs_bufmap *bufmap,
 
 	/* map the pages */
 	ret = get_user_pages_fast((unsigned long)user_desc->ptr,
-			     bufmap->page_count, 1, bufmap->page_array);
+			     bufmap->page_count, FOLL_WRITE, bufmap->page_array);
 
 	if (ret < 0)
 		return ret;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6831077d126c..6607defbd8cc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1531,8 +1531,8 @@ long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 		    struct page **pages, unsigned int gup_flags);
 
-int get_user_pages_fast(unsigned long start, int nr_pages, int write,
-			struct page **pages);
+int get_user_pages_fast(unsigned long start, int nr_pages,
+			unsigned int gup_flags, struct page **pages);
 
 /* Container for pinned pfns / pages */
 struct frame_vector {
diff --git a/kernel/futex.c b/kernel/futex.c
index c3b73b0311bc..1b9eca04fb0a 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -543,7 +543,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a
 	if (unlikely(should_fail_futex(fshared)))
 		return -EFAULT;
 
-	err = get_user_pages_fast(address, 1, 1, &page);
+	err = get_user_pages_fast(address, 1, FOLL_WRITE, &page);
 	/*
 	 * If write access is not required (eg. FUTEX_WAIT), try
 	 * and get read-only access.
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index ea36dc355da1..50e77ec427d0 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1293,7 +1293,9 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 			len = maxpages * PAGE_SIZE;
 		addr &= ~(PAGE_SIZE - 1);
 		n = DIV_ROUND_UP(len, PAGE_SIZE);
-		res = get_user_pages_fast(addr, n, iov_iter_rw(i) != WRITE, pages);
+		res = get_user_pages_fast(addr, n,
+				iov_iter_rw(i) != WRITE ?  FOLL_WRITE : 0,
+				pages);
 		if (unlikely(res < 0))
 			return res;
 		return (res == n ? len : res * PAGE_SIZE) - *start;
@@ -1374,7 +1376,8 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		p = get_pages_array(n);
 		if (!p)
 			return -ENOMEM;
-		res = get_user_pages_fast(addr, n, iov_iter_rw(i) != WRITE, p);
+		res = get_user_pages_fast(addr, n,
+				iov_iter_rw(i) != WRITE ?  FOLL_WRITE : 0, p);
 		if (unlikely(res < 0)) {
 			kvfree(p);
 			return res;
diff --git a/mm/gup.c b/mm/gup.c
index 2b21eeaf8cc8..0684a9536207 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2025,7 +2025,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
  * get_user_pages_fast() - pin user pages in memory
  * @start:	starting user address
  * @nr_pages:	number of pages from start to pin
- * @write:	whether pages will be written to
+ * @gup_flags:	flags modifying pin behaviour
  * @pages:	array that receives pointers to the pages pinned.
  *		Should be at least nr_pages long.
  *
@@ -2037,8 +2037,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
  * requested. If nr_pages is 0 or negative, returns 0. If no pages
  * were pinned, returns -errno.
  */
-int get_user_pages_fast(unsigned long start, int nr_pages, int write,
-			struct page **pages)
+int get_user_pages_fast(unsigned long start, int nr_pages,
+			unsigned int gup_flags, struct page **pages)
 {
 	unsigned long addr, len, end;
 	int nr = 0, ret = 0;
@@ -2056,7 +2056,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 
 	if (gup_fast_permitted(start, nr_pages)) {
 		local_irq_disable();
-		gup_pgd_range(addr, end, write ? FOLL_WRITE : 0, pages, &nr);
+		gup_pgd_range(addr, end, gup_flags, pages, &nr);
 		local_irq_enable();
 		ret = nr;
 	}
@@ -2067,7 +2067,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 		pages += nr;
 
 		ret = get_user_pages_unlocked(start, nr_pages - nr, pages,
-				write ? FOLL_WRITE : 0);
+					      gup_flags);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
diff --git a/mm/util.c b/mm/util.c
index d559bde497a9..f94a1ac09cd8 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -318,7 +318,7 @@ EXPORT_SYMBOL_GPL(__get_user_pages_fast);
  * get_user_pages_fast() - pin user pages in memory
  * @start:	starting user address
  * @nr_pages:	number of pages from start to pin
- * @write:	whether pages will be written to
+ * @gup_flags:	flags modifying pin behaviour
  * @pages:	array that receives pointers to the pages pinned.
  *		Should be at least nr_pages long.
  *
@@ -339,10 +339,10 @@ EXPORT_SYMBOL_GPL(__get_user_pages_fast);
  * were pinned, returns -errno.
  */
 int __weak get_user_pages_fast(unsigned long start,
-				int nr_pages, int write, struct page **pages)
+				int nr_pages, unsigned int gup_flags,
+				struct page **pages)
 {
-	return get_user_pages_unlocked(start, nr_pages, pages,
-				       write ? FOLL_WRITE : 0);
+	return get_user_pages_unlocked(start, nr_pages, pages, gup_flags);
 }
 EXPORT_SYMBOL_GPL(get_user_pages_fast);
 
diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
index d3736f5bffec..74cafc0142ea 100644
--- a/net/ceph/pagevec.c
+++ b/net/ceph/pagevec.c
@@ -27,7 +27,7 @@ struct page **ceph_get_direct_page_vector(const void __user *data,
 	while (got < num_pages) {
 		rc = get_user_pages_fast(
 		    (unsigned long)data + ((unsigned long)got * PAGE_SIZE),
-		    num_pages - got, write_page, pages + got);
+		    num_pages - got, write_page ? FOLL_WRITE : 0, pages + got);
 		if (rc < 0)
 			break;
 		BUG_ON(rc == 0);
diff --git a/net/rds/info.c b/net/rds/info.c
index e367a97a18c8..03f6fd56d237 100644
--- a/net/rds/info.c
+++ b/net/rds/info.c
@@ -193,7 +193,7 @@ int rds_info_getsockopt(struct socket *sock, int optname, char __user *optval,
 		ret = -ENOMEM;
 		goto out;
 	}
-	ret = get_user_pages_fast(start, nr_pages, 1, pages);
+	ret = get_user_pages_fast(start, nr_pages, FOLL_WRITE, pages);
 	if (ret != nr_pages) {
 		if (ret > 0)
 			nr_pages = ret;
diff --git a/net/rds/rdma.c b/net/rds/rdma.c
index 182ab8430594..b340ed4fc43a 100644
--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -158,7 +158,8 @@ static int rds_pin_pages(unsigned long user_addr, unsigned int nr_pages,
 {
 	int ret;
 
-	ret = get_user_pages_fast(user_addr, nr_pages, write, pages);
+	ret = get_user_pages_fast(user_addr, nr_pages, write ? FOLL_WRITE : 0,
+				  pages);
 
 	if (ret >= 0 && ret < nr_pages) {
 		while (ret--)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RESEND 4/7] mm/gup: Add FOLL_LONGTERM capability to GUP fast
  2019-03-17 18:34 [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it ira.weiny
                   ` (2 preceding siblings ...)
  2019-03-17 18:34 ` [RESEND 3/7] mm/gup: Change GUP fast to use flags rather than a write 'bool' ira.weiny
@ 2019-03-17 18:34 ` ira.weiny
  2019-03-22 22:12   ` Dan Williams
  2019-03-17 18:34 ` [RESEND 5/7] IB/hfi1: Use the new FOLL_LONGTERM flag to get_user_pages_fast() ira.weiny
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: ira.weiny @ 2019-03-17 18:34 UTC (permalink / raw)
  To: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan
  Cc: Ira Weiny, linux-mm, linux-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, linux-rdma, netdev

From: Ira Weiny <ira.weiny@intel.com>

DAX pages were previously unprotected from longterm pins when users
called get_user_pages_fast().

Use the new FOLL_LONGTERM flag to check for DEVMAP pages and fall
back to regular GUP processing if a DEVMAP page is encountered.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 mm/gup.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 0684a9536207..173db0c44678 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1600,6 +1600,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 			goto pte_unmap;
 
 		if (pte_devmap(pte)) {
+			if (unlikely(flags & FOLL_LONGTERM))
+				goto pte_unmap;
+
 			pgmap = get_dev_pagemap(pte_pfn(pte), pgmap);
 			if (unlikely(!pgmap)) {
 				undo_dev_pagemap(nr, nr_start, pages);
@@ -1739,8 +1742,11 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 	if (!pmd_access_permitted(orig, flags & FOLL_WRITE))
 		return 0;
 
-	if (pmd_devmap(orig))
+	if (pmd_devmap(orig)) {
+		if (unlikely(flags & FOLL_LONGTERM))
+			return 0;
 		return __gup_device_huge_pmd(orig, pmdp, addr, end, pages, nr);
+	}
 
 	refs = 0;
 	page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
@@ -1777,8 +1783,11 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 	if (!pud_access_permitted(orig, flags & FOLL_WRITE))
 		return 0;
 
-	if (pud_devmap(orig))
+	if (pud_devmap(orig)) {
+		if (unlikely(flags & FOLL_LONGTERM))
+			return 0;
 		return __gup_device_huge_pud(orig, pudp, addr, end, pages, nr);
+	}
 
 	refs = 0;
 	page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
@@ -2066,8 +2075,20 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
 		start += nr << PAGE_SHIFT;
 		pages += nr;
 
-		ret = get_user_pages_unlocked(start, nr_pages - nr, pages,
-					      gup_flags);
+		if (gup_flags & FOLL_LONGTERM) {
+			down_read(&current->mm->mmap_sem);
+			ret = __gup_longterm_locked(current, current->mm,
+						    start, nr_pages - nr,
+						    pages, NULL, gup_flags);
+			up_read(&current->mm->mmap_sem);
+		} else {
+			/*
+			 * retain FAULT_FOLL_ALLOW_RETRY optimization if
+			 * possible
+			 */
+			ret = get_user_pages_unlocked(start, nr_pages - nr,
+						      pages, gup_flags);
+		}
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RESEND 5/7] IB/hfi1: Use the new FOLL_LONGTERM flag to get_user_pages_fast()
  2019-03-17 18:34 [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it ira.weiny
                   ` (3 preceding siblings ...)
  2019-03-17 18:34 ` [RESEND 4/7] mm/gup: Add FOLL_LONGTERM capability to GUP fast ira.weiny
@ 2019-03-17 18:34 ` ira.weiny
  2019-03-22 22:14   ` Dan Williams
  2019-03-17 18:34 ` [RESEND 6/7] IB/qib: " ira.weiny
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: ira.weiny @ 2019-03-17 18:34 UTC (permalink / raw)
  To: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan
  Cc: Ira Weiny, linux-mm, linux-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, linux-rdma, netdev

From: Ira Weiny <ira.weiny@intel.com>

Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against
FS DAX pages being mapped.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/infiniband/hw/hfi1/user_pages.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
index 78ccacaf97d0..6a7f9cd5a94e 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -104,9 +104,11 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np
 			    bool writable, struct page **pages)
 {
 	int ret;
+	unsigned int gup_flags = writable ? FOLL_WRITE : 0;
 
-	ret = get_user_pages_fast(vaddr, npages, writable ? FOLL_WRITE : 0,
-				  pages);
+	gup_flags |= FOLL_LONGTERM;
+
+	ret = get_user_pages_fast(vaddr, npages, gup_flags, pages);
 	if (ret < 0)
 		return ret;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RESEND 6/7] IB/qib: Use the new FOLL_LONGTERM flag to get_user_pages_fast()
  2019-03-17 18:34 [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it ira.weiny
                   ` (4 preceding siblings ...)
  2019-03-17 18:34 ` [RESEND 5/7] IB/hfi1: Use the new FOLL_LONGTERM flag to get_user_pages_fast() ira.weiny
@ 2019-03-17 18:34 ` ira.weiny
  2019-03-22 22:15   ` Dan Williams
  2019-03-17 18:34 ` [RESEND 7/7] IB/mthca: " ira.weiny
  2019-03-19 22:19 ` [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it Andrew Morton
  7 siblings, 1 reply; 29+ messages in thread
From: ira.weiny @ 2019-03-17 18:34 UTC (permalink / raw)
  To: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan
  Cc: Ira Weiny, linux-mm, linux-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, linux-rdma, netdev

From: Ira Weiny <ira.weiny@intel.com>

Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against
FS DAX pages being mapped.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/infiniband/hw/qib/qib_user_sdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/qib/qib_user_sdma.c b/drivers/infiniband/hw/qib/qib_user_sdma.c
index 31c523b2a9f5..b53cc0240e02 100644
--- a/drivers/infiniband/hw/qib/qib_user_sdma.c
+++ b/drivers/infiniband/hw/qib/qib_user_sdma.c
@@ -673,7 +673,7 @@ static int qib_user_sdma_pin_pages(const struct qib_devdata *dd,
 		else
 			j = npages;
 
-		ret = get_user_pages_fast(addr, j, 0, pages);
+		ret = get_user_pages_fast(addr, j, FOLL_LONGTERM, pages);
 		if (ret != j) {
 			i = 0;
 			j = ret;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RESEND 7/7] IB/mthca: Use the new FOLL_LONGTERM flag to get_user_pages_fast()
  2019-03-17 18:34 [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it ira.weiny
                   ` (5 preceding siblings ...)
  2019-03-17 18:34 ` [RESEND 6/7] IB/qib: " ira.weiny
@ 2019-03-17 18:34 ` ira.weiny
  2019-03-19 22:19 ` [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it Andrew Morton
  7 siblings, 0 replies; 29+ messages in thread
From: ira.weiny @ 2019-03-17 18:34 UTC (permalink / raw)
  To: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan
  Cc: Ira Weiny, linux-mm, linux-kernel, linux-mips, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, linux-rdma, netdev

From: Ira Weiny <ira.weiny@intel.com>

Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against
FS DAX pages being mapped.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/infiniband/hw/mthca/mthca_memfree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c
index 112d2f38e0de..8ff0e90d7564 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -472,7 +472,8 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar,
 		goto out;
 	}
 
-	ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE, pages);
+	ret = get_user_pages_fast(uaddr & PAGE_MASK, 1,
+				  FOLL_WRITE | FOLL_LONGTERM, pages);
 	if (ret < 0)
 		goto out;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it
  2019-03-17 18:34 [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it ira.weiny
                   ` (6 preceding siblings ...)
  2019-03-17 18:34 ` [RESEND 7/7] IB/mthca: " ira.weiny
@ 2019-03-19 22:19 ` Andrew Morton
  2019-03-21  8:40   ` Ira Weiny
  7 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2019-03-19 22:19 UTC (permalink / raw)
  To: ira.weiny
  Cc: John Hubbard, Michal Hocko, Kirill A. Shutemov, Peter Zijlstra,
	Jason Gunthorpe, Benjamin Herrenschmidt, Paul Mackerras,
	David S. Miller, Martin Schwidefsky, Heiko Carstens, Rich Felker,
	Yoshinori Sato, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Ralf Baechle, James Hogan, linux-mm, linux-kernel, linux-mips,
	linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-rdma,
	netdev, Dan Williams

On Sun, 17 Mar 2019 11:34:31 -0700 ira.weiny@intel.com wrote:

> Resending after rebasing to the latest mm tree.
> 
> HFI1, qib, and mthca, use get_user_pages_fast() due to it performance
> advantages.  These pages can be held for a significant time.  But
> get_user_pages_fast() does not protect against mapping FS DAX pages.
> 
> Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
> retains the performance while also adding the FS DAX checks.  XDP has also
> shown interest in using this functionality.[1]
> 
> In addition we change get_user_pages() to use the new FOLL_LONGTERM flag and
> remove the specialized get_user_pages_longterm call.

It would be helpful to include your response to Christoph's question
(http://lkml.kernel.org/r/20190220180255.GA12020@iweiny-DESK2.sc.intel.com)
in the changelog.  Because if one person was wondering about this,
others will likely do so.

We have no record of acks or reviewed-by's.  At least one was missed
(http://lkml.kernel.org/r/CAOg9mSTTcD-9bCSDfC0WRYqfVrNB4TwOzL0c4+6QXi-N_Y43Vw@mail.gmail.com),
but that is very very partial.

This patchset is fairly DAX-centered, but Dan wasn't cc'ed!

So ho hum.  I'll scoop them up and shall make the above changes to the
[1/n] changelog, but we still have some work to do.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it
  2019-03-19 22:19 ` [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it Andrew Morton
@ 2019-03-21  8:40   ` Ira Weiny
  0 siblings, 0 replies; 29+ messages in thread
From: Ira Weiny @ 2019-03-21  8:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: John Hubbard, Michal Hocko, Kirill A. Shutemov, Peter Zijlstra,
	Jason Gunthorpe, Benjamin Herrenschmidt, Paul Mackerras,
	David S. Miller, Martin Schwidefsky, Heiko Carstens, Rich Felker,
	Yoshinori Sato, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Ralf Baechle, James Hogan, linux-mm, linux-kernel, linux-mips,
	linuxppc-dev, linux-s390, linux-sh, sparclinux, linux-rdma,
	netdev, Dan Williams

On Tue, Mar 19, 2019 at 03:19:30PM -0700, Andrew Morton wrote:
> On Sun, 17 Mar 2019 11:34:31 -0700 ira.weiny@intel.com wrote:
> 
> > Resending after rebasing to the latest mm tree.
> > 
> > HFI1, qib, and mthca, use get_user_pages_fast() due to it performance
> > advantages.  These pages can be held for a significant time.  But
> > get_user_pages_fast() does not protect against mapping FS DAX pages.
> > 
> > Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
> > retains the performance while also adding the FS DAX checks.  XDP has also
> > shown interest in using this functionality.[1]
> > 
> > In addition we change get_user_pages() to use the new FOLL_LONGTERM flag and
> > remove the specialized get_user_pages_longterm call.
> 
> It would be helpful to include your response to Christoph's question
> (http://lkml.kernel.org/r/20190220180255.GA12020@iweiny-DESK2.sc.intel.com)
> in the changelog.  Because if one person was wondering about this,
> others will likely do so.
> 
> We have no record of acks or reviewed-by's.  At least one was missed
> (http://lkml.kernel.org/r/CAOg9mSTTcD-9bCSDfC0WRYqfVrNB4TwOzL0c4+6QXi-N_Y43Vw@mail.gmail.com),
> but that is very very partial.

That is my bad.  Sorry to Mike.  And I have added him.

> 
> This patchset is fairly DAX-centered, but Dan wasn't cc'ed!

Agreed, I'm new to changing things which affect this many sub-systems and I
struggled with who should be CC'ed (get_maintainer.pl returned a very large
list  :-(.

I fear I may have cc'ed too many people, and the wrong people apparently, so
that may be affecting the review...

So again my apologies.  I don't know if Dan is going to get a chance to put a
reviewed-by on them this week but I thought I would send this note to let you
know I'm not ignoring your feedback.  Just waiting a bit before resending to
hopefully get some more acks/reviewed bys.

Thanks,
Ira

> 
> So ho hum.  I'll scoop them up and shall make the above changes to the
> [1/n] changelog, but we still have some work to do.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RESEND 1/7] mm/gup: Replace get_user_pages_longterm() with FOLL_LONGTERM
  2019-03-17 18:34 ` [RESEND 1/7] mm/gup: Replace get_user_pages_longterm() with FOLL_LONGTERM ira.weiny
@ 2019-03-22 21:24   ` Dan Williams
  2019-03-25  6:19     ` Ira Weiny
  2019-03-25 10:27     ` Ira Weiny
  0 siblings, 2 replies; 29+ messages in thread
From: Dan Williams @ 2019-03-22 21:24 UTC (permalink / raw)
  To: ira.weiny
  Cc: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	Aneesh Kumar K . V, Michal Hocko, linux-mm,
	Linux Kernel Mailing List, linux-mips, linuxppc-dev, linux-s390,
	Linux-sh, sparclinux, linux-rdma, netdev

On Sun, Mar 17, 2019 at 7:36 PM <ira.weiny@intel.com> wrote:
>
> From: Ira Weiny <ira.weiny@intel.com>
>
> Rather than have a separate get_user_pages_longterm() call,
> introduce FOLL_LONGTERM and change the longterm callers to use
> it.
>
> This patch does not change any functionality.
>
> FOLL_LONGTERM can only be supported with get_user_pages() as it
> requires vmas to determine if DAX is in use.
>
> CC: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
[..]
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2d483dbdffc0..6831077d126c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
[..]
> @@ -2609,6 +2596,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
>  #define FOLL_REMOTE    0x2000  /* we are working on non-current tsk/mm */
>  #define FOLL_COW       0x4000  /* internal GUP flag */
>  #define FOLL_ANON      0x8000  /* don't do file mappings */
> +#define FOLL_LONGTERM  0x10000 /* mapping is intended for a long term pin */

Let's change this comment to say something like /* mapping lifetime is
indefinite / at the discretion of userspace */, since "longterm is not
well defined.

I think it should also include a /* FIXME: */ to say something about
the havoc a long term pin might wreak on fs and mm code paths.

>  static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
>  {
> diff --git a/mm/gup.c b/mm/gup.c
> index f84e22685aaa..8cb4cff067bc 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1112,26 +1112,7 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
>  }
>  EXPORT_SYMBOL(get_user_pages_remote);
>
> -/*
> - * This is the same as get_user_pages_remote(), just with a
> - * less-flexible calling convention where we assume that the task
> - * and mm being operated on are the current task's and don't allow
> - * passing of a locked parameter.  We also obviously don't pass
> - * FOLL_REMOTE in here.
> - */
> -long get_user_pages(unsigned long start, unsigned long nr_pages,
> -               unsigned int gup_flags, struct page **pages,
> -               struct vm_area_struct **vmas)
> -{
> -       return __get_user_pages_locked(current, current->mm, start, nr_pages,
> -                                      pages, vmas, NULL,
> -                                      gup_flags | FOLL_TOUCH);
> -}
> -EXPORT_SYMBOL(get_user_pages);
> -
>  #if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
> -
> -#ifdef CONFIG_FS_DAX
>  static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
>  {
>         long i;
> @@ -1150,12 +1131,6 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
>         }
>         return false;
>  }
> -#else
> -static inline bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
> -{
> -       return false;
> -}
> -#endif
>
>  #ifdef CONFIG_CMA
>  static struct page *new_non_cma_page(struct page *page, unsigned long private)
> @@ -1209,10 +1184,13 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
>         return __alloc_pages_node(nid, gfp_mask, 0);
>  }
>
> -static long check_and_migrate_cma_pages(unsigned long start, long nr_pages,
> -                                       unsigned int gup_flags,
> +static long check_and_migrate_cma_pages(struct task_struct *tsk,
> +                                       struct mm_struct *mm,
> +                                       unsigned long start,
> +                                       unsigned long nr_pages,
>                                         struct page **pages,
> -                                       struct vm_area_struct **vmas)
> +                                       struct vm_area_struct **vmas,
> +                                       unsigned int gup_flags)
>  {
>         long i;
>         bool drain_allow = true;
> @@ -1268,10 +1246,14 @@ static long check_and_migrate_cma_pages(unsigned long start, long nr_pages,
>                                 putback_movable_pages(&cma_page_list);
>                 }
>                 /*
> -                * We did migrate all the pages, Try to get the page references again
> -                * migrating any new CMA pages which we failed to isolate earlier.
> +                * We did migrate all the pages, Try to get the page references
> +                * again migrating any new CMA pages which we failed to isolate
> +                * earlier.
>                  */
> -               nr_pages = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
> +               nr_pages = __get_user_pages_locked(tsk, mm, start, nr_pages,
> +                                                  pages, vmas, NULL,
> +                                                  gup_flags);
> +

Why did this need to change to __get_user_pages_locked?

>                 if ((nr_pages > 0) && migrate_allow) {
>                         drain_allow = true;
>                         goto check_again;
> @@ -1281,66 +1263,115 @@ static long check_and_migrate_cma_pages(unsigned long start, long nr_pages,
>         return nr_pages;
>  }
>  #else
> -static inline long check_and_migrate_cma_pages(unsigned long start, long nr_pages,
> -                                              unsigned int gup_flags,
> -                                              struct page **pages,
> -                                              struct vm_area_struct **vmas)
> +static long check_and_migrate_cma_pages(struct task_struct *tsk,
> +                                       struct mm_struct *mm,
> +                                       unsigned long start,
> +                                       unsigned long nr_pages,
> +                                       struct page **pages,
> +                                       struct vm_area_struct **vmas,
> +                                       unsigned int gup_flags)
>  {
>         return nr_pages;
>  }
>  #endif
>
>  /*
> - * This is the same as get_user_pages() in that it assumes we are
> - * operating on the current task's mm, but it goes further to validate
> - * that the vmas associated with the address range are suitable for
> - * longterm elevated page reference counts. For example, filesystem-dax
> - * mappings are subject to the lifetime enforced by the filesystem and
> - * we need guarantees that longterm users like RDMA and V4L2 only
> - * establish mappings that have a kernel enforced revocation mechanism.
> + * __gup_longterm_locked() is a wrapper for __get_uer_pages_locked which

s/uer/user/

> + * allows us to process the FOLL_LONGTERM flag if present.
> + *
> + * FOLL_LONGTERM Checks for either DAX VMAs or PPC CMA regions and either fails
> + * the pin or attempts to migrate the page as appropriate.
> + *
> + * In the filesystem-dax case mappings are subject to the lifetime enforced by
> + * the filesystem and we need guarantees that longterm users like RDMA and V4L2
> + * only establish mappings that have a kernel enforced revocation mechanism.
> + *
> + * In the CMA case pages can't be pinned in a CMA region as this would
> + * unnecessarily fragment that region.  So CMA attempts to migrate the page
> + * before pinning.
>   *
>   * "longterm" == userspace controlled elevated page count lifetime.
>   * Contrast this to iov_iter_get_pages() usages which are transient.

Ah, here's the longterm documentation, but if I was a developer
considering whether to use FOLL_LONGTERM or not I would expect to find
the documentation at the flag definition site.

I think it has become more clear since get_user_pages_longterm() was
initially merged that we need to warn people not to use it, or at
least seriously reconsider whether they want an interface to support
indefinite pins.

>   */
> -long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
> -                            unsigned int gup_flags, struct page **pages,
> -                            struct vm_area_struct **vmas_arg)
> +static __always_inline long __gup_longterm_locked(struct task_struct *tsk,

...why the __always_inline?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RESEND 2/7] mm/gup: Change write parameter to flags in fast walk
  2019-03-17 18:34 ` [RESEND 2/7] mm/gup: Change write parameter to flags in fast walk ira.weiny
@ 2019-03-22 21:30   ` Dan Williams
  0 siblings, 0 replies; 29+ messages in thread
From: Dan Williams @ 2019-03-22 21:30 UTC (permalink / raw)
  To: ira.weiny
  Cc: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	linux-mm, Linux Kernel Mailing List, linux-mips, linuxppc-dev,
	linux-s390, Linux-sh, sparclinux, linux-rdma, netdev

On Sun, Mar 17, 2019 at 7:36 PM <ira.weiny@intel.com> wrote:
>
> From: Ira Weiny <ira.weiny@intel.com>
>
> In order to support more options in the GUP fast walk, change
> the write parameter to flags throughout the call stack.
>
> This patch does not change functionality and passes FOLL_WRITE
> where write was previously used.
>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Looks good,

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RESEND 3/7] mm/gup: Change GUP fast to use flags rather than a write 'bool'
  2019-03-17 18:34 ` [RESEND 3/7] mm/gup: Change GUP fast to use flags rather than a write 'bool' ira.weiny
@ 2019-03-22 22:05   ` Dan Williams
  2019-03-25  8:26     ` Ira Weiny
  0 siblings, 1 reply; 29+ messages in thread
From: Dan Williams @ 2019-03-22 22:05 UTC (permalink / raw)
  To: ira.weiny
  Cc: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	linux-mm, Linux Kernel Mailing List, linux-mips, linuxppc-dev,
	linux-s390, Linux-sh, sparclinux, linux-rdma, netdev

On Sun, Mar 17, 2019 at 7:36 PM <ira.weiny@intel.com> wrote:
>
> From: Ira Weiny <ira.weiny@intel.com>
>
> To facilitate additional options to get_user_pages_fast() change the
> singular write parameter to be gup_flags.
>
> This patch does not change any functionality.  New functionality will
> follow in subsequent patches.
>
> Some of the get_user_pages_fast() call sites were unchanged because they
> already passed FOLL_WRITE or 0 for the write parameter.
>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
>
> ---
> Changes from V1:
>         Rebase to current merge tree
>         arch/powerpc/mm/mmu_context_iommu.c no longer calls gup_fast
>                 The gup_longterm was converted in patch 1
>
>  arch/mips/mm/gup.c                         | 11 ++++++-----
>  arch/powerpc/kvm/book3s_64_mmu_hv.c        |  4 ++--
>  arch/powerpc/kvm/e500_mmu.c                |  2 +-
>  arch/s390/kvm/interrupt.c                  |  2 +-
>  arch/s390/mm/gup.c                         | 12 ++++++------
>  arch/sh/mm/gup.c                           | 11 ++++++-----
>  arch/sparc/mm/gup.c                        |  9 +++++----
>  arch/x86/kvm/paging_tmpl.h                 |  2 +-
>  arch/x86/kvm/svm.c                         |  2 +-
>  drivers/fpga/dfl-afu-dma-region.c          |  2 +-
>  drivers/gpu/drm/via/via_dmablit.c          |  3 ++-
>  drivers/infiniband/hw/hfi1/user_pages.c    |  3 ++-
>  drivers/misc/genwqe/card_utils.c           |  2 +-
>  drivers/misc/vmw_vmci/vmci_host.c          |  2 +-
>  drivers/misc/vmw_vmci/vmci_queue_pair.c    |  6 ++++--
>  drivers/platform/goldfish/goldfish_pipe.c  |  3 ++-
>  drivers/rapidio/devices/rio_mport_cdev.c   |  4 +++-
>  drivers/sbus/char/oradax.c                 |  2 +-
>  drivers/scsi/st.c                          |  3 ++-
>  drivers/staging/gasket/gasket_page_table.c |  4 ++--
>  drivers/tee/tee_shm.c                      |  2 +-
>  drivers/vfio/vfio_iommu_spapr_tce.c        |  3 ++-
>  drivers/vhost/vhost.c                      |  2 +-
>  drivers/video/fbdev/pvr2fb.c               |  2 +-
>  drivers/virt/fsl_hypervisor.c              |  2 +-
>  drivers/xen/gntdev.c                       |  2 +-
>  fs/orangefs/orangefs-bufmap.c              |  2 +-
>  include/linux/mm.h                         |  4 ++--
>  kernel/futex.c                             |  2 +-
>  lib/iov_iter.c                             |  7 +++++--
>  mm/gup.c                                   | 10 +++++-----
>  mm/util.c                                  |  8 ++++----
>  net/ceph/pagevec.c                         |  2 +-
>  net/rds/info.c                             |  2 +-
>  net/rds/rdma.c                             |  3 ++-
>  35 files changed, 79 insertions(+), 63 deletions(-)


>
> diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c
> index 0d14e0d8eacf..4c2b4483683c 100644
> --- a/arch/mips/mm/gup.c
> +++ b/arch/mips/mm/gup.c
> @@ -235,7 +235,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
>   * get_user_pages_fast() - pin user pages in memory
>   * @start:     starting user address
>   * @nr_pages:  number of pages from start to pin
> - * @write:     whether pages will be written to
> + * @gup_flags: flags modifying pin behaviour
>   * @pages:     array that receives pointers to the pages pinned.
>   *             Should be at least nr_pages long.
>   *
> @@ -247,8 +247,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
>   * requested. If nr_pages is 0 or negative, returns 0. If no pages
>   * were pinned, returns -errno.
>   */
> -int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> -                       struct page **pages)
> +int get_user_pages_fast(unsigned long start, int nr_pages,
> +                       unsigned int gup_flags, struct page **pages)

This looks a tad scary given all related thrash especially when it's
only 1 user that wants to do get_user_page_fast_longterm, right? Maybe
something like the following. Note I explicitly moved the flags to the
end so that someone half paying attention that calls
__get_user_pages_fast will get a compile error if they specify the
args in the same order.

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 76ba638ceda8..c6c743bc2c68 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1505,8 +1505,15 @@ static inline long
get_user_pages_longterm(unsigned long start,
 }
 #endif /* CONFIG_FS_DAX */

-int get_user_pages_fast(unsigned long start, int nr_pages, int write,
-                       struct page **pages);
+
+int __get_user_pages_fast(unsigned long start, int nr_pages,
+               struct page **pages, unsigned int gup_flags);
+
+static inline int get_user_pages_fast(unsigned long start, int
nr_pages, int write,
+                       struct page **pages)
+{
+       return __get_user_pages_fast(start, nr_pages, pages, write ?
FOLL_WRITE);
+}

 /* Container for pinned pfns / pages */
 struct frame_vector {

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RESEND 4/7] mm/gup: Add FOLL_LONGTERM capability to GUP fast
  2019-03-17 18:34 ` [RESEND 4/7] mm/gup: Add FOLL_LONGTERM capability to GUP fast ira.weiny
@ 2019-03-22 22:12   ` Dan Williams
  2019-03-25  8:42     ` Ira Weiny
  0 siblings, 1 reply; 29+ messages in thread
From: Dan Williams @ 2019-03-22 22:12 UTC (permalink / raw)
  To: ira.weiny
  Cc: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	linux-mm, Linux Kernel Mailing List, linux-mips, linuxppc-dev,
	linux-s390, Linux-sh, sparclinux, linux-rdma, netdev

On Sun, Mar 17, 2019 at 7:36 PM <ira.weiny@intel.com> wrote:
>
> From: Ira Weiny <ira.weiny@intel.com>
>
> DAX pages were previously unprotected from longterm pins when users
> called get_user_pages_fast().
>
> Use the new FOLL_LONGTERM flag to check for DEVMAP pages and fall
> back to regular GUP processing if a DEVMAP page is encountered.
>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  mm/gup.c | 29 +++++++++++++++++++++++++----
>  1 file changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 0684a9536207..173db0c44678 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1600,6 +1600,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>                         goto pte_unmap;
>
>                 if (pte_devmap(pte)) {
> +                       if (unlikely(flags & FOLL_LONGTERM))
> +                               goto pte_unmap;
> +
>                         pgmap = get_dev_pagemap(pte_pfn(pte), pgmap);
>                         if (unlikely(!pgmap)) {
>                                 undo_dev_pagemap(nr, nr_start, pages);
> @@ -1739,8 +1742,11 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>         if (!pmd_access_permitted(orig, flags & FOLL_WRITE))
>                 return 0;
>
> -       if (pmd_devmap(orig))
> +       if (pmd_devmap(orig)) {
> +               if (unlikely(flags & FOLL_LONGTERM))
> +                       return 0;
>                 return __gup_device_huge_pmd(orig, pmdp, addr, end, pages, nr);
> +       }
>
>         refs = 0;
>         page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> @@ -1777,8 +1783,11 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
>         if (!pud_access_permitted(orig, flags & FOLL_WRITE))
>                 return 0;
>
> -       if (pud_devmap(orig))
> +       if (pud_devmap(orig)) {
> +               if (unlikely(flags & FOLL_LONGTERM))
> +                       return 0;
>                 return __gup_device_huge_pud(orig, pudp, addr, end, pages, nr);
> +       }
>
>         refs = 0;
>         page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
> @@ -2066,8 +2075,20 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
>                 start += nr << PAGE_SHIFT;
>                 pages += nr;
>
> -               ret = get_user_pages_unlocked(start, nr_pages - nr, pages,
> -                                             gup_flags);
> +               if (gup_flags & FOLL_LONGTERM) {
> +                       down_read(&current->mm->mmap_sem);
> +                       ret = __gup_longterm_locked(current, current->mm,
> +                                                   start, nr_pages - nr,
> +                                                   pages, NULL, gup_flags);
> +                       up_read(&current->mm->mmap_sem);
> +               } else {
> +                       /*
> +                        * retain FAULT_FOLL_ALLOW_RETRY optimization if
> +                        * possible
> +                        */
> +                       ret = get_user_pages_unlocked(start, nr_pages - nr,
> +                                                     pages, gup_flags);

I couldn't immediately grok why this path needs to branch on
FOLL_LONGTERM? Won't get_user_pages_unlocked(..., FOLL_LONGTERM) do
the right thing?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RESEND 5/7] IB/hfi1: Use the new FOLL_LONGTERM flag to get_user_pages_fast()
  2019-03-17 18:34 ` [RESEND 5/7] IB/hfi1: Use the new FOLL_LONGTERM flag to get_user_pages_fast() ira.weiny
@ 2019-03-22 22:14   ` Dan Williams
  2019-03-25  8:43     ` Ira Weiny
  0 siblings, 1 reply; 29+ messages in thread
From: Dan Williams @ 2019-03-22 22:14 UTC (permalink / raw)
  To: ira.weiny
  Cc: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	linux-mm, Linux Kernel Mailing List, linux-mips, linuxppc-dev,
	linux-s390, Linux-sh, sparclinux, linux-rdma, netdev

On Sun, Mar 17, 2019 at 7:36 PM <ira.weiny@intel.com> wrote:
>
> From: Ira Weiny <ira.weiny@intel.com>
>
> Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against
> FS DAX pages being mapped.
>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  drivers/infiniband/hw/hfi1/user_pages.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
> index 78ccacaf97d0..6a7f9cd5a94e 100644
> --- a/drivers/infiniband/hw/hfi1/user_pages.c
> +++ b/drivers/infiniband/hw/hfi1/user_pages.c
> @@ -104,9 +104,11 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np
>                             bool writable, struct page **pages)
>  {
>         int ret;
> +       unsigned int gup_flags = writable ? FOLL_WRITE : 0;

Maybe:

    unsigned int gup_flags = FOLL_LONGTERM | (writable ? FOLL_WRITE : 0);

?

>
> -       ret = get_user_pages_fast(vaddr, npages, writable ? FOLL_WRITE : 0,
> -                                 pages);
> +       gup_flags |= FOLL_LONGTERM;
> +
> +       ret = get_user_pages_fast(vaddr, npages, gup_flags, pages);
>         if (ret < 0)
>                 return ret;
>
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RESEND 6/7] IB/qib: Use the new FOLL_LONGTERM flag to get_user_pages_fast()
  2019-03-17 18:34 ` [RESEND 6/7] IB/qib: " ira.weiny
@ 2019-03-22 22:15   ` Dan Williams
  0 siblings, 0 replies; 29+ messages in thread
From: Dan Williams @ 2019-03-22 22:15 UTC (permalink / raw)
  To: ira.weiny
  Cc: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	linux-mm, Linux Kernel Mailing List, linux-mips, linuxppc-dev,
	linux-s390, Linux-sh, sparclinux, linux-rdma, netdev

On Sun, Mar 17, 2019 at 7:36 PM <ira.weiny@intel.com> wrote:
>
> From: Ira Weiny <ira.weiny@intel.com>
>
> Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against
> FS DAX pages being mapped.
>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Looks good modulo potential  __get_user_pages_fast() suggestion.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RESEND 1/7] mm/gup: Replace get_user_pages_longterm() with FOLL_LONGTERM
  2019-03-22 21:24   ` Dan Williams
@ 2019-03-25  6:19     ` Ira Weiny
  2019-03-25 16:45       ` Dan Williams
  2019-03-25 10:27     ` Ira Weiny
  1 sibling, 1 reply; 29+ messages in thread
From: Ira Weiny @ 2019-03-25  6:19 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	Aneesh Kumar K . V, Michal Hocko, linux-mm,
	Linux Kernel Mailing List, linux-mips, linuxppc-dev, linux-s390,
	Linux-sh, sparclinux, linux-rdma, netdev

On Fri, Mar 22, 2019 at 02:24:40PM -0700, Dan Williams wrote:
> On Sun, Mar 17, 2019 at 7:36 PM <ira.weiny@intel.com> wrote:
> >
> > From: Ira Weiny <ira.weiny@intel.com>
> >
> > Rather than have a separate get_user_pages_longterm() call,
> > introduce FOLL_LONGTERM and change the longterm callers to use
> > it.
> >
> > This patch does not change any functionality.
> >
> > FOLL_LONGTERM can only be supported with get_user_pages() as it
> > requires vmas to determine if DAX is in use.
> >
> > CC: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> > CC: Andrew Morton <akpm@linux-foundation.org>
> > CC: Michal Hocko <mhocko@kernel.org>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> [..]
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 2d483dbdffc0..6831077d126c 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> [..]
> > @@ -2609,6 +2596,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
> >  #define FOLL_REMOTE    0x2000  /* we are working on non-current tsk/mm */
> >  #define FOLL_COW       0x4000  /* internal GUP flag */
> >  #define FOLL_ANON      0x8000  /* don't do file mappings */
> > +#define FOLL_LONGTERM  0x10000 /* mapping is intended for a long term pin */
> 
> Let's change this comment to say something like /* mapping lifetime is
> indefinite / at the discretion of userspace */, since "longterm is not
> well defined.
> 
> I think it should also include a /* FIXME: */ to say something about
> the havoc a long term pin might wreak on fs and mm code paths.

Will do.

> 
> >  static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
> >  {
> > diff --git a/mm/gup.c b/mm/gup.c
> > index f84e22685aaa..8cb4cff067bc 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -1112,26 +1112,7 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
> >  }
> >  EXPORT_SYMBOL(get_user_pages_remote);
> >
> > -/*
> > - * This is the same as get_user_pages_remote(), just with a
> > - * less-flexible calling convention where we assume that the task
> > - * and mm being operated on are the current task's and don't allow
> > - * passing of a locked parameter.  We also obviously don't pass
> > - * FOLL_REMOTE in here.
> > - */
> > -long get_user_pages(unsigned long start, unsigned long nr_pages,
> > -               unsigned int gup_flags, struct page **pages,
> > -               struct vm_area_struct **vmas)
> > -{
> > -       return __get_user_pages_locked(current, current->mm, start, nr_pages,
> > -                                      pages, vmas, NULL,
> > -                                      gup_flags | FOLL_TOUCH);
> > -}
> > -EXPORT_SYMBOL(get_user_pages);
> > -
> >  #if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
> > -
> > -#ifdef CONFIG_FS_DAX
> >  static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
> >  {
> >         long i;
> > @@ -1150,12 +1131,6 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
> >         }
> >         return false;
> >  }
> > -#else
> > -static inline bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
> > -{
> > -       return false;
> > -}
> > -#endif
> >
> >  #ifdef CONFIG_CMA
> >  static struct page *new_non_cma_page(struct page *page, unsigned long private)
> > @@ -1209,10 +1184,13 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
> >         return __alloc_pages_node(nid, gfp_mask, 0);
> >  }
> >
> > -static long check_and_migrate_cma_pages(unsigned long start, long nr_pages,
> > -                                       unsigned int gup_flags,
> > +static long check_and_migrate_cma_pages(struct task_struct *tsk,
> > +                                       struct mm_struct *mm,
> > +                                       unsigned long start,
> > +                                       unsigned long nr_pages,
> >                                         struct page **pages,
> > -                                       struct vm_area_struct **vmas)
> > +                                       struct vm_area_struct **vmas,
> > +                                       unsigned int gup_flags)
> >  {
> >         long i;
> >         bool drain_allow = true;
> > @@ -1268,10 +1246,14 @@ static long check_and_migrate_cma_pages(unsigned long start, long nr_pages,
> >                                 putback_movable_pages(&cma_page_list);
> >                 }
> >                 /*
> > -                * We did migrate all the pages, Try to get the page references again
> > -                * migrating any new CMA pages which we failed to isolate earlier.
> > +                * We did migrate all the pages, Try to get the page references
> > +                * again migrating any new CMA pages which we failed to isolate
> > +                * earlier.
> >                  */
> > -               nr_pages = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
> > +               nr_pages = __get_user_pages_locked(tsk, mm, start, nr_pages,
> > +                                                  pages, vmas, NULL,
> > +                                                  gup_flags);
> > +
> 
> Why did this need to change to __get_user_pages_locked?

__get_uer_pages_locked() is now the "internal call" for get_user_pages.

Technically it did not _have_ to change but there is no need to call
get_user_pages() again because the FOLL_TOUCH flags is already set.  Also this
call then matches the __get_user_pages_locked() which was called on the pages
we migrated from.  Mostly this keeps the code "symmetrical" in that we called
__get_user_pages_locked() on the pages we are migrating from and the same call
on the pages we migrated to.

While the change here looks funny I think the final code is better.

> 
> >                 if ((nr_pages > 0) && migrate_allow) {
> >                         drain_allow = true;
> >                         goto check_again;
> > @@ -1281,66 +1263,115 @@ static long check_and_migrate_cma_pages(unsigned long start, long nr_pages,
> >         return nr_pages;
> >  }
> >  #else
> > -static inline long check_and_migrate_cma_pages(unsigned long start, long nr_pages,
> > -                                              unsigned int gup_flags,
> > -                                              struct page **pages,
> > -                                              struct vm_area_struct **vmas)
> > +static long check_and_migrate_cma_pages(struct task_struct *tsk,
> > +                                       struct mm_struct *mm,
> > +                                       unsigned long start,
> > +                                       unsigned long nr_pages,
> > +                                       struct page **pages,
> > +                                       struct vm_area_struct **vmas,
> > +                                       unsigned int gup_flags)
> >  {
> >         return nr_pages;
> >  }
> >  #endif
> >
> >  /*
> > - * This is the same as get_user_pages() in that it assumes we are
> > - * operating on the current task's mm, but it goes further to validate
> > - * that the vmas associated with the address range are suitable for
> > - * longterm elevated page reference counts. For example, filesystem-dax
> > - * mappings are subject to the lifetime enforced by the filesystem and
> > - * we need guarantees that longterm users like RDMA and V4L2 only
> > - * establish mappings that have a kernel enforced revocation mechanism.
> > + * __gup_longterm_locked() is a wrapper for __get_uer_pages_locked which
> 
> s/uer/user/

Check

> 
> > + * allows us to process the FOLL_LONGTERM flag if present.
> > + *
> > + * FOLL_LONGTERM Checks for either DAX VMAs or PPC CMA regions and either fails
> > + * the pin or attempts to migrate the page as appropriate.
> > + *
> > + * In the filesystem-dax case mappings are subject to the lifetime enforced by
> > + * the filesystem and we need guarantees that longterm users like RDMA and V4L2
> > + * only establish mappings that have a kernel enforced revocation mechanism.
> > + *
> > + * In the CMA case pages can't be pinned in a CMA region as this would
> > + * unnecessarily fragment that region.  So CMA attempts to migrate the page
> > + * before pinning.
> >   *
> >   * "longterm" == userspace controlled elevated page count lifetime.
> >   * Contrast this to iov_iter_get_pages() usages which are transient.
> 
> Ah, here's the longterm documentation, but if I was a developer
> considering whether to use FOLL_LONGTERM or not I would expect to find
> the documentation at the flag definition site.
> 
> I think it has become more clear since get_user_pages_longterm() was
> initially merged that we need to warn people not to use it, or at
> least seriously reconsider whether they want an interface to support
> indefinite pins.

Good point will move

> 
> >   */
> > -long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
> > -                            unsigned int gup_flags, struct page **pages,
> > -                            struct vm_area_struct **vmas_arg)
> > +static __always_inline long __gup_longterm_locked(struct task_struct *tsk,
> 
> ...why the __always_inline?
 
This was because it was only called from get_user_pages() in this patch.  But
later on I use it elsewhere so __always_inline is wrong.

Ira


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RESEND 3/7] mm/gup: Change GUP fast to use flags rather than a write 'bool'
  2019-03-22 22:05   ` Dan Williams
@ 2019-03-25  8:26     ` Ira Weiny
  0 siblings, 0 replies; 29+ messages in thread
From: Ira Weiny @ 2019-03-25  8:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	linux-mm, Linux Kernel Mailing List, linux-mips, linuxppc-dev,
	linux-s390, Linux-sh, sparclinux, linux-rdma, netdev

On Fri, Mar 22, 2019 at 03:05:53PM -0700, Dan Williams wrote:
> On Sun, Mar 17, 2019 at 7:36 PM <ira.weiny@intel.com> wrote:
> >
> > From: Ira Weiny <ira.weiny@intel.com>
> >
> > To facilitate additional options to get_user_pages_fast() change the
> > singular write parameter to be gup_flags.
> >
> > This patch does not change any functionality.  New functionality will
> > follow in subsequent patches.
> >
> > Some of the get_user_pages_fast() call sites were unchanged because they
> > already passed FOLL_WRITE or 0 for the write parameter.
> >
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> >
> > ---
> > Changes from V1:
> >         Rebase to current merge tree
> >         arch/powerpc/mm/mmu_context_iommu.c no longer calls gup_fast
> >                 The gup_longterm was converted in patch 1
> >
> >  arch/mips/mm/gup.c                         | 11 ++++++-----
> >  arch/powerpc/kvm/book3s_64_mmu_hv.c        |  4 ++--
> >  arch/powerpc/kvm/e500_mmu.c                |  2 +-
> >  arch/s390/kvm/interrupt.c                  |  2 +-
> >  arch/s390/mm/gup.c                         | 12 ++++++------
> >  arch/sh/mm/gup.c                           | 11 ++++++-----
> >  arch/sparc/mm/gup.c                        |  9 +++++----
> >  arch/x86/kvm/paging_tmpl.h                 |  2 +-
> >  arch/x86/kvm/svm.c                         |  2 +-
> >  drivers/fpga/dfl-afu-dma-region.c          |  2 +-
> >  drivers/gpu/drm/via/via_dmablit.c          |  3 ++-
> >  drivers/infiniband/hw/hfi1/user_pages.c    |  3 ++-
> >  drivers/misc/genwqe/card_utils.c           |  2 +-
> >  drivers/misc/vmw_vmci/vmci_host.c          |  2 +-
> >  drivers/misc/vmw_vmci/vmci_queue_pair.c    |  6 ++++--
> >  drivers/platform/goldfish/goldfish_pipe.c  |  3 ++-
> >  drivers/rapidio/devices/rio_mport_cdev.c   |  4 +++-
> >  drivers/sbus/char/oradax.c                 |  2 +-
> >  drivers/scsi/st.c                          |  3 ++-
> >  drivers/staging/gasket/gasket_page_table.c |  4 ++--
> >  drivers/tee/tee_shm.c                      |  2 +-
> >  drivers/vfio/vfio_iommu_spapr_tce.c        |  3 ++-
> >  drivers/vhost/vhost.c                      |  2 +-
> >  drivers/video/fbdev/pvr2fb.c               |  2 +-
> >  drivers/virt/fsl_hypervisor.c              |  2 +-
> >  drivers/xen/gntdev.c                       |  2 +-
> >  fs/orangefs/orangefs-bufmap.c              |  2 +-
> >  include/linux/mm.h                         |  4 ++--
> >  kernel/futex.c                             |  2 +-
> >  lib/iov_iter.c                             |  7 +++++--
> >  mm/gup.c                                   | 10 +++++-----
> >  mm/util.c                                  |  8 ++++----
> >  net/ceph/pagevec.c                         |  2 +-
> >  net/rds/info.c                             |  2 +-
> >  net/rds/rdma.c                             |  3 ++-
> >  35 files changed, 79 insertions(+), 63 deletions(-)
> 
> 
> >
> > diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c
> > index 0d14e0d8eacf..4c2b4483683c 100644
> > --- a/arch/mips/mm/gup.c
> > +++ b/arch/mips/mm/gup.c
> > @@ -235,7 +235,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
> >   * get_user_pages_fast() - pin user pages in memory
> >   * @start:     starting user address
> >   * @nr_pages:  number of pages from start to pin
> > - * @write:     whether pages will be written to
> > + * @gup_flags: flags modifying pin behaviour
> >   * @pages:     array that receives pointers to the pages pinned.
> >   *             Should be at least nr_pages long.
> >   *
> > @@ -247,8 +247,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
> >   * requested. If nr_pages is 0 or negative, returns 0. If no pages
> >   * were pinned, returns -errno.
> >   */
> > -int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> > -                       struct page **pages)
> > +int get_user_pages_fast(unsigned long start, int nr_pages,
> > +                       unsigned int gup_flags, struct page **pages)
> 
> This looks a tad scary given all related thrash especially when it's
> only 1 user that wants to do get_user_page_fast_longterm, right?

Agreed but the discussion back in Feb agreed that it would be better to make
get_user_pages_fast() use flags rather than add another *_longterm call, and I
agree.

> Maybe
> something like the following. Note I explicitly moved the flags to the
> end so that someone half paying attention that calls
> __get_user_pages_fast will get a compile error if they specify the
> args in the same order.

I did this to remain consistent with the other public calls.  They put the
"return" pages parameter at the end.  For example get_user_pages() is defined
this way with pages and vmas at the end.

long get_user_pages(unsigned long start, unsigned long nr_pages,
                unsigned int gup_flags, struct page **pages,
                struct vm_area_struct **vmas)

I'm pretty sure I got all the callers.  Is this worth making the signature
"non-standard" WRT the other calls?

Ira

> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 76ba638ceda8..c6c743bc2c68 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1505,8 +1505,15 @@ static inline long
> get_user_pages_longterm(unsigned long start,
>  }
>  #endif /* CONFIG_FS_DAX */
> 
> -int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> -                       struct page **pages);
> +
> +int __get_user_pages_fast(unsigned long start, int nr_pages,
> +               struct page **pages, unsigned int gup_flags);
> +
> +static inline int get_user_pages_fast(unsigned long start, int
> nr_pages, int write,
> +                       struct page **pages)
> +{
> +       return __get_user_pages_fast(start, nr_pages, pages, write ?
> FOLL_WRITE);
> +}
> 
>  /* Container for pinned pfns / pages */
>  struct frame_vector {
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RESEND 4/7] mm/gup: Add FOLL_LONGTERM capability to GUP fast
  2019-03-22 22:12   ` Dan Williams
@ 2019-03-25  8:42     ` Ira Weiny
  2019-03-25 16:47       ` Jason Gunthorpe
  0 siblings, 1 reply; 29+ messages in thread
From: Ira Weiny @ 2019-03-25  8:42 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	linux-mm, Linux Kernel Mailing List, linux-mips, linuxppc-dev,
	linux-s390, Linux-sh, sparclinux, linux-rdma, netdev

On Fri, Mar 22, 2019 at 03:12:55PM -0700, Dan Williams wrote:
> On Sun, Mar 17, 2019 at 7:36 PM <ira.weiny@intel.com> wrote:
> >
> > From: Ira Weiny <ira.weiny@intel.com>
> >
> > DAX pages were previously unprotected from longterm pins when users
> > called get_user_pages_fast().
> >
> > Use the new FOLL_LONGTERM flag to check for DEVMAP pages and fall
> > back to regular GUP processing if a DEVMAP page is encountered.
> >
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > ---
> >  mm/gup.c | 29 +++++++++++++++++++++++++----
> >  1 file changed, 25 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 0684a9536207..173db0c44678 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -1600,6 +1600,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> >                         goto pte_unmap;
> >
> >                 if (pte_devmap(pte)) {
> > +                       if (unlikely(flags & FOLL_LONGTERM))
> > +                               goto pte_unmap;
> > +
> >                         pgmap = get_dev_pagemap(pte_pfn(pte), pgmap);
> >                         if (unlikely(!pgmap)) {
> >                                 undo_dev_pagemap(nr, nr_start, pages);
> > @@ -1739,8 +1742,11 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> >         if (!pmd_access_permitted(orig, flags & FOLL_WRITE))
> >                 return 0;
> >
> > -       if (pmd_devmap(orig))
> > +       if (pmd_devmap(orig)) {
> > +               if (unlikely(flags & FOLL_LONGTERM))
> > +                       return 0;
> >                 return __gup_device_huge_pmd(orig, pmdp, addr, end, pages, nr);
> > +       }
> >
> >         refs = 0;
> >         page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> > @@ -1777,8 +1783,11 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
> >         if (!pud_access_permitted(orig, flags & FOLL_WRITE))
> >                 return 0;
> >
> > -       if (pud_devmap(orig))
> > +       if (pud_devmap(orig)) {
> > +               if (unlikely(flags & FOLL_LONGTERM))
> > +                       return 0;
> >                 return __gup_device_huge_pud(orig, pudp, addr, end, pages, nr);
> > +       }
> >
> >         refs = 0;
> >         page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
> > @@ -2066,8 +2075,20 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
> >                 start += nr << PAGE_SHIFT;
> >                 pages += nr;
> >
> > -               ret = get_user_pages_unlocked(start, nr_pages - nr, pages,
> > -                                             gup_flags);
> > +               if (gup_flags & FOLL_LONGTERM) {
> > +                       down_read(&current->mm->mmap_sem);
> > +                       ret = __gup_longterm_locked(current, current->mm,
> > +                                                   start, nr_pages - nr,
> > +                                                   pages, NULL, gup_flags);
> > +                       up_read(&current->mm->mmap_sem);
> > +               } else {
> > +                       /*
> > +                        * retain FAULT_FOLL_ALLOW_RETRY optimization if
> > +                        * possible
> > +                        */
> > +                       ret = get_user_pages_unlocked(start, nr_pages - nr,
> > +                                                     pages, gup_flags);
> 
> I couldn't immediately grok why this path needs to branch on
> FOLL_LONGTERM? Won't get_user_pages_unlocked(..., FOLL_LONGTERM) do
> the right thing?

Unfortunately holding the lock is required to support FOLL_LONGTERM (to check
the VMAs) but we don't want to hold the lock to be optimal (specifically allow
FAULT_FOLL_ALLOW_RETRY).  So I'm maintaining the optimization for *_fast users
who do not specify FOLL_LONGTERM.

Another way to do this would have been to define __gup_longterm_unlocked with
the above logic, but that seemed overkill at this point.

Ira


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RESEND 5/7] IB/hfi1: Use the new FOLL_LONGTERM flag to get_user_pages_fast()
  2019-03-22 22:14   ` Dan Williams
@ 2019-03-25  8:43     ` Ira Weiny
  0 siblings, 0 replies; 29+ messages in thread
From: Ira Weiny @ 2019-03-25  8:43 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	linux-mm, Linux Kernel Mailing List, linux-mips, linuxppc-dev,
	linux-s390, Linux-sh, sparclinux, linux-rdma, netdev

On Fri, Mar 22, 2019 at 03:14:26PM -0700, Dan Williams wrote:
> On Sun, Mar 17, 2019 at 7:36 PM <ira.weiny@intel.com> wrote:
> >
> > From: Ira Weiny <ira.weiny@intel.com>
> >
> > Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against
> > FS DAX pages being mapped.
> >
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > ---
> >  drivers/infiniband/hw/hfi1/user_pages.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
> > index 78ccacaf97d0..6a7f9cd5a94e 100644
> > --- a/drivers/infiniband/hw/hfi1/user_pages.c
> > +++ b/drivers/infiniband/hw/hfi1/user_pages.c
> > @@ -104,9 +104,11 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np
> >                             bool writable, struct page **pages)
> >  {
> >         int ret;
> > +       unsigned int gup_flags = writable ? FOLL_WRITE : 0;
> 
> Maybe:
> 
>     unsigned int gup_flags = FOLL_LONGTERM | (writable ? FOLL_WRITE : 0);
> 
> ?

Sure looks good.

Ira

> 
> >
> > -       ret = get_user_pages_fast(vaddr, npages, writable ? FOLL_WRITE : 0,
> > -                                 pages);
> > +       gup_flags |= FOLL_LONGTERM;
> > +
> > +       ret = get_user_pages_fast(vaddr, npages, gup_flags, pages);
> >         if (ret < 0)
> >                 return ret;
> >
> > --
> > 2.20.1
> >
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RESEND 1/7] mm/gup: Replace get_user_pages_longterm() with FOLL_LONGTERM
  2019-03-25 16:45       ` Dan Williams
@ 2019-03-25  8:46         ` Ira Weiny
  0 siblings, 0 replies; 29+ messages in thread
From: Ira Weiny @ 2019-03-25  8:46 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	Aneesh Kumar K . V, Michal Hocko, linux-mm,
	Linux Kernel Mailing List, linux-mips, linuxppc-dev, linux-s390,
	Linux-sh, sparclinux, linux-rdma, netdev

On Mon, Mar 25, 2019 at 09:45:12AM -0700, Dan Williams wrote:
> On Mon, Mar 25, 2019 at 7:21 AM Ira Weiny <ira.weiny@intel.com> wrote:
> [..]
> > > > @@ -1268,10 +1246,14 @@ static long check_and_migrate_cma_pages(unsigned long start, long nr_pages,
> > > >                                 putback_movable_pages(&cma_page_list);
> > > >                 }
> > > >                 /*
> > > > -                * We did migrate all the pages, Try to get the page references again
> > > > -                * migrating any new CMA pages which we failed to isolate earlier.
> > > > +                * We did migrate all the pages, Try to get the page references
> > > > +                * again migrating any new CMA pages which we failed to isolate
> > > > +                * earlier.
> > > >                  */
> > > > -               nr_pages = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
> > > > +               nr_pages = __get_user_pages_locked(tsk, mm, start, nr_pages,
> > > > +                                                  pages, vmas, NULL,
> > > > +                                                  gup_flags);
> > > > +
> > >
> > > Why did this need to change to __get_user_pages_locked?
> >
> > __get_uer_pages_locked() is now the "internal call" for get_user_pages.
> >
> > Technically it did not _have_ to change but there is no need to call
> > get_user_pages() again because the FOLL_TOUCH flags is already set.  Also this
> > call then matches the __get_user_pages_locked() which was called on the pages
> > we migrated from.  Mostly this keeps the code "symmetrical" in that we called
> > __get_user_pages_locked() on the pages we are migrating from and the same call
> > on the pages we migrated to.
> >
> > While the change here looks funny I think the final code is better.
> 
> Agree, but I think that either needs to be noted in the changelog so
> it's not a surprise, or moved to a follow-on cleanup patch.
> 

Can do...

Thanks!
Ira


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RESEND 4/7] mm/gup: Add FOLL_LONGTERM capability to GUP fast
  2019-03-25 16:47       ` Jason Gunthorpe
@ 2019-03-25  9:23         ` Ira Weiny
  2019-03-25 17:51           ` Jason Gunthorpe
  0 siblings, 1 reply; 29+ messages in thread
From: Ira Weiny @ 2019-03-25  9:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dan Williams, Andrew Morton, John Hubbard, Michal Hocko,
	Kirill A. Shutemov, Peter Zijlstra, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	linux-mm, Linux Kernel Mailing List, linux-mips, linuxppc-dev,
	linux-s390, Linux-sh, sparclinux, linux-rdma, netdev

On Mon, Mar 25, 2019 at 01:47:13PM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 25, 2019 at 01:42:26AM -0700, Ira Weiny wrote:
> > On Fri, Mar 22, 2019 at 03:12:55PM -0700, Dan Williams wrote:
> > > On Sun, Mar 17, 2019 at 7:36 PM <ira.weiny@intel.com> wrote:
> > > >
> > > > From: Ira Weiny <ira.weiny@intel.com>
> > > >
> > > > DAX pages were previously unprotected from longterm pins when users
> > > > called get_user_pages_fast().
> > > >
> > > > Use the new FOLL_LONGTERM flag to check for DEVMAP pages and fall
> > > > back to regular GUP processing if a DEVMAP page is encountered.
> > > >
> > > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > >  mm/gup.c | 29 +++++++++++++++++++++++++----
> > > >  1 file changed, 25 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/mm/gup.c b/mm/gup.c
> > > > index 0684a9536207..173db0c44678 100644
> > > > +++ b/mm/gup.c
> > > > @@ -1600,6 +1600,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> > > >                         goto pte_unmap;
> > > >
> > > >                 if (pte_devmap(pte)) {
> > > > +                       if (unlikely(flags & FOLL_LONGTERM))
> > > > +                               goto pte_unmap;
> > > > +
> > > >                         pgmap = get_dev_pagemap(pte_pfn(pte), pgmap);
> > > >                         if (unlikely(!pgmap)) {
> > > >                                 undo_dev_pagemap(nr, nr_start, pages);
> > > > @@ -1739,8 +1742,11 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> > > >         if (!pmd_access_permitted(orig, flags & FOLL_WRITE))
> > > >                 return 0;
> > > >
> > > > -       if (pmd_devmap(orig))
> > > > +       if (pmd_devmap(orig)) {
> > > > +               if (unlikely(flags & FOLL_LONGTERM))
> > > > +                       return 0;
> > > >                 return __gup_device_huge_pmd(orig, pmdp, addr, end, pages, nr);
> > > > +       }
> > > >
> > > >         refs = 0;
> > > >         page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> > > > @@ -1777,8 +1783,11 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
> > > >         if (!pud_access_permitted(orig, flags & FOLL_WRITE))
> > > >                 return 0;
> > > >
> > > > -       if (pud_devmap(orig))
> > > > +       if (pud_devmap(orig)) {
> > > > +               if (unlikely(flags & FOLL_LONGTERM))
> > > > +                       return 0;
> > > >                 return __gup_device_huge_pud(orig, pudp, addr, end, pages, nr);
> > > > +       }
> > > >
> > > >         refs = 0;
> > > >         page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
> > > > @@ -2066,8 +2075,20 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
> > > >                 start += nr << PAGE_SHIFT;
> > > >                 pages += nr;
> > > >
> > > > -               ret = get_user_pages_unlocked(start, nr_pages - nr, pages,
> > > > -                                             gup_flags);
> > > > +               if (gup_flags & FOLL_LONGTERM) {
> > > > +                       down_read(&current->mm->mmap_sem);
> > > > +                       ret = __gup_longterm_locked(current, current->mm,
> > > > +                                                   start, nr_pages - nr,
> > > > +                                                   pages, NULL, gup_flags);
> > > > +                       up_read(&current->mm->mmap_sem);
> > > > +               } else {
> > > > +                       /*
> > > > +                        * retain FAULT_FOLL_ALLOW_RETRY optimization if
> > > > +                        * possible
> > > > +                        */
> > > > +                       ret = get_user_pages_unlocked(start, nr_pages - nr,
> > > > +                                                     pages, gup_flags);
> > > 
> > > I couldn't immediately grok why this path needs to branch on
> > > FOLL_LONGTERM? Won't get_user_pages_unlocked(..., FOLL_LONGTERM) do
> > > the right thing?
> > 
> > Unfortunately holding the lock is required to support FOLL_LONGTERM (to check
> > the VMAs) but we don't want to hold the lock to be optimal (specifically allow
> > FAULT_FOLL_ALLOW_RETRY).  So I'm maintaining the optimization for *_fast users
> > who do not specify FOLL_LONGTERM.
> > 
> > Another way to do this would have been to define __gup_longterm_unlocked with
> > the above logic, but that seemed overkill at this point.
> 
> get_user_pages_unlocked() is an exported symbol, shouldn't it work
> with the FOLL_LONGTERM flag?
> 
> I think it should even though we have no user..
> 
> Otherwise the GUP API just gets more confusing.

I agree WRT to the API.  But I think callers of get_user_pages_unlocked() are
not going to get the behavior they want if they specify FOLL_LONGTERM.

What I could do is BUG_ON (or just WARN_ON) if unlocked is called with
FOLL_LONGTERM similar to the code in get_user_pages_locked() which does not
allow locked and vmas to be passed together:

        if (locked) {
                /* if VM_FAULT_RETRY can be returned, vmas become invalid */
                BUG_ON(vmas);
                /* check caller initialized locked */
                BUG_ON(*locked != 1);
        }

Combining Dan's comment and yours; I could do something like below? (not even
compile tested)  Coded with WARN_ON because technically I _think_ the
FOLL_LONGTERM would "work" but not be optimal.  I'm just not 100% sure.  A
BUG_ON would be most protective IMO.


diff --git a/mm/gup.c b/mm/gup.c
index 173db0c44678..8e31411f485f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1014,6 +1014,29 @@ long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
 }
 EXPORT_SYMBOL(get_user_pages_locked);
 
+long __gup_longterm_unlocked(unsigned long start, unsigned long nr_pages,
+			     struct page **pages, unsigned int gup_flags)
+{
+	struct mm_struct *mm = current->mm;
+	int locked = 1;
+	long ret;
+
+	down_read(&mm->mmap_sem);
+	if (gup_flags & FOLL_LONGTERM) {
+		ret = __gup_longterm_locked(current, mm,
+					    start, nr_pages - nr,
+					    pages, NULL, gup_flags);
+	} else {
+		ret = __get_user_pages_locked(current, mm, start, nr_pages,
+					      pages, NULL, &locked,
+					      gup_flags | FOLL_TOUCH);
+	}
+	if (locked)
+		up_read(&mm->mmap_sem);
+
+	return ret;
+}
+
 /*
  * get_user_pages_unlocked() is suitable to replace the form:
  *
@@ -1032,16 +1055,9 @@ EXPORT_SYMBOL(get_user_pages_locked);
 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 			     struct page **pages, unsigned int gup_flags)
 {
-	struct mm_struct *mm = current->mm;
-	int locked = 1;
-	long ret;
+	WARN_ON(gup_flags & FOLL_LONGTERM);
 
-	down_read(&mm->mmap_sem);
-	ret = __get_user_pages_locked(current, mm, start, nr_pages, pages, NULL,
-				      &locked, gup_flags | FOLL_TOUCH);
-	if (locked)
-		up_read(&mm->mmap_sem);
-	return ret;
+	__gup_longterm_unlocked(start, nr_pages, pages, gup_flags);
 }
 EXPORT_SYMBOL(get_user_pages_unlocked);
 
@@ -2075,20 +2091,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
 		start += nr << PAGE_SHIFT;
 		pages += nr;
 
-		if (gup_flags & FOLL_LONGTERM) {
-			down_read(&current->mm->mmap_sem);
-			ret = __gup_longterm_locked(current, current->mm,
-						    start, nr_pages - nr,
-						    pages, NULL, gup_flags);
-			up_read(&current->mm->mmap_sem);
-		} else {
-			/*
-			 * retain FAULT_FOLL_ALLOW_RETRY optimization if
-			 * possible
-			 */
-			ret = get_user_pages_unlocked(start, nr_pages - nr,
-						      pages, gup_flags);
-		}
+		__gup_longterm_unlocked();
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RESEND 1/7] mm/gup: Replace get_user_pages_longterm() with FOLL_LONGTERM
  2019-03-22 21:24   ` Dan Williams
  2019-03-25  6:19     ` Ira Weiny
@ 2019-03-25 10:27     ` Ira Weiny
  1 sibling, 0 replies; 29+ messages in thread
From: Ira Weiny @ 2019-03-25 10:27 UTC (permalink / raw)
  To: Dan Williams
  Cc: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	Aneesh Kumar K . V, Michal Hocko, linux-mm,
	Linux Kernel Mailing List, linux-mips, linuxppc-dev, linux-s390,
	Linux-sh, sparclinux, linux-rdma, netdev

On Fri, Mar 22, 2019 at 02:24:40PM -0700, Dan Williams wrote:
> On Sun, Mar 17, 2019 at 7:36 PM <ira.weiny@intel.com> wrote:

[snip]

> > + * __gup_longterm_locked() is a wrapper for __get_uer_pages_locked which
> 
> s/uer/user/
> 
> > + * allows us to process the FOLL_LONGTERM flag if present.
> > + *
> > + * FOLL_LONGTERM Checks for either DAX VMAs or PPC CMA regions and either fails
> > + * the pin or attempts to migrate the page as appropriate.
> > + *
> > + * In the filesystem-dax case mappings are subject to the lifetime enforced by
> > + * the filesystem and we need guarantees that longterm users like RDMA and V4L2
> > + * only establish mappings that have a kernel enforced revocation mechanism.
> > + *
> > + * In the CMA case pages can't be pinned in a CMA region as this would
> > + * unnecessarily fragment that region.  So CMA attempts to migrate the page
> > + * before pinning.
> >   *
> >   * "longterm" == userspace controlled elevated page count lifetime.
> >   * Contrast this to iov_iter_get_pages() usages which are transient.
> 
> Ah, here's the longterm documentation, but if I was a developer
> considering whether to use FOLL_LONGTERM or not I would expect to find
> the documentation at the flag definition site.
> 
> I think it has become more clear since get_user_pages_longterm() was
> initially merged that we need to warn people not to use it, or at
> least seriously reconsider whether they want an interface to support
> indefinite pins.

I will move the comment to the flag definition but...

In reviewing this comment it occurs to me that the addition of special casing
CMA regions via FOLL_LONGTERM has made it less experimental/temporary and now
simply implies intent to the GUP code as to the use of the pages.

As I'm not super familiar with the CMA use case I can't say for certain but it
seems that it is not a temporary solution.

So I'm not going to refrain from a FIXME WRT removing the flag.

New suggested text below.

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6831077d126c..5db9d8e894aa 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2596,7 +2596,28 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_REMOTE    0x2000  /* we are working on non-current tsk/mm */
 #define FOLL_COW       0x4000  /* internal GUP flag */
 #define FOLL_ANON      0x8000  /* don't do file mappings */
-#define FOLL_LONGTERM  0x10000 /* mapping is intended for a long term pin */
+#define FOLL_LONGTERM  0x10000 /* mapping lifetime is indefinite: see below */
+
+/*
+ * NOTE on FOLL_LONGTERM:
+ *
+ * FOLL_LONGTERM indicates that the page will be held for an indefinite time
+ * period _often_ under userspace control.  This is contrasted with
+ * iov_iter_get_pages() where usages which are transient.
+ *
+ * FIXME: For pages which are part of a filesystem, mappings are subject to the
+ * lifetime enforced by the filesystem and we need guarantees that longterm
+ * users like RDMA and V4L2 only establish mappings which coordinate usage with
+ * the filesystem.  Ideas for this coordination include revoking the longterm
+ * pin, delaying writeback, bounce buffer page writeback, etc.  As FS DAX was
+ * added after the problem with filesystems was found FS DAX VMAs are
+ * specifically failed.  Filesystem pages are still subject to bugs and use of
+ * FOLL_LONGTERM should be avoided on those pages.
+ *
+ * In the CMA case: longterm pins in a CMA region would unnecessarily fragment
+ * that region.  And so CMA attempts to migrate the page before pinning when
+ * FOLL_LONGTERM is specified.
+ */
 
 static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
 {


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RESEND 4/7] mm/gup: Add FOLL_LONGTERM capability to GUP fast
  2019-03-25 17:51           ` Jason Gunthorpe
@ 2019-03-25 14:21             ` Ira Weiny
  2019-03-25 22:36               ` Dan Williams
  0 siblings, 1 reply; 29+ messages in thread
From: Ira Weiny @ 2019-03-25 14:21 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dan Williams, Andrew Morton, John Hubbard, Michal Hocko,
	Kirill A. Shutemov, Peter Zijlstra, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	linux-mm, Linux Kernel Mailing List, linux-mips, linuxppc-dev,
	linux-s390, Linux-sh, sparclinux, linux-rdma, netdev

On Mon, Mar 25, 2019 at 02:51:50PM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 25, 2019 at 02:23:15AM -0700, Ira Weiny wrote:
> > > > Unfortunately holding the lock is required to support FOLL_LONGTERM (to check
> > > > the VMAs) but we don't want to hold the lock to be optimal (specifically allow
> > > > FAULT_FOLL_ALLOW_RETRY).  So I'm maintaining the optimization for *_fast users
> > > > who do not specify FOLL_LONGTERM.
> > > > 
> > > > Another way to do this would have been to define __gup_longterm_unlocked with
> > > > the above logic, but that seemed overkill at this point.
> > > 
> > > get_user_pages_unlocked() is an exported symbol, shouldn't it work
> > > with the FOLL_LONGTERM flag?
> > > 
> > > I think it should even though we have no user..
> > > 
> > > Otherwise the GUP API just gets more confusing.
> > 
> > I agree WRT to the API.  But I think callers of get_user_pages_unlocked() are
> > not going to get the behavior they want if they specify FOLL_LONGTERM.
> 
> Oh? Isn't the only thing FOLL_LONGTERM does is block the call on DAX?

From an API yes.

> Why does the locking mode matter to this test?

DAX checks for VMA's being Filesystem DAX.  Therefore, it requires collection
of VMA's as the GUP code executes.  The unlocked version can drop the lock and
therefore the VMAs may become invalid.  Therefore, the 2 code paths are
incompatible.

Users of GUP unlocked are going to want the benefit of FAULT_FOLL_ALLOW_RETRY.
So I don't anticipate anyone using FOLL_LONGTERM with
get_user_pages_unlocked().

FWIW this thread is making me think my original patch which simply implemented
get_user_pages_fast_longterm() would be more clear.  There is some evidence
that the GUP API was trending that way (see get_user_pages_remote).  That seems
wrong but I don't know how to ensure users don't specify the wrong flag.

> 
> > What I could do is BUG_ON (or just WARN_ON) if unlocked is called with
> > FOLL_LONGTERM similar to the code in get_user_pages_locked() which does not
> > allow locked and vmas to be passed together:
> 
> The GUP call should fail if you are doing something like this. But I'd
> rather not see confusing specialc cases in code without a clear
> comment explaining why it has to be there.

Code comment would be necessary, sure.  Was just throwing ideas out there.

Ira


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RESEND 4/7] mm/gup: Add FOLL_LONGTERM capability to GUP fast
  2019-03-25 22:36               ` Dan Williams
@ 2019-03-25 14:54                 ` Ira Weiny
  0 siblings, 0 replies; 29+ messages in thread
From: Ira Weiny @ 2019-03-25 14:54 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jason Gunthorpe, Andrew Morton, John Hubbard, Michal Hocko,
	Kirill A. Shutemov, Peter Zijlstra, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	linux-mm, Linux Kernel Mailing List, linux-mips, linuxppc-dev,
	linux-s390, Linux-sh, sparclinux, linux-rdma, netdev

On Mon, Mar 25, 2019 at 03:36:28PM -0700, Dan Williams wrote:
> On Mon, Mar 25, 2019 at 3:22 PM Ira Weiny <ira.weiny@intel.com> wrote:
> [..]
> > FWIW this thread is making me think my original patch which simply implemented
> > get_user_pages_fast_longterm() would be more clear.  There is some evidence
> > that the GUP API was trending that way (see get_user_pages_remote).  That seems
> > wrong but I don't know how to ensure users don't specify the wrong flag.
> 
> What about just making the existing get_user_pages_longterm() have a
> fast path option?

That would work but was not the direction we agreed upon before.[1]

At this point I would rather see this patch set applied, focus on fixing the
filesystem issues, and once that is done determine if FOLL_LONGTERM is needed
in any GUP calls.

Ira

[1] https://lkml.org/lkml/2019/2/11/2038


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RESEND 1/7] mm/gup: Replace get_user_pages_longterm() with FOLL_LONGTERM
  2019-03-25  6:19     ` Ira Weiny
@ 2019-03-25 16:45       ` Dan Williams
  2019-03-25  8:46         ` Ira Weiny
  0 siblings, 1 reply; 29+ messages in thread
From: Dan Williams @ 2019-03-25 16:45 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Andrew Morton, John Hubbard, Michal Hocko, Kirill A. Shutemov,
	Peter Zijlstra, Jason Gunthorpe, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	Aneesh Kumar K . V, Michal Hocko, linux-mm,
	Linux Kernel Mailing List, linux-mips, linuxppc-dev, linux-s390,
	Linux-sh, sparclinux, linux-rdma, netdev

On Mon, Mar 25, 2019 at 7:21 AM Ira Weiny <ira.weiny@intel.com> wrote:
[..]
> > > @@ -1268,10 +1246,14 @@ static long check_and_migrate_cma_pages(unsigned long start, long nr_pages,
> > >                                 putback_movable_pages(&cma_page_list);
> > >                 }
> > >                 /*
> > > -                * We did migrate all the pages, Try to get the page references again
> > > -                * migrating any new CMA pages which we failed to isolate earlier.
> > > +                * We did migrate all the pages, Try to get the page references
> > > +                * again migrating any new CMA pages which we failed to isolate
> > > +                * earlier.
> > >                  */
> > > -               nr_pages = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
> > > +               nr_pages = __get_user_pages_locked(tsk, mm, start, nr_pages,
> > > +                                                  pages, vmas, NULL,
> > > +                                                  gup_flags);
> > > +
> >
> > Why did this need to change to __get_user_pages_locked?
>
> __get_uer_pages_locked() is now the "internal call" for get_user_pages.
>
> Technically it did not _have_ to change but there is no need to call
> get_user_pages() again because the FOLL_TOUCH flags is already set.  Also this
> call then matches the __get_user_pages_locked() which was called on the pages
> we migrated from.  Mostly this keeps the code "symmetrical" in that we called
> __get_user_pages_locked() on the pages we are migrating from and the same call
> on the pages we migrated to.
>
> While the change here looks funny I think the final code is better.

Agree, but I think that either needs to be noted in the changelog so
it's not a surprise, or moved to a follow-on cleanup patch.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RESEND 4/7] mm/gup: Add FOLL_LONGTERM capability to GUP fast
  2019-03-25  8:42     ` Ira Weiny
@ 2019-03-25 16:47       ` Jason Gunthorpe
  2019-03-25  9:23         ` Ira Weiny
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2019-03-25 16:47 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Andrew Morton, John Hubbard, Michal Hocko,
	Kirill A. Shutemov, Peter Zijlstra, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	linux-mm, Linux Kernel Mailing List, linux-mips, linuxppc-dev,
	linux-s390, Linux-sh, sparclinux, linux-rdma, netdev

On Mon, Mar 25, 2019 at 01:42:26AM -0700, Ira Weiny wrote:
> On Fri, Mar 22, 2019 at 03:12:55PM -0700, Dan Williams wrote:
> > On Sun, Mar 17, 2019 at 7:36 PM <ira.weiny@intel.com> wrote:
> > >
> > > From: Ira Weiny <ira.weiny@intel.com>
> > >
> > > DAX pages were previously unprotected from longterm pins when users
> > > called get_user_pages_fast().
> > >
> > > Use the new FOLL_LONGTERM flag to check for DEVMAP pages and fall
> > > back to regular GUP processing if a DEVMAP page is encountered.
> > >
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > >  mm/gup.c | 29 +++++++++++++++++++++++++----
> > >  1 file changed, 25 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/mm/gup.c b/mm/gup.c
> > > index 0684a9536207..173db0c44678 100644
> > > +++ b/mm/gup.c
> > > @@ -1600,6 +1600,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> > >                         goto pte_unmap;
> > >
> > >                 if (pte_devmap(pte)) {
> > > +                       if (unlikely(flags & FOLL_LONGTERM))
> > > +                               goto pte_unmap;
> > > +
> > >                         pgmap = get_dev_pagemap(pte_pfn(pte), pgmap);
> > >                         if (unlikely(!pgmap)) {
> > >                                 undo_dev_pagemap(nr, nr_start, pages);
> > > @@ -1739,8 +1742,11 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> > >         if (!pmd_access_permitted(orig, flags & FOLL_WRITE))
> > >                 return 0;
> > >
> > > -       if (pmd_devmap(orig))
> > > +       if (pmd_devmap(orig)) {
> > > +               if (unlikely(flags & FOLL_LONGTERM))
> > > +                       return 0;
> > >                 return __gup_device_huge_pmd(orig, pmdp, addr, end, pages, nr);
> > > +       }
> > >
> > >         refs = 0;
> > >         page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> > > @@ -1777,8 +1783,11 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
> > >         if (!pud_access_permitted(orig, flags & FOLL_WRITE))
> > >                 return 0;
> > >
> > > -       if (pud_devmap(orig))
> > > +       if (pud_devmap(orig)) {
> > > +               if (unlikely(flags & FOLL_LONGTERM))
> > > +                       return 0;
> > >                 return __gup_device_huge_pud(orig, pudp, addr, end, pages, nr);
> > > +       }
> > >
> > >         refs = 0;
> > >         page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
> > > @@ -2066,8 +2075,20 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
> > >                 start += nr << PAGE_SHIFT;
> > >                 pages += nr;
> > >
> > > -               ret = get_user_pages_unlocked(start, nr_pages - nr, pages,
> > > -                                             gup_flags);
> > > +               if (gup_flags & FOLL_LONGTERM) {
> > > +                       down_read(&current->mm->mmap_sem);
> > > +                       ret = __gup_longterm_locked(current, current->mm,
> > > +                                                   start, nr_pages - nr,
> > > +                                                   pages, NULL, gup_flags);
> > > +                       up_read(&current->mm->mmap_sem);
> > > +               } else {
> > > +                       /*
> > > +                        * retain FAULT_FOLL_ALLOW_RETRY optimization if
> > > +                        * possible
> > > +                        */
> > > +                       ret = get_user_pages_unlocked(start, nr_pages - nr,
> > > +                                                     pages, gup_flags);
> > 
> > I couldn't immediately grok why this path needs to branch on
> > FOLL_LONGTERM? Won't get_user_pages_unlocked(..., FOLL_LONGTERM) do
> > the right thing?
> 
> Unfortunately holding the lock is required to support FOLL_LONGTERM (to check
> the VMAs) but we don't want to hold the lock to be optimal (specifically allow
> FAULT_FOLL_ALLOW_RETRY).  So I'm maintaining the optimization for *_fast users
> who do not specify FOLL_LONGTERM.
> 
> Another way to do this would have been to define __gup_longterm_unlocked with
> the above logic, but that seemed overkill at this point.

get_user_pages_unlocked() is an exported symbol, shouldn't it work
with the FOLL_LONGTERM flag?

I think it should even though we have no user..

Otherwise the GUP API just gets more confusing.

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RESEND 4/7] mm/gup: Add FOLL_LONGTERM capability to GUP fast
  2019-03-25  9:23         ` Ira Weiny
@ 2019-03-25 17:51           ` Jason Gunthorpe
  2019-03-25 14:21             ` Ira Weiny
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2019-03-25 17:51 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Andrew Morton, John Hubbard, Michal Hocko,
	Kirill A. Shutemov, Peter Zijlstra, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	linux-mm, Linux Kernel Mailing List, linux-mips, linuxppc-dev,
	linux-s390, Linux-sh, sparclinux, linux-rdma, netdev

On Mon, Mar 25, 2019 at 02:23:15AM -0700, Ira Weiny wrote:
> > > Unfortunately holding the lock is required to support FOLL_LONGTERM (to check
> > > the VMAs) but we don't want to hold the lock to be optimal (specifically allow
> > > FAULT_FOLL_ALLOW_RETRY).  So I'm maintaining the optimization for *_fast users
> > > who do not specify FOLL_LONGTERM.
> > > 
> > > Another way to do this would have been to define __gup_longterm_unlocked with
> > > the above logic, but that seemed overkill at this point.
> > 
> > get_user_pages_unlocked() is an exported symbol, shouldn't it work
> > with the FOLL_LONGTERM flag?
> > 
> > I think it should even though we have no user..
> > 
> > Otherwise the GUP API just gets more confusing.
> 
> I agree WRT to the API.  But I think callers of get_user_pages_unlocked() are
> not going to get the behavior they want if they specify FOLL_LONGTERM.

Oh? Isn't the only thing FOLL_LONGTERM does is block the call on DAX?
Why does the locking mode matter to this test?

> What I could do is BUG_ON (or just WARN_ON) if unlocked is called with
> FOLL_LONGTERM similar to the code in get_user_pages_locked() which does not
> allow locked and vmas to be passed together:

The GUP call should fail if you are doing something like this. But I'd
rather not see confusing specialc cases in code without a clear
comment explaining why it has to be there.

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RESEND 4/7] mm/gup: Add FOLL_LONGTERM capability to GUP fast
  2019-03-25 14:21             ` Ira Weiny
@ 2019-03-25 22:36               ` Dan Williams
  2019-03-25 14:54                 ` Ira Weiny
  0 siblings, 1 reply; 29+ messages in thread
From: Dan Williams @ 2019-03-25 22:36 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Jason Gunthorpe, Andrew Morton, John Hubbard, Michal Hocko,
	Kirill A. Shutemov, Peter Zijlstra, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Martin Schwidefsky,
	Heiko Carstens, Rich Felker, Yoshinori Sato, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Ralf Baechle, James Hogan,
	linux-mm, Linux Kernel Mailing List, linux-mips, linuxppc-dev,
	linux-s390, Linux-sh, sparclinux, linux-rdma, netdev

On Mon, Mar 25, 2019 at 3:22 PM Ira Weiny <ira.weiny@intel.com> wrote:
[..]
> FWIW this thread is making me think my original patch which simply implemented
> get_user_pages_fast_longterm() would be more clear.  There is some evidence
> that the GUP API was trending that way (see get_user_pages_remote).  That seems
> wrong but I don't know how to ensure users don't specify the wrong flag.

What about just making the existing get_user_pages_longterm() have a
fast path option?

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2019-03-25 22:55 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-17 18:34 [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it ira.weiny
2019-03-17 18:34 ` [RESEND 1/7] mm/gup: Replace get_user_pages_longterm() with FOLL_LONGTERM ira.weiny
2019-03-22 21:24   ` Dan Williams
2019-03-25  6:19     ` Ira Weiny
2019-03-25 16:45       ` Dan Williams
2019-03-25  8:46         ` Ira Weiny
2019-03-25 10:27     ` Ira Weiny
2019-03-17 18:34 ` [RESEND 2/7] mm/gup: Change write parameter to flags in fast walk ira.weiny
2019-03-22 21:30   ` Dan Williams
2019-03-17 18:34 ` [RESEND 3/7] mm/gup: Change GUP fast to use flags rather than a write 'bool' ira.weiny
2019-03-22 22:05   ` Dan Williams
2019-03-25  8:26     ` Ira Weiny
2019-03-17 18:34 ` [RESEND 4/7] mm/gup: Add FOLL_LONGTERM capability to GUP fast ira.weiny
2019-03-22 22:12   ` Dan Williams
2019-03-25  8:42     ` Ira Weiny
2019-03-25 16:47       ` Jason Gunthorpe
2019-03-25  9:23         ` Ira Weiny
2019-03-25 17:51           ` Jason Gunthorpe
2019-03-25 14:21             ` Ira Weiny
2019-03-25 22:36               ` Dan Williams
2019-03-25 14:54                 ` Ira Weiny
2019-03-17 18:34 ` [RESEND 5/7] IB/hfi1: Use the new FOLL_LONGTERM flag to get_user_pages_fast() ira.weiny
2019-03-22 22:14   ` Dan Williams
2019-03-25  8:43     ` Ira Weiny
2019-03-17 18:34 ` [RESEND 6/7] IB/qib: " ira.weiny
2019-03-22 22:15   ` Dan Williams
2019-03-17 18:34 ` [RESEND 7/7] IB/mthca: " ira.weiny
2019-03-19 22:19 ` [RESEND PATCH 0/7] Add FOLL_LONGTERM to GUP fast and use it Andrew Morton
2019-03-21  8:40   ` Ira Weiny

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).