Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* [RFC PATCH 0/2] mm/gup: introduce vaddr_pin_pages_remote(), FOLL_PIN
@ 2019-08-12  1:50 john.hubbard
  2019-08-12  1:50 ` [RFC PATCH 1/2] mm/gup: introduce FOLL_PIN flag for get_user_pages() john.hubbard
  2019-08-12  1:50 ` [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote() john.hubbard
  0 siblings, 2 replies; 31+ messages in thread
From: john.hubbard @ 2019-08-12  1:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, Dan Williams, Dave Chinner, Ira Weiny,
	Jan Kara, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma, John Hubbard

From: John Hubbard <jhubbard@nvidia.com>

Hi,

Dave Chinner's head didn't seem to explode...much, when he saw Ira's
series, so I optimistically started taking it from there...this builds on
top of Ira's patchset that he just sent out:

  "[RFC PATCH v2 00/19] RDMA/FS DAX truncate proposal V1,000,002   ;-)" [1]

...which in turn is based on the latest -mmotm.

If Ira's series and this one are both acceptable, then

    a) I'll rework the 41-patch "put_user_pages(): miscellaneous call
       sites" series, and

    b) note that this will take rather longer and will be quite a bit more
       intrusive for each call site (but it's worth it), due to the
       need to plumb the owning struct file* all the way down to the gup()
       call. whew.

[1] https://lore.kernel.org/r/20190809225833.6657-1-ira.weiny@intel.com

[2] https://lore.kernel.org/r/20190807013340.9706-1-jhubbard@nvidia.com

John Hubbard (2):
  mm/gup: introduce FOLL_PIN flag for get_user_pages()
  mm/gup: introduce vaddr_pin_pages_remote()

 drivers/infiniband/core/umem_odp.c | 15 ++++----
 include/linux/mm.h                 |  8 +++++
 mm/gup.c                           | 55 +++++++++++++++++++++++++++++-
 3 files changed, 71 insertions(+), 7 deletions(-)

-- 
2.22.0



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC PATCH 1/2] mm/gup: introduce FOLL_PIN flag for get_user_pages()
  2019-08-12  1:50 [RFC PATCH 0/2] mm/gup: introduce vaddr_pin_pages_remote(), FOLL_PIN john.hubbard
@ 2019-08-12  1:50 ` john.hubbard
  2019-08-12  1:50 ` [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote() john.hubbard
  1 sibling, 0 replies; 31+ messages in thread
From: john.hubbard @ 2019-08-12  1:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, Dan Williams, Dave Chinner, Ira Weiny,
	Jan Kara, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma, John Hubbard, Michal Hocko

From: John Hubbard <jhubbard@nvidia.com>

FOLL_PIN is set by vaddr_pin_pages(). This is different than
FOLL_LONGTERM, because even short term page pins need a new kind
of tracking, if those pinned pages' data is going to potentially
be modified.

This situation is described in more detail in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

FOLL_PIN is added now, rather than waiting until there is code that
takes action based on FOLL_PIN. That's because having FOLL_PIN in
the code helps to highlight the differences between:

    a) get_user_pages(): soon to be deprecated. Used to pin pages,
       but without awareness of file systems that might use those
       pages,

    b) The original vaddr_pin_pages(): intended only for
       FOLL_LONGTERM and DAX use cases. This assumes direct IO
       and therefore is not applicable the most of the other
       callers of get_user_pages(), and

    c) The new vaddr_pin_pages(), which provides the correct
       get_user_pages() flags for all cases, by setting FOLL_PIN.

Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 include/linux/mm.h | 1 +
 mm/gup.c           | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 90c5802866df..61b616cd9243 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2663,6 +2663,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_ANON	0x8000	/* don't do file mappings */
 #define FOLL_LONGTERM	0x10000	/* mapping lifetime is indefinite: see below */
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
+#define FOLL_PIN	0x40000	/* pages must be released via put_user_page() */
 
 /*
  * NOTE on FOLL_LONGTERM:
diff --git a/mm/gup.c b/mm/gup.c
index 58f008a3c153..85f09958fbdc 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2494,6 +2494,9 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast);
  * being made against.  Usually "current->mm".
  *
  * Expects mmap_sem to be read locked.
+ *
+ * Implementation note: this sets FOLL_PIN, which means that the pages must
+ * ultimately be released by put_user_page().
  */
 long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages,
 		     unsigned int gup_flags, struct page **pages,
@@ -2501,7 +2504,7 @@ long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages,
 {
 	long ret;
 
-	gup_flags |= FOLL_LONGTERM;
+	gup_flags |= FOLL_LONGTERM | FOLL_PIN;
 
 	if (!vaddr_pin || (!vaddr_pin->mm && !vaddr_pin->f_owner))
 		return -EINVAL;
-- 
2.22.0



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-12  1:50 [RFC PATCH 0/2] mm/gup: introduce vaddr_pin_pages_remote(), FOLL_PIN john.hubbard
  2019-08-12  1:50 ` [RFC PATCH 1/2] mm/gup: introduce FOLL_PIN flag for get_user_pages() john.hubbard
@ 2019-08-12  1:50 ` john.hubbard
  2019-08-12 22:03   ` Ira Weiny
  2019-08-12 23:49   ` Ira Weiny
  1 sibling, 2 replies; 31+ messages in thread
From: john.hubbard @ 2019-08-12  1:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, Dan Williams, Dave Chinner, Ira Weiny,
	Jan Kara, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma, John Hubbard

From: John Hubbard <jhubbard@nvidia.com>

This is the "vaddr_pin_pages" corresponding variant to
get_user_pages_remote(), but with FOLL_PIN semantics: the implementation
sets FOLL_PIN. That, in turn, means that the pages must ultimately be
released by put_user_page*()--typically, via vaddr_unpin_pages*().

Note that the put_user_page*() requirement won't be truly
required until all of the call sites have been converted, and
the tracking of pages is actually activated.

Also introduce vaddr_unpin_pages(), in order to have a simpler
call for the error handling cases.

Use both of these new calls in the Infiniband drive, replacing
get_user_pages_remote() and put_user_pages().

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/infiniband/core/umem_odp.c | 15 +++++----
 include/linux/mm.h                 |  7 +++++
 mm/gup.c                           | 50 ++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 53085896d718..fdff034a8a30 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -534,7 +534,7 @@ static int ib_umem_odp_map_dma_single_page(
 	}
 
 out:
-	put_user_page(page);
+	vaddr_unpin_pages(&page, 1, &umem_odp->umem.vaddr_pin);
 
 	if (remove_existing_mapping) {
 		ib_umem_notifier_start_account(umem_odp);
@@ -635,9 +635,10 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 		 * complex (and doesn't gain us much performance in most use
 		 * cases).
 		 */
-		npages = get_user_pages_remote(owning_process, owning_mm,
+		npages = vaddr_pin_pages_remote(owning_process, owning_mm,
 				user_virt, gup_num_pages,
-				flags, local_page_list, NULL, NULL);
+				flags, local_page_list, NULL, NULL,
+				&umem_odp->umem.vaddr_pin);
 		up_read(&owning_mm->mmap_sem);
 
 		if (npages < 0) {
@@ -657,7 +658,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 					ret = -EFAULT;
 					break;
 				}
-				put_user_page(local_page_list[j]);
+				vaddr_unpin_pages(&local_page_list[j], 1,
+						  &umem_odp->umem.vaddr_pin);
 				continue;
 			}
 
@@ -684,8 +686,9 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 			 * ib_umem_odp_map_dma_single_page().
 			 */
 			if (npages - (j + 1) > 0)
-				put_user_pages(&local_page_list[j+1],
-					       npages - (j + 1));
+				vaddr_unpin_pages(&local_page_list[j+1],
+						  npages - (j + 1),
+						  &umem_odp->umem.vaddr_pin);
 			break;
 		}
 	}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 61b616cd9243..2bd76ad8787e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1606,6 +1606,13 @@ int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
 long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages,
 		     unsigned int gup_flags, struct page **pages,
 		     struct vaddr_pin *vaddr_pin);
+long vaddr_pin_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+			    unsigned long start, unsigned long nr_pages,
+			    unsigned int gup_flags, struct page **pages,
+			    struct vm_area_struct **vmas, int *locked,
+			    struct vaddr_pin *vaddr_pin);
+void vaddr_unpin_pages(struct page **pages, unsigned long nr_pages,
+		       struct vaddr_pin *vaddr_pin);
 void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages,
 				  struct vaddr_pin *vaddr_pin, bool make_dirty);
 bool mapping_inode_has_layout(struct vaddr_pin *vaddr_pin, struct page *page);
diff --git a/mm/gup.c b/mm/gup.c
index 85f09958fbdc..bb95adfaf9b6 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2518,6 +2518,38 @@ long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages,
 }
 EXPORT_SYMBOL(vaddr_pin_pages);
 
+/**
+ * vaddr_pin_pages pin pages by virtual address and return the pages to the
+ * user.
+ *
+ * @tsk:	the task_struct to use for page fault accounting, or
+ *		NULL if faults are not to be recorded.
+ * @mm:		mm_struct of target mm
+ * @addr:	start address
+ * @nr_pages:	number of pages to pin
+ * @gup_flags:	flags to use for the pin
+ * @pages:	array of pages returned
+ * @vaddr_pin:	initialized meta information this pin is to be associated
+ * with.
+ *
+ * This is the "vaddr_pin_pages" corresponding variant to
+ * get_user_pages_remote(), but with FOLL_PIN semantics: the implementation sets
+ * FOLL_PIN. That, in turn, means that the pages must ultimately be released
+ * by put_user_page().
+ */
+long vaddr_pin_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+			    unsigned long start, unsigned long nr_pages,
+			    unsigned int gup_flags, struct page **pages,
+			    struct vm_area_struct **vmas, int *locked,
+			    struct vaddr_pin *vaddr_pin)
+{
+	gup_flags |= FOLL_TOUCH | FOLL_REMOTE | FOLL_PIN;
+
+	return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
+				       locked, gup_flags, vaddr_pin);
+}
+EXPORT_SYMBOL(vaddr_pin_pages_remote);
+
 /**
  * vaddr_unpin_pages_dirty_lock - counterpart to vaddr_pin_pages
  *
@@ -2536,3 +2568,21 @@ void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages,
 	__put_user_pages_dirty_lock(vaddr_pin, pages, nr_pages, make_dirty);
 }
 EXPORT_SYMBOL(vaddr_unpin_pages_dirty_lock);
+
+/**
+ * vaddr_unpin_pages - simple, non-dirtying counterpart to vaddr_pin_pages
+ *
+ * @pages: array of pages returned
+ * @nr_pages: number of pages in pages
+ * @vaddr_pin: same information passed to vaddr_pin_pages
+ *
+ * Like vaddr_unpin_pages_dirty_lock, but for non-dirty pages. Useful in putting
+ * back pages in an error case: they were never made dirty.
+ */
+void vaddr_unpin_pages(struct page **pages, unsigned long nr_pages,
+		       struct vaddr_pin *vaddr_pin)
+{
+	__put_user_pages_dirty_lock(vaddr_pin, pages, nr_pages, false);
+}
+EXPORT_SYMBOL(vaddr_unpin_pages);
+
-- 
2.22.0



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-12  1:50 ` [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote() john.hubbard
@ 2019-08-12 22:03   ` Ira Weiny
  2019-08-12 22:21     ` John Hubbard
  2019-08-12 23:49   ` Ira Weiny
  1 sibling, 1 reply; 31+ messages in thread
From: Ira Weiny @ 2019-08-12 22:03 UTC (permalink / raw)
  To: john.hubbard
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Jan Kara, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma, John Hubbard

On Sun, Aug 11, 2019 at 06:50:44PM -0700, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
> 
> This is the "vaddr_pin_pages" corresponding variant to
> get_user_pages_remote(), but with FOLL_PIN semantics: the implementation
> sets FOLL_PIN. That, in turn, means that the pages must ultimately be
> released by put_user_page*()--typically, via vaddr_unpin_pages*().
> 
> Note that the put_user_page*() requirement won't be truly
> required until all of the call sites have been converted, and
> the tracking of pages is actually activated.
> 
> Also introduce vaddr_unpin_pages(), in order to have a simpler
> call for the error handling cases.
> 
> Use both of these new calls in the Infiniband drive, replacing
> get_user_pages_remote() and put_user_pages().
> 
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/infiniband/core/umem_odp.c | 15 +++++----
>  include/linux/mm.h                 |  7 +++++
>  mm/gup.c                           | 50 ++++++++++++++++++++++++++++++
>  3 files changed, 66 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> index 53085896d718..fdff034a8a30 100644
> --- a/drivers/infiniband/core/umem_odp.c
> +++ b/drivers/infiniband/core/umem_odp.c
> @@ -534,7 +534,7 @@ static int ib_umem_odp_map_dma_single_page(
>  	}
>  
>  out:
> -	put_user_page(page);
> +	vaddr_unpin_pages(&page, 1, &umem_odp->umem.vaddr_pin);
>  
>  	if (remove_existing_mapping) {
>  		ib_umem_notifier_start_account(umem_odp);
> @@ -635,9 +635,10 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>  		 * complex (and doesn't gain us much performance in most use
>  		 * cases).
>  		 */
> -		npages = get_user_pages_remote(owning_process, owning_mm,
> +		npages = vaddr_pin_pages_remote(owning_process, owning_mm,
>  				user_virt, gup_num_pages,
> -				flags, local_page_list, NULL, NULL);
> +				flags, local_page_list, NULL, NULL,
> +				&umem_odp->umem.vaddr_pin);
>  		up_read(&owning_mm->mmap_sem);
>  
>  		if (npages < 0) {
> @@ -657,7 +658,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>  					ret = -EFAULT;
>  					break;
>  				}
> -				put_user_page(local_page_list[j]);
> +				vaddr_unpin_pages(&local_page_list[j], 1,
> +						  &umem_odp->umem.vaddr_pin);
>  				continue;
>  			}
>  
> @@ -684,8 +686,9 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>  			 * ib_umem_odp_map_dma_single_page().
>  			 */
>  			if (npages - (j + 1) > 0)
> -				put_user_pages(&local_page_list[j+1],
> -					       npages - (j + 1));
> +				vaddr_unpin_pages(&local_page_list[j+1],
> +						  npages - (j + 1),
> +						  &umem_odp->umem.vaddr_pin);
>  			break;
>  		}
>  	}
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 61b616cd9243..2bd76ad8787e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1606,6 +1606,13 @@ int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
>  long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages,
>  		     unsigned int gup_flags, struct page **pages,
>  		     struct vaddr_pin *vaddr_pin);
> +long vaddr_pin_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
> +			    unsigned long start, unsigned long nr_pages,
> +			    unsigned int gup_flags, struct page **pages,
> +			    struct vm_area_struct **vmas, int *locked,
> +			    struct vaddr_pin *vaddr_pin);
> +void vaddr_unpin_pages(struct page **pages, unsigned long nr_pages,
> +		       struct vaddr_pin *vaddr_pin);
>  void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages,
>  				  struct vaddr_pin *vaddr_pin, bool make_dirty);
>  bool mapping_inode_has_layout(struct vaddr_pin *vaddr_pin, struct page *page);
> diff --git a/mm/gup.c b/mm/gup.c
> index 85f09958fbdc..bb95adfaf9b6 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2518,6 +2518,38 @@ long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages,
>  }
>  EXPORT_SYMBOL(vaddr_pin_pages);
>  
> +/**
> + * vaddr_pin_pages pin pages by virtual address and return the pages to the

vaddr_pin_pages_remote

Fixed in my tree.

> + * user.
> + *
> + * @tsk:	the task_struct to use for page fault accounting, or
> + *		NULL if faults are not to be recorded.
> + * @mm:		mm_struct of target mm
> + * @addr:	start address
> + * @nr_pages:	number of pages to pin
> + * @gup_flags:	flags to use for the pin
> + * @pages:	array of pages returned
> + * @vaddr_pin:	initialized meta information this pin is to be associated
> + * with.
> + *
> + * This is the "vaddr_pin_pages" corresponding variant to
> + * get_user_pages_remote(), but with FOLL_PIN semantics: the implementation sets
> + * FOLL_PIN. That, in turn, means that the pages must ultimately be released
> + * by put_user_page().
> + */
> +long vaddr_pin_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
> +			    unsigned long start, unsigned long nr_pages,
> +			    unsigned int gup_flags, struct page **pages,
> +			    struct vm_area_struct **vmas, int *locked,
> +			    struct vaddr_pin *vaddr_pin)
> +{
> +	gup_flags |= FOLL_TOUCH | FOLL_REMOTE | FOLL_PIN;
> +
> +	return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
> +				       locked, gup_flags, vaddr_pin);
> +}
> +EXPORT_SYMBOL(vaddr_pin_pages_remote);
> +
>  /**
>   * vaddr_unpin_pages_dirty_lock - counterpart to vaddr_pin_pages
>   *
> @@ -2536,3 +2568,21 @@ void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages,
>  	__put_user_pages_dirty_lock(vaddr_pin, pages, nr_pages, make_dirty);
>  }
>  EXPORT_SYMBOL(vaddr_unpin_pages_dirty_lock);
> +
> +/**
> + * vaddr_unpin_pages - simple, non-dirtying counterpart to vaddr_pin_pages
> + *
> + * @pages: array of pages returned
> + * @nr_pages: number of pages in pages
> + * @vaddr_pin: same information passed to vaddr_pin_pages
> + *
> + * Like vaddr_unpin_pages_dirty_lock, but for non-dirty pages. Useful in putting
> + * back pages in an error case: they were never made dirty.
> + */
> +void vaddr_unpin_pages(struct page **pages, unsigned long nr_pages,
> +		       struct vaddr_pin *vaddr_pin)
> +{
> +	__put_user_pages_dirty_lock(vaddr_pin, pages, nr_pages, false);
> +}
> +EXPORT_SYMBOL(vaddr_unpin_pages);

Rather than have another wrapping call why don't we just do this?  Would it be
so bad to just have to specify false for make_dirty?


diff --git a/mm/gup.c b/mm/gup.c
index e77b250c1307..ca660a5e8206 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2540,7 +2540,7 @@ long vaddr_pin_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
 EXPORT_SYMBOL(vaddr_pin_pages_remote);
 
 /**
- * vaddr_unpin_pages_dirty_lock - counterpart to vaddr_pin_pages
+ * vaddr_unpin_pages - counterpart to vaddr_pin_pages
  *
  * @pages: array of pages returned
  * @nr_pages: number of pages in pages
@@ -2551,26 +2551,9 @@ EXPORT_SYMBOL(vaddr_pin_pages_remote);
  * in vaddr_pin_pages should be passed back into this call for proper
  * tracking.
  */
-void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages,
-                                 struct vaddr_pin *vaddr_pin, bool make_dirty)
+void vaddr_unpin_pages(struct page **pages, unsigned long nr_pages,
+                      struct vaddr_pin *vaddr_pin, bool make_dirty)
 {
        __put_user_pages_dirty_lock(vaddr_pin, pages, nr_pages, make_dirty);
 }
 EXPORT_SYMBOL(vaddr_unpin_pages_dirty_lock);
-
-/**
- * vaddr_unpin_pages - simple, non-dirtying counterpart to vaddr_pin_pages
- *
- * @pages: array of pages returned
- * @nr_pages: number of pages in pages
- * @vaddr_pin: same information passed to vaddr_pin_pages
- *
- * Like vaddr_unpin_pages_dirty_lock, but for non-dirty pages. Useful in putting
- * back pages in an error case: they were never made dirty.
- */
-void vaddr_unpin_pages(struct page **pages, unsigned long nr_pages,
-                      struct vaddr_pin *vaddr_pin)
-{
-       __put_user_pages_dirty_lock(vaddr_pin, pages, nr_pages, false);
-}
-EXPORT_SYMBOL(vaddr_unpin_pages);


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-12 22:03   ` Ira Weiny
@ 2019-08-12 22:21     ` John Hubbard
  0 siblings, 0 replies; 31+ messages in thread
From: John Hubbard @ 2019-08-12 22:21 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Jan Kara, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma

On 8/12/19 3:03 PM, Ira Weiny wrote:
> On Sun, Aug 11, 2019 at 06:50:44PM -0700, john.hubbard@gmail.com wrote:
>> From: John Hubbard <jhubbard@nvidia.com>
...
>> +/**
>> + * vaddr_pin_pages pin pages by virtual address and return the pages to the
> 
> vaddr_pin_pages_remote
> 
> Fixed in my tree.


thanks. :)


> 
>> + * user.
>> + *
>> + * @tsk:	the task_struct to use for page fault accounting, or
>> + *		NULL if faults are not to be recorded.
>> + * @mm:		mm_struct of target mm
>> + * @addr:	start address
>> + * @nr_pages:	number of pages to pin
>> + * @gup_flags:	flags to use for the pin
>> + * @pages:	array of pages returned
>> + * @vaddr_pin:	initialized meta information this pin is to be associated
>> + * with.
>> + *
>> + * This is the "vaddr_pin_pages" corresponding variant to
>> + * get_user_pages_remote(), but with FOLL_PIN semantics: the implementation sets
>> + * FOLL_PIN. That, in turn, means that the pages must ultimately be released
>> + * by put_user_page().
>> + */
>> +long vaddr_pin_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
>> +			    unsigned long start, unsigned long nr_pages,
>> +			    unsigned int gup_flags, struct page **pages,
>> +			    struct vm_area_struct **vmas, int *locked,
>> +			    struct vaddr_pin *vaddr_pin)
>> +{
>> +	gup_flags |= FOLL_TOUCH | FOLL_REMOTE | FOLL_PIN;
>> +
>> +	return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
>> +				       locked, gup_flags, vaddr_pin);
>> +}
>> +EXPORT_SYMBOL(vaddr_pin_pages_remote);
>> +
>>  /**
>>   * vaddr_unpin_pages_dirty_lock - counterpart to vaddr_pin_pages
>>   *
>> @@ -2536,3 +2568,21 @@ void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages,
>>  	__put_user_pages_dirty_lock(vaddr_pin, pages, nr_pages, make_dirty);
>>  }
>>  EXPORT_SYMBOL(vaddr_unpin_pages_dirty_lock);
>> +
>> +/**
>> + * vaddr_unpin_pages - simple, non-dirtying counterpart to vaddr_pin_pages
>> + *
>> + * @pages: array of pages returned
>> + * @nr_pages: number of pages in pages
>> + * @vaddr_pin: same information passed to vaddr_pin_pages
>> + *
>> + * Like vaddr_unpin_pages_dirty_lock, but for non-dirty pages. Useful in putting
>> + * back pages in an error case: they were never made dirty.
>> + */
>> +void vaddr_unpin_pages(struct page **pages, unsigned long nr_pages,
>> +		       struct vaddr_pin *vaddr_pin)
>> +{
>> +	__put_user_pages_dirty_lock(vaddr_pin, pages, nr_pages, false);
>> +}
>> +EXPORT_SYMBOL(vaddr_unpin_pages);
> 
> Rather than have another wrapping call why don't we just do this?  Would it be
> so bad to just have to specify false for make_dirty?

Sure, passing in false for make_dirty is fine, and in fact, there may even be
error cases I've forgotten about that *want* to dirty the page. 

I thought about these variants, and realized that we don't generally need to 
say "lock" anymore, because we're going to forcibly use set_page_dirty_lock 
(rather than set_page_dirty) in this part of the code. And a shorter name 
is nice. Since you've dropped both "_dirty" and "_lock" from the function 
name, it's still nice and short even though we pass in make_dirty as an arg.

So that's a long-winded, "the API below looks good to me". :)

> 
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index e77b250c1307..ca660a5e8206 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2540,7 +2540,7 @@ long vaddr_pin_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
>  EXPORT_SYMBOL(vaddr_pin_pages_remote);
>  
>  /**
> - * vaddr_unpin_pages_dirty_lock - counterpart to vaddr_pin_pages
> + * vaddr_unpin_pages - counterpart to vaddr_pin_pages
>   *
>   * @pages: array of pages returned
>   * @nr_pages: number of pages in pages
> @@ -2551,26 +2551,9 @@ EXPORT_SYMBOL(vaddr_pin_pages_remote);
>   * in vaddr_pin_pages should be passed back into this call for proper
>   * tracking.
>   */
> -void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages,
> -                                 struct vaddr_pin *vaddr_pin, bool make_dirty)
> +void vaddr_unpin_pages(struct page **pages, unsigned long nr_pages,
> +                      struct vaddr_pin *vaddr_pin, bool make_dirty)
>  {
>         __put_user_pages_dirty_lock(vaddr_pin, pages, nr_pages, make_dirty);
>  }
>  EXPORT_SYMBOL(vaddr_unpin_pages_dirty_lock);
> -
> -/**
> - * vaddr_unpin_pages - simple, non-dirtying counterpart to vaddr_pin_pages
> - *
> - * @pages: array of pages returned
> - * @nr_pages: number of pages in pages
> - * @vaddr_pin: same information passed to vaddr_pin_pages
> - *
> - * Like vaddr_unpin_pages_dirty_lock, but for non-dirty pages. Useful in putting
> - * back pages in an error case: they were never made dirty.
> - */
> -void vaddr_unpin_pages(struct page **pages, unsigned long nr_pages,
> -                      struct vaddr_pin *vaddr_pin)
> -{
> -       __put_user_pages_dirty_lock(vaddr_pin, pages, nr_pages, false);
> -}
> -EXPORT_SYMBOL(vaddr_unpin_pages);
> 

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-12  1:50 ` [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote() john.hubbard
  2019-08-12 22:03   ` Ira Weiny
@ 2019-08-12 23:49   ` Ira Weiny
  2019-08-13  0:07     ` John Hubbard
  1 sibling, 1 reply; 31+ messages in thread
From: Ira Weiny @ 2019-08-12 23:49 UTC (permalink / raw)
  To: john.hubbard
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Jan Kara, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma, John Hubbard

On Sun, Aug 11, 2019 at 06:50:44PM -0700, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
> 
> This is the "vaddr_pin_pages" corresponding variant to
> get_user_pages_remote(), but with FOLL_PIN semantics: the implementation
> sets FOLL_PIN. That, in turn, means that the pages must ultimately be
> released by put_user_page*()--typically, via vaddr_unpin_pages*().
> 
> Note that the put_user_page*() requirement won't be truly
> required until all of the call sites have been converted, and
> the tracking of pages is actually activated.
> 
> Also introduce vaddr_unpin_pages(), in order to have a simpler
> call for the error handling cases.
> 
> Use both of these new calls in the Infiniband drive, replacing
> get_user_pages_remote() and put_user_pages().
> 
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/infiniband/core/umem_odp.c | 15 +++++----
>  include/linux/mm.h                 |  7 +++++
>  mm/gup.c                           | 50 ++++++++++++++++++++++++++++++
>  3 files changed, 66 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> index 53085896d718..fdff034a8a30 100644
> --- a/drivers/infiniband/core/umem_odp.c
> +++ b/drivers/infiniband/core/umem_odp.c
> @@ -534,7 +534,7 @@ static int ib_umem_odp_map_dma_single_page(
>  	}
>  
>  out:
> -	put_user_page(page);
> +	vaddr_unpin_pages(&page, 1, &umem_odp->umem.vaddr_pin);
>  
>  	if (remove_existing_mapping) {
>  		ib_umem_notifier_start_account(umem_odp);
> @@ -635,9 +635,10 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>  		 * complex (and doesn't gain us much performance in most use
>  		 * cases).
>  		 */
> -		npages = get_user_pages_remote(owning_process, owning_mm,
> +		npages = vaddr_pin_pages_remote(owning_process, owning_mm,
>  				user_virt, gup_num_pages,
> -				flags, local_page_list, NULL, NULL);
> +				flags, local_page_list, NULL, NULL,
> +				&umem_odp->umem.vaddr_pin);

Thinking about this part of the patch... is this pin really necessary?  This
code is not doing a long term pin.  The page just needs a reference while we
map it into the devices page tables.  Once that is done we should get notifiers
if anything changes and we can adjust.  right?

Ira

>  		up_read(&owning_mm->mmap_sem);
>  
>  		if (npages < 0) {
> @@ -657,7 +658,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>  					ret = -EFAULT;
>  					break;
>  				}
> -				put_user_page(local_page_list[j]);
> +				vaddr_unpin_pages(&local_page_list[j], 1,
> +						  &umem_odp->umem.vaddr_pin);
>  				continue;
>  			}
>  
> @@ -684,8 +686,9 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>  			 * ib_umem_odp_map_dma_single_page().
>  			 */
>  			if (npages - (j + 1) > 0)
> -				put_user_pages(&local_page_list[j+1],
> -					       npages - (j + 1));
> +				vaddr_unpin_pages(&local_page_list[j+1],
> +						  npages - (j + 1),
> +						  &umem_odp->umem.vaddr_pin);
>  			break;
>  		}
>  	}
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 61b616cd9243..2bd76ad8787e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1606,6 +1606,13 @@ int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
>  long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages,
>  		     unsigned int gup_flags, struct page **pages,
>  		     struct vaddr_pin *vaddr_pin);
> +long vaddr_pin_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
> +			    unsigned long start, unsigned long nr_pages,
> +			    unsigned int gup_flags, struct page **pages,
> +			    struct vm_area_struct **vmas, int *locked,
> +			    struct vaddr_pin *vaddr_pin);
> +void vaddr_unpin_pages(struct page **pages, unsigned long nr_pages,
> +		       struct vaddr_pin *vaddr_pin);
>  void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages,
>  				  struct vaddr_pin *vaddr_pin, bool make_dirty);
>  bool mapping_inode_has_layout(struct vaddr_pin *vaddr_pin, struct page *page);
> diff --git a/mm/gup.c b/mm/gup.c
> index 85f09958fbdc..bb95adfaf9b6 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2518,6 +2518,38 @@ long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages,
>  }
>  EXPORT_SYMBOL(vaddr_pin_pages);
>  
> +/**
> + * vaddr_pin_pages pin pages by virtual address and return the pages to the
> + * user.
> + *
> + * @tsk:	the task_struct to use for page fault accounting, or
> + *		NULL if faults are not to be recorded.
> + * @mm:		mm_struct of target mm
> + * @addr:	start address
> + * @nr_pages:	number of pages to pin
> + * @gup_flags:	flags to use for the pin
> + * @pages:	array of pages returned
> + * @vaddr_pin:	initialized meta information this pin is to be associated
> + * with.
> + *
> + * This is the "vaddr_pin_pages" corresponding variant to
> + * get_user_pages_remote(), but with FOLL_PIN semantics: the implementation sets
> + * FOLL_PIN. That, in turn, means that the pages must ultimately be released
> + * by put_user_page().
> + */
> +long vaddr_pin_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
> +			    unsigned long start, unsigned long nr_pages,
> +			    unsigned int gup_flags, struct page **pages,
> +			    struct vm_area_struct **vmas, int *locked,
> +			    struct vaddr_pin *vaddr_pin)
> +{
> +	gup_flags |= FOLL_TOUCH | FOLL_REMOTE | FOLL_PIN;
> +
> +	return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
> +				       locked, gup_flags, vaddr_pin);
> +}
> +EXPORT_SYMBOL(vaddr_pin_pages_remote);
> +
>  /**
>   * vaddr_unpin_pages_dirty_lock - counterpart to vaddr_pin_pages
>   *
> @@ -2536,3 +2568,21 @@ void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages,
>  	__put_user_pages_dirty_lock(vaddr_pin, pages, nr_pages, make_dirty);
>  }
>  EXPORT_SYMBOL(vaddr_unpin_pages_dirty_lock);
> +
> +/**
> + * vaddr_unpin_pages - simple, non-dirtying counterpart to vaddr_pin_pages
> + *
> + * @pages: array of pages returned
> + * @nr_pages: number of pages in pages
> + * @vaddr_pin: same information passed to vaddr_pin_pages
> + *
> + * Like vaddr_unpin_pages_dirty_lock, but for non-dirty pages. Useful in putting
> + * back pages in an error case: they were never made dirty.
> + */
> +void vaddr_unpin_pages(struct page **pages, unsigned long nr_pages,
> +		       struct vaddr_pin *vaddr_pin)
> +{
> +	__put_user_pages_dirty_lock(vaddr_pin, pages, nr_pages, false);
> +}
> +EXPORT_SYMBOL(vaddr_unpin_pages);
> +
> -- 
> 2.22.0
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-12 23:49   ` Ira Weiny
@ 2019-08-13  0:07     ` John Hubbard
  2019-08-13 21:08       ` Ira Weiny
  0 siblings, 1 reply; 31+ messages in thread
From: John Hubbard @ 2019-08-13  0:07 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Jan Kara, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma

On 8/12/19 4:49 PM, Ira Weiny wrote:
> On Sun, Aug 11, 2019 at 06:50:44PM -0700, john.hubbard@gmail.com wrote:
>> From: John Hubbard <jhubbard@nvidia.com>
...
>> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
>> index 53085896d718..fdff034a8a30 100644
>> --- a/drivers/infiniband/core/umem_odp.c
>> +++ b/drivers/infiniband/core/umem_odp.c
>> @@ -534,7 +534,7 @@ static int ib_umem_odp_map_dma_single_page(
>>   	}
>>   
>>   out:
>> -	put_user_page(page);
>> +	vaddr_unpin_pages(&page, 1, &umem_odp->umem.vaddr_pin);
>>   
>>   	if (remove_existing_mapping) {
>>   		ib_umem_notifier_start_account(umem_odp);
>> @@ -635,9 +635,10 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>>   		 * complex (and doesn't gain us much performance in most use
>>   		 * cases).
>>   		 */
>> -		npages = get_user_pages_remote(owning_process, owning_mm,
>> +		npages = vaddr_pin_pages_remote(owning_process, owning_mm,
>>   				user_virt, gup_num_pages,
>> -				flags, local_page_list, NULL, NULL);
>> +				flags, local_page_list, NULL, NULL,
>> +				&umem_odp->umem.vaddr_pin);
> 
> Thinking about this part of the patch... is this pin really necessary?  This
> code is not doing a long term pin.  The page just needs a reference while we
> map it into the devices page tables.  Once that is done we should get notifiers
> if anything changes and we can adjust.  right?
> 

OK, now it's a little interesting: the FOLL_PIN is necessary, but maybe not
FOLL_LONGTERM. Illustrating once again that it's actually necessary to allow
these flags to vary independently.

And that leads to another API refinement idea: let's set FOLL_PIN within the
vaddr_pin_pages*() wrappers, and set FOLL_LONGTER in the *callers* of those
wrappers, yes?

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-13  0:07     ` John Hubbard
@ 2019-08-13 21:08       ` Ira Weiny
  2019-08-14  0:51         ` John Hubbard
  0 siblings, 1 reply; 31+ messages in thread
From: Ira Weiny @ 2019-08-13 21:08 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Jan Kara, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma

On Mon, Aug 12, 2019 at 05:07:32PM -0700, John Hubbard wrote:
> On 8/12/19 4:49 PM, Ira Weiny wrote:
> > On Sun, Aug 11, 2019 at 06:50:44PM -0700, john.hubbard@gmail.com wrote:
> > > From: John Hubbard <jhubbard@nvidia.com>
> ...
> > > diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> > > index 53085896d718..fdff034a8a30 100644
> > > --- a/drivers/infiniband/core/umem_odp.c
> > > +++ b/drivers/infiniband/core/umem_odp.c
> > > @@ -534,7 +534,7 @@ static int ib_umem_odp_map_dma_single_page(
> > >   	}
> > >   out:
> > > -	put_user_page(page);
> > > +	vaddr_unpin_pages(&page, 1, &umem_odp->umem.vaddr_pin);
> > >   	if (remove_existing_mapping) {
> > >   		ib_umem_notifier_start_account(umem_odp);
> > > @@ -635,9 +635,10 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
> > >   		 * complex (and doesn't gain us much performance in most use
> > >   		 * cases).
> > >   		 */
> > > -		npages = get_user_pages_remote(owning_process, owning_mm,
> > > +		npages = vaddr_pin_pages_remote(owning_process, owning_mm,
> > >   				user_virt, gup_num_pages,
> > > -				flags, local_page_list, NULL, NULL);
> > > +				flags, local_page_list, NULL, NULL,
> > > +				&umem_odp->umem.vaddr_pin);
> > 
> > Thinking about this part of the patch... is this pin really necessary?  This
> > code is not doing a long term pin.  The page just needs a reference while we
> > map it into the devices page tables.  Once that is done we should get notifiers
> > if anything changes and we can adjust.  right?
> > 
> 
> OK, now it's a little interesting: the FOLL_PIN is necessary, but maybe not
> FOLL_LONGTERM. Illustrating once again that it's actually necessary to allow
> these flags to vary independently.

Why is PIN necessary?  I think we do want all drivers to use the new
user_uaddr_vaddr_pin_user_pages() call...  :-P  But in this case I think a
simple "get" reference is enough to reference the page while we are using it.
If it changes after the "put/unpin" we get a fault which should handle the
change right?

The other issue I have with FOLL_PIN is what does it mean to call "...pin...()"
without FOLL_PIN?

This is another confusion of get_user_pages()...  you can actually call it
without FOLL_GET...  :-/  And you just don't get pages back.  I've never really
dug into how (or if) you "put" them later...

> 
> And that leads to another API refinement idea: let's set FOLL_PIN within the
> vaddr_pin_pages*() wrappers, and set FOLL_LONGTER in the *callers* of those
> wrappers, yes?

I've thought about this before and I think any default flags should simply
define what we want follow_pages to do.

Also, the addition of vaddr_pin information creates an implicit flag which if
not there disallows any file pages from being pinned.  It becomes our new
"longterm" flag.  FOLL_PIN _could_ be what we should use "internally".  But we
could also just use this implicit vaddr_pin flag and not add a new flag.

Finally, I struggle with converting everyone to a new call.  It is more
overhead to use vaddr_pin in the call above because now the GUP code is going
to associate a file pin object with that file when in ODP we don't need that
because the pages can move around.

This overhead may be fine, not sure in this case, but I don't see everyone
wanting it.

Ira



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-13 21:08       ` Ira Weiny
@ 2019-08-14  0:51         ` John Hubbard
  2019-08-14  0:56           ` John Hubbard
  0 siblings, 1 reply; 31+ messages in thread
From: John Hubbard @ 2019-08-14  0:51 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Jan Kara, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma

On 8/13/19 2:08 PM, Ira Weiny wrote:
> On Mon, Aug 12, 2019 at 05:07:32PM -0700, John Hubbard wrote:
>> On 8/12/19 4:49 PM, Ira Weiny wrote:
>>> On Sun, Aug 11, 2019 at 06:50:44PM -0700, john.hubbard@gmail.com wrote:
>>>> From: John Hubbard <jhubbard@nvidia.com>
>> ...
>>> Thinking about this part of the patch... is this pin really necessary?  This
>>> code is not doing a long term pin.  The page just needs a reference while we
>>> map it into the devices page tables.  Once that is done we should get notifiers
>>> if anything changes and we can adjust.  right?
>>>
>>
>> OK, now it's a little interesting: the FOLL_PIN is necessary, but maybe not
>> FOLL_LONGTERM. Illustrating once again that it's actually necessary to allow
>> these flags to vary independently.
> 
> Why is PIN necessary?  I think we do want all drivers to use the new
> user_uaddr_vaddr_pin_user_pages() call...  :-P  But in this case I think a
> simple "get" reference is enough to reference the page while we are using it.
> If it changes after the "put/unpin" we get a fault which should handle the
> change right?
> 

FOLL_PIN is necessary because the caller is clearly in the use case that 
requires it--however briefly they might be there. As Jan described it,

"Anything that gets page reference and then touches page data (e.g. direct IO)
needs the new kind of tracking so that filesystem knows someone is messing with
the page data." [1]

> The other issue I have with FOLL_PIN is what does it mean to call "...pin...()"
> without FOLL_PIN?
> 
> This is another confusion of get_user_pages()...  you can actually call it
> without FOLL_GET...  :-/  And you just don't get pages back.  I've never really
> dug into how (or if) you "put" them later...
>

Yes, you are talking to someone who has been suffering through that. The
problem here is that gup() has evolved into a do-everything tool. I think we're
getting closer to teasing it apart into more specific interfaces that do
more limited things.

Anyway, I want FOLL_PIN as a way to help clarify this...

 
>>
>> And that leads to another API refinement idea: let's set FOLL_PIN within the
>> vaddr_pin_pages*() wrappers, and set FOLL_LONGTER in the *callers* of those
>> wrappers, yes?
> 
> I've thought about this before and I think any default flags should simply
> define what we want follow_pages to do.

Hmm, so don't forget that we need to know what gup_fast() should do, too.

Anyway, I'm not worried about which combination of wrapper calls set which flags,
I'm open to suggestion there. But it does still seem to me that we should have
independent FOLL_LONGTERM and FOLL_PIN flags, once the API churn settles. 


> 
> Also, the addition of vaddr_pin information creates an implicit flag which if
> not there disallows any file pages from being pinned.  It becomes our new
> "longterm" flag.  FOLL_PIN _could_ be what we should use "internally".  But we
> could also just use this implicit vaddr_pin flag and not add a new flag.

I'd like to have FOLL_PIN internally, in order to solve the problems that
you just raised, above! Namely, that it's too hard to figure out all the
cases that gup() is handling.

With FOLL_PIN, we know that we should be taking the new pin refcount, and
releasing it via the (currently named) put_user_page*(), ultimately.

> 
> Finally, I struggle with converting everyone to a new call.  It is more
> overhead to use vaddr_pin in the call above because now the GUP code is going
> to associate a file pin object with that file when in ODP we don't need that
> because the pages can move around.

What if the pages in ODP are file-backed? 

> 
> This overhead may be fine, not sure in this case, but I don't see everyone
> wanting it.

I think most callers don't have much choice, otherwise they'll be broken with
file-backed memory.

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-14  0:51         ` John Hubbard
@ 2019-08-14  0:56           ` John Hubbard
  2019-08-14 23:50             ` Ira Weiny
  0 siblings, 1 reply; 31+ messages in thread
From: John Hubbard @ 2019-08-14  0:56 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Jan Kara, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma

On 8/13/19 5:51 PM, John Hubbard wrote:
> On 8/13/19 2:08 PM, Ira Weiny wrote:
>> On Mon, Aug 12, 2019 at 05:07:32PM -0700, John Hubbard wrote:
>>> On 8/12/19 4:49 PM, Ira Weiny wrote:
>>>> On Sun, Aug 11, 2019 at 06:50:44PM -0700, john.hubbard@gmail.com wrote:
>>>>> From: John Hubbard <jhubbard@nvidia.com>
>>> ...
>> Finally, I struggle with converting everyone to a new call.  It is more
>> overhead to use vaddr_pin in the call above because now the GUP code is going
>> to associate a file pin object with that file when in ODP we don't need that
>> because the pages can move around.
> 
> What if the pages in ODP are file-backed? 
> 

oops, strike that, you're right: in that case, even the file system case is covered.
Don't mind me. :)

>>
>> This overhead may be fine, not sure in this case, but I don't see everyone
>> wanting it.

So now I see why you said that, but I will note that ODP hardware is rare,
and will likely remain rare: replayable page faults require really special 
hardware, and after all this time, we still only have CPUs, GPUs, and the
Mellanox cards that do it.

That leaves a lot of other hardware to take care of.

thanks,
-- 
John Hubbard
NVIDIA



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-14  0:56           ` John Hubbard
@ 2019-08-14 23:50             ` Ira Weiny
  2019-08-15  0:02               ` John Hubbard
  0 siblings, 1 reply; 31+ messages in thread
From: Ira Weiny @ 2019-08-14 23:50 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Jan Kara, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma

On Tue, Aug 13, 2019 at 05:56:31PM -0700, John Hubbard wrote:
> On 8/13/19 5:51 PM, John Hubbard wrote:
> > On 8/13/19 2:08 PM, Ira Weiny wrote:
> >> On Mon, Aug 12, 2019 at 05:07:32PM -0700, John Hubbard wrote:
> >>> On 8/12/19 4:49 PM, Ira Weiny wrote:
> >>>> On Sun, Aug 11, 2019 at 06:50:44PM -0700, john.hubbard@gmail.com wrote:
> >>>>> From: John Hubbard <jhubbard@nvidia.com>
> >>> ...
> >> Finally, I struggle with converting everyone to a new call.  It is more
> >> overhead to use vaddr_pin in the call above because now the GUP code is going
> >> to associate a file pin object with that file when in ODP we don't need that
> >> because the pages can move around.
> > 
> > What if the pages in ODP are file-backed? 
> > 
> 
> oops, strike that, you're right: in that case, even the file system case is covered.
> Don't mind me. :)

Ok so are we agreed we will drop the patch to the ODP code?  I'm going to keep
the FOLL_PIN flag and addition in the vaddr_pin_pages.

Ira

> 
> >>
> >> This overhead may be fine, not sure in this case, but I don't see everyone
> >> wanting it.
> 
> So now I see why you said that, but I will note that ODP hardware is rare,
> and will likely remain rare: replayable page faults require really special 
> hardware, and after all this time, we still only have CPUs, GPUs, and the
> Mellanox cards that do it.
> 
> That leaves a lot of other hardware to take care of.
> 
> thanks,
> -- 
> John Hubbard
> NVIDIA
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-14 23:50             ` Ira Weiny
@ 2019-08-15  0:02               ` John Hubbard
  2019-08-15  3:01                 ` John Hubbard
  0 siblings, 1 reply; 31+ messages in thread
From: John Hubbard @ 2019-08-15  0:02 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Jan Kara, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma

On 8/14/19 4:50 PM, Ira Weiny wrote:
> On Tue, Aug 13, 2019 at 05:56:31PM -0700, John Hubbard wrote:
>> On 8/13/19 5:51 PM, John Hubbard wrote:
>>> On 8/13/19 2:08 PM, Ira Weiny wrote:
>>>> On Mon, Aug 12, 2019 at 05:07:32PM -0700, John Hubbard wrote:
>>>>> On 8/12/19 4:49 PM, Ira Weiny wrote:
>>>>>> On Sun, Aug 11, 2019 at 06:50:44PM -0700, john.hubbard@gmail.com wrote:
>>>>>>> From: John Hubbard <jhubbard@nvidia.com>
>>>>> ...
>>>> Finally, I struggle with converting everyone to a new call.  It is more
>>>> overhead to use vaddr_pin in the call above because now the GUP code is going
>>>> to associate a file pin object with that file when in ODP we don't need that
>>>> because the pages can move around.
>>>
>>> What if the pages in ODP are file-backed?
>>>
>>
>> oops, strike that, you're right: in that case, even the file system case is covered.
>> Don't mind me. :)
> 
> Ok so are we agreed we will drop the patch to the ODP code?  I'm going to keep
> the FOLL_PIN flag and addition in the vaddr_pin_pages.
> 

Yes. I hope I'm not overlooking anything, but it all seems to make sense to
let ODP just rely on the MMU notifiers.

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-15  0:02               ` John Hubbard
@ 2019-08-15  3:01                 ` John Hubbard
  2019-08-15 13:26                   ` Jan Kara
  0 siblings, 1 reply; 31+ messages in thread
From: John Hubbard @ 2019-08-15  3:01 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Jan Kara, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma

On 8/14/19 5:02 PM, John Hubbard wrote:
> On 8/14/19 4:50 PM, Ira Weiny wrote:
>> On Tue, Aug 13, 2019 at 05:56:31PM -0700, John Hubbard wrote:
>>> On 8/13/19 5:51 PM, John Hubbard wrote:
>>>> On 8/13/19 2:08 PM, Ira Weiny wrote:
>>>>> On Mon, Aug 12, 2019 at 05:07:32PM -0700, John Hubbard wrote:
>>>>>> On 8/12/19 4:49 PM, Ira Weiny wrote:
>>>>>>> On Sun, Aug 11, 2019 at 06:50:44PM -0700, john.hubbard@gmail.com wrote:
>>>>>>>> From: John Hubbard <jhubbard@nvidia.com>
>>>>>> ...
>>>>> Finally, I struggle with converting everyone to a new call.  It is more
>>>>> overhead to use vaddr_pin in the call above because now the GUP code is going
>>>>> to associate a file pin object with that file when in ODP we don't need that
>>>>> because the pages can move around.
>>>>
>>>> What if the pages in ODP are file-backed?
>>>>
>>>
>>> oops, strike that, you're right: in that case, even the file system case is covered.
>>> Don't mind me. :)
>>
>> Ok so are we agreed we will drop the patch to the ODP code?  I'm going to keep
>> the FOLL_PIN flag and addition in the vaddr_pin_pages.
>>
> 
> Yes. I hope I'm not overlooking anything, but it all seems to make sense to
> let ODP just rely on the MMU notifiers.
> 

Hold on, I *was* forgetting something: this was a two part thing, and you're
conflating the two points, but they need to remain separate and distinct. There
were:

1. FOLL_PIN is necessary because the caller is clearly in the use case that
requires it--however briefly they might be there. As Jan described it,

"Anything that gets page reference and then touches page data (e.g. direct IO)
needs the new kind of tracking so that filesystem knows someone is messing with
the page data." [1]

2. Releasing the pin: for ODP, we can use MMU notifiers instead of requiring a
lease.

This second point does not invalidate the first point. Therefore, I still see the
need for the call within ODP, to something that sets FOLL_PIN. And that means
either vaddr_pin_[user?]_pages_remote, or some other wrapper of your choice. :)

I guess shows that the API might need to be refined. We're trying to solve
two closely related issues, but they're not identical.

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-15  3:01                 ` John Hubbard
@ 2019-08-15 13:26                   ` Jan Kara
  2019-08-15 13:35                     ` Jan Kara
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kara @ 2019-08-15 13:26 UTC (permalink / raw)
  To: John Hubbard
  Cc: Ira Weiny, Andrew Morton, Christoph Hellwig, Dan Williams,
	Dave Chinner, Jan Kara, Jason Gunthorpe, Jérôme Glisse,
	LKML, linux-mm, linux-fsdevel, linux-rdma

On Wed 14-08-19 20:01:07, John Hubbard wrote:
> On 8/14/19 5:02 PM, John Hubbard wrote:
> > On 8/14/19 4:50 PM, Ira Weiny wrote:
> > > On Tue, Aug 13, 2019 at 05:56:31PM -0700, John Hubbard wrote:
> > > > On 8/13/19 5:51 PM, John Hubbard wrote:
> > > > > On 8/13/19 2:08 PM, Ira Weiny wrote:
> > > > > > On Mon, Aug 12, 2019 at 05:07:32PM -0700, John Hubbard wrote:
> > > > > > > On 8/12/19 4:49 PM, Ira Weiny wrote:
> > > > > > > > On Sun, Aug 11, 2019 at 06:50:44PM -0700, john.hubbard@gmail.com wrote:
> > > > > > > > > From: John Hubbard <jhubbard@nvidia.com>
> > > > > > > ...
> > > > > > Finally, I struggle with converting everyone to a new call.  It is more
> > > > > > overhead to use vaddr_pin in the call above because now the GUP code is going
> > > > > > to associate a file pin object with that file when in ODP we don't need that
> > > > > > because the pages can move around.
> > > > > 
> > > > > What if the pages in ODP are file-backed?
> > > > > 
> > > > 
> > > > oops, strike that, you're right: in that case, even the file system case is covered.
> > > > Don't mind me. :)
> > > 
> > > Ok so are we agreed we will drop the patch to the ODP code?  I'm going to keep
> > > the FOLL_PIN flag and addition in the vaddr_pin_pages.
> > > 
> > 
> > Yes. I hope I'm not overlooking anything, but it all seems to make sense to
> > let ODP just rely on the MMU notifiers.
> > 
> 
> Hold on, I *was* forgetting something: this was a two part thing, and
> you're conflating the two points, but they need to remain separate and
> distinct. There were:
> 
> 1. FOLL_PIN is necessary because the caller is clearly in the use case that
> requires it--however briefly they might be there. As Jan described it,
> 
> "Anything that gets page reference and then touches page data (e.g.
> direct IO) needs the new kind of tracking so that filesystem knows
> someone is messing with the page data." [1]

So when the GUP user uses MMU notifiers to stop writing to pages whenever
they are writeprotected with page_mkclean(), they don't really need page
pin - their access is then fully equivalent to any other mmap userspace
access and filesystem knows how to deal with those. I forgot out this case
when I wrote the above sentence.

So to sum up there are three cases:
1) DIO case - GUP references to pages serving as DIO buffers are needed for
   relatively short time, no special synchronization with page_mkclean() or
   munmap() => needs FOLL_PIN
2) RDMA case - GUP references to pages serving as DMA buffers needed for a
   long time, no special synchronization with page_mkclean() or munmap()
   => needs FOLL_PIN | FOLL_LONGTERM
   This case has also a special case when the pages are actually DAX. Then
   the caller additionally needs file lease and additional file_pin
   structure is used for tracking this usage.
3) ODP case - GUP references to pages serving as DMA buffers, MMU notifiers
   used to synchronize with page_mkclean() and munmap() => normal page
   references are fine.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-15 13:26                   ` Jan Kara
@ 2019-08-15 13:35                     ` Jan Kara
  2019-08-15 14:51                       ` Jason Gunthorpe
                                         ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Jan Kara @ 2019-08-15 13:35 UTC (permalink / raw)
  To: John Hubbard
  Cc: Ira Weiny, Andrew Morton, Christoph Hellwig, Dan Williams,
	Dave Chinner, Jan Kara, Jason Gunthorpe, Jérôme Glisse,
	LKML, linux-mm, linux-fsdevel, linux-rdma

On Thu 15-08-19 15:26:22, Jan Kara wrote:
> On Wed 14-08-19 20:01:07, John Hubbard wrote:
> > On 8/14/19 5:02 PM, John Hubbard wrote:
> > > On 8/14/19 4:50 PM, Ira Weiny wrote:
> > > > On Tue, Aug 13, 2019 at 05:56:31PM -0700, John Hubbard wrote:
> > > > > On 8/13/19 5:51 PM, John Hubbard wrote:
> > > > > > On 8/13/19 2:08 PM, Ira Weiny wrote:
> > > > > > > On Mon, Aug 12, 2019 at 05:07:32PM -0700, John Hubbard wrote:
> > > > > > > > On 8/12/19 4:49 PM, Ira Weiny wrote:
> > > > > > > > > On Sun, Aug 11, 2019 at 06:50:44PM -0700, john.hubbard@gmail.com wrote:
> > > > > > > > > > From: John Hubbard <jhubbard@nvidia.com>
> > > > > > > > ...
> > > > > > > Finally, I struggle with converting everyone to a new call.  It is more
> > > > > > > overhead to use vaddr_pin in the call above because now the GUP code is going
> > > > > > > to associate a file pin object with that file when in ODP we don't need that
> > > > > > > because the pages can move around.
> > > > > > 
> > > > > > What if the pages in ODP are file-backed?
> > > > > > 
> > > > > 
> > > > > oops, strike that, you're right: in that case, even the file system case is covered.
> > > > > Don't mind me. :)
> > > > 
> > > > Ok so are we agreed we will drop the patch to the ODP code?  I'm going to keep
> > > > the FOLL_PIN flag and addition in the vaddr_pin_pages.
> > > > 
> > > 
> > > Yes. I hope I'm not overlooking anything, but it all seems to make sense to
> > > let ODP just rely on the MMU notifiers.
> > > 
> > 
> > Hold on, I *was* forgetting something: this was a two part thing, and
> > you're conflating the two points, but they need to remain separate and
> > distinct. There were:
> > 
> > 1. FOLL_PIN is necessary because the caller is clearly in the use case that
> > requires it--however briefly they might be there. As Jan described it,
> > 
> > "Anything that gets page reference and then touches page data (e.g.
> > direct IO) needs the new kind of tracking so that filesystem knows
> > someone is messing with the page data." [1]
> 
> So when the GUP user uses MMU notifiers to stop writing to pages whenever
> they are writeprotected with page_mkclean(), they don't really need page
> pin - their access is then fully equivalent to any other mmap userspace
> access and filesystem knows how to deal with those. I forgot out this case
> when I wrote the above sentence.
> 
> So to sum up there are three cases:
> 1) DIO case - GUP references to pages serving as DIO buffers are needed for
>    relatively short time, no special synchronization with page_mkclean() or
>    munmap() => needs FOLL_PIN
> 2) RDMA case - GUP references to pages serving as DMA buffers needed for a
>    long time, no special synchronization with page_mkclean() or munmap()
>    => needs FOLL_PIN | FOLL_LONGTERM
>    This case has also a special case when the pages are actually DAX. Then
>    the caller additionally needs file lease and additional file_pin
>    structure is used for tracking this usage.
> 3) ODP case - GUP references to pages serving as DMA buffers, MMU notifiers
>    used to synchronize with page_mkclean() and munmap() => normal page
>    references are fine.

I want to add that I'd like to convert users in cases 1) and 2) from using
GUP to using differently named function. Users in case 3) can stay as they
are for now although ultimately I'd like to denote such use cases in a
special way as well...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-15 13:35                     ` Jan Kara
@ 2019-08-15 14:51                       ` Jason Gunthorpe
  2019-08-15 17:32                       ` Ira Weiny
  2019-08-16  8:47                       ` Vlastimil Babka
  2 siblings, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2019-08-15 14:51 UTC (permalink / raw)
  To: Jan Kara
  Cc: John Hubbard, Ira Weiny, Andrew Morton, Christoph Hellwig,
	Dan Williams, Dave Chinner, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma

On Thu, Aug 15, 2019 at 03:35:10PM +0200, Jan Kara wrote:

> > 3) ODP case - GUP references to pages serving as DMA buffers, MMU notifiers
> >    used to synchronize with page_mkclean() and munmap() => normal page
> >    references are fine.
> 
> I want to add that I'd like to convert users in cases 1) and 2) from using
> GUP to using differently named function. Users in case 3) can stay as they
> are for now although ultimately I'd like to denote such use cases in a
> special way as well...

3) users also want a special function and path, right now it is called
hmm_range_fault() but perhaps it would be good to harmonize it more
with the GUP infrastructure?

I'm not quite sure what the best plan for that is yet.

Jason


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-15 13:35                     ` Jan Kara
  2019-08-15 14:51                       ` Jason Gunthorpe
@ 2019-08-15 17:32                       ` Ira Weiny
  2019-08-15 17:41                         ` John Hubbard
  2019-08-16  8:47                       ` Vlastimil Babka
  2 siblings, 1 reply; 31+ messages in thread
From: Ira Weiny @ 2019-08-15 17:32 UTC (permalink / raw)
  To: Jan Kara
  Cc: John Hubbard, Andrew Morton, Christoph Hellwig, Dan Williams,
	Dave Chinner, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma

On Thu, Aug 15, 2019 at 03:35:10PM +0200, Jan Kara wrote:
> On Thu 15-08-19 15:26:22, Jan Kara wrote:
> > On Wed 14-08-19 20:01:07, John Hubbard wrote:
> > > On 8/14/19 5:02 PM, John Hubbard wrote:
> > > 
> > > Hold on, I *was* forgetting something: this was a two part thing, and
> > > you're conflating the two points, but they need to remain separate and
> > > distinct. There were:
> > > 
> > > 1. FOLL_PIN is necessary because the caller is clearly in the use case that
> > > requires it--however briefly they might be there. As Jan described it,
> > > 
> > > "Anything that gets page reference and then touches page data (e.g.
> > > direct IO) needs the new kind of tracking so that filesystem knows
> > > someone is messing with the page data." [1]
> > 
> > So when the GUP user uses MMU notifiers to stop writing to pages whenever
> > they are writeprotected with page_mkclean(), they don't really need page
> > pin - their access is then fully equivalent to any other mmap userspace
> > access and filesystem knows how to deal with those. I forgot out this case
> > when I wrote the above sentence.
> > 
> > So to sum up there are three cases:
> > 1) DIO case - GUP references to pages serving as DIO buffers are needed for
> >    relatively short time, no special synchronization with page_mkclean() or
> >    munmap() => needs FOLL_PIN
> > 2) RDMA case - GUP references to pages serving as DMA buffers needed for a
> >    long time, no special synchronization with page_mkclean() or munmap()
> >    => needs FOLL_PIN | FOLL_LONGTERM
> >    This case has also a special case when the pages are actually DAX. Then
> >    the caller additionally needs file lease and additional file_pin
> >    structure is used for tracking this usage.
> > 3) ODP case - GUP references to pages serving as DMA buffers, MMU notifiers
> >    used to synchronize with page_mkclean() and munmap() => normal page
> >    references are fine.
> 
> I want to add that I'd like to convert users in cases 1) and 2) from using
> GUP to using differently named function. Users in case 3) can stay as they
> are for now although ultimately I'd like to denote such use cases in a
> special way as well...
> 

Ok just to make this clear I threw up my current tree with your patches here:

https://github.com/weiny2/linux-kernel/commits/mmotm-rdmafsdax-b0-v4

I'm talking about dropping the final patch:
05fd2d3afa6b rdma/umem_odp: Use vaddr_pin_pages_remote() in ODP

The other 2 can stay.  I split out the *_remote() call.  We don't have a user
but I'll keep it around for a bit.

This tree is still WIP as I work through all the comments.  So I've not changed
names or variable types etc...  Just wanted to settle this.

Ira



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-15 17:32                       ` Ira Weiny
@ 2019-08-15 17:41                         ` John Hubbard
  2019-08-16  2:14                           ` John Hubbard
  0 siblings, 1 reply; 31+ messages in thread
From: John Hubbard @ 2019-08-15 17:41 UTC (permalink / raw)
  To: Ira Weiny, Jan Kara
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Jason Gunthorpe, Jérôme Glisse, LKML, linux-mm,
	linux-fsdevel, linux-rdma

On 8/15/19 10:32 AM, Ira Weiny wrote:
> On Thu, Aug 15, 2019 at 03:35:10PM +0200, Jan Kara wrote:
>> On Thu 15-08-19 15:26:22, Jan Kara wrote:
>>> On Wed 14-08-19 20:01:07, John Hubbard wrote:
>>>> On 8/14/19 5:02 PM, John Hubbard wrote:
>>>>
>>>> Hold on, I *was* forgetting something: this was a two part thing, and
>>>> you're conflating the two points, but they need to remain separate and
>>>> distinct. There were:
>>>>
>>>> 1. FOLL_PIN is necessary because the caller is clearly in the use case that
>>>> requires it--however briefly they might be there. As Jan described it,
>>>>
>>>> "Anything that gets page reference and then touches page data (e.g.
>>>> direct IO) needs the new kind of tracking so that filesystem knows
>>>> someone is messing with the page data." [1]
>>>
>>> So when the GUP user uses MMU notifiers to stop writing to pages whenever
>>> they are writeprotected with page_mkclean(), they don't really need page
>>> pin - their access is then fully equivalent to any other mmap userspace
>>> access and filesystem knows how to deal with those. I forgot out this case
>>> when I wrote the above sentence.
>>>
>>> So to sum up there are three cases:
>>> 1) DIO case - GUP references to pages serving as DIO buffers are needed for
>>>    relatively short time, no special synchronization with page_mkclean() or
>>>    munmap() => needs FOLL_PIN
>>> 2) RDMA case - GUP references to pages serving as DMA buffers needed for a
>>>    long time, no special synchronization with page_mkclean() or munmap()
>>>    => needs FOLL_PIN | FOLL_LONGTERM
>>>    This case has also a special case when the pages are actually DAX. Then
>>>    the caller additionally needs file lease and additional file_pin
>>>    structure is used for tracking this usage.
>>> 3) ODP case - GUP references to pages serving as DMA buffers, MMU notifiers
>>>    used to synchronize with page_mkclean() and munmap() => normal page
>>>    references are fine.
>>

Thanks Jan, once again, for clarifying all of this!

>> I want to add that I'd like to convert users in cases 1) and 2) from using
>> GUP to using differently named function. Users in case 3) can stay as they
>> are for now although ultimately I'd like to denote such use cases in a
>> special way as well...
>>
> 
> Ok just to make this clear I threw up my current tree with your patches here:
> 
> https://github.com/weiny2/linux-kernel/commits/mmotm-rdmafsdax-b0-v4
> 
> I'm talking about dropping the final patch:
> 05fd2d3afa6b rdma/umem_odp: Use vaddr_pin_pages_remote() in ODP
> 
> The other 2 can stay.  I split out the *_remote() call.  We don't have a user
> but I'll keep it around for a bit.
> 
> This tree is still WIP as I work through all the comments.  So I've not changed
> names or variable types etc...  Just wanted to settle this.
> 

Right. And now that ODP is not a user, I'll take a quick look through my other
call site conversions and see if I can find an easy one, to include here as
the first user of vaddr_pin_pages_remote(). I'll send it your way if that
works out.


thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-15 17:41                         ` John Hubbard
@ 2019-08-16  2:14                           ` John Hubbard
  2019-08-16 15:41                             ` Jan Kara
  0 siblings, 1 reply; 31+ messages in thread
From: John Hubbard @ 2019-08-16  2:14 UTC (permalink / raw)
  To: Ira Weiny, Jan Kara
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Jason Gunthorpe, Jérôme Glisse, LKML, linux-mm,
	linux-fsdevel, linux-rdma

On 8/15/19 10:41 AM, John Hubbard wrote:
> On 8/15/19 10:32 AM, Ira Weiny wrote:
>> On Thu, Aug 15, 2019 at 03:35:10PM +0200, Jan Kara wrote:
>>> On Thu 15-08-19 15:26:22, Jan Kara wrote:
>>>> On Wed 14-08-19 20:01:07, John Hubbard wrote:
>>>>> On 8/14/19 5:02 PM, John Hubbard wrote:
...
>> Ok just to make this clear I threw up my current tree with your patches here:
>>
>> https://github.com/weiny2/linux-kernel/commits/mmotm-rdmafsdax-b0-v4
>>
>> I'm talking about dropping the final patch:
>> 05fd2d3afa6b rdma/umem_odp: Use vaddr_pin_pages_remote() in ODP
>>
>> The other 2 can stay.  I split out the *_remote() call.  We don't have a user
>> but I'll keep it around for a bit.
>>
>> This tree is still WIP as I work through all the comments.  So I've not changed
>> names or variable types etc...  Just wanted to settle this.
>>
> 
> Right. And now that ODP is not a user, I'll take a quick look through my other
> call site conversions and see if I can find an easy one, to include here as
> the first user of vaddr_pin_pages_remote(). I'll send it your way if that
> works out.
> 

OK, there was only process_vm_access.c, plus (sort of) Bharath's sgi-gru
patch, maybe eventually [1].  But looking at process_vm_access.c, I think 
it is one of the patches that is no longer applicable, and I can just
drop it entirely...I'd welcome a second opinion on that...

So we might be all out of potential users for vaddr_pin_pages_remote()!

For quick reference, it looks like this:
 
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 357aa7bef6c0..4d29d54ec93f 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -96,7 +96,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
                flags |= FOLL_WRITE;
 
        while (!rc && nr_pages && iov_iter_count(iter)) {
-               int pages = min(nr_pages, max_pages_per_loop);
+               int pinned_pages = min(nr_pages, max_pages_per_loop);
                int locked = 1;
                size_t bytes;
 
@@ -106,14 +106,15 @@ static int process_vm_rw_single_vec(unsigned long addr,
                 * current/current->mm
                 */
                down_read(&mm->mmap_sem);
-               pages = get_user_pages_remote(task, mm, pa, pages, flags,
-                                             process_pages, NULL, &locked);
+               pinned_pages = get_user_pages_remote(task, mm, pa, pinned_pages,
+                                                    flags, process_pages, NULL,
+                                                    &locked);
                if (locked)
                        up_read(&mm->mmap_sem);
-               if (pages <= 0)
+               if (pinned_pages <= 0)
                        return -EFAULT;
 
-               bytes = pages * PAGE_SIZE - start_offset;
+               bytes = pinned_pages * PAGE_SIZE - start_offset;
                if (bytes > len)
                        bytes = len;
 
@@ -122,10 +123,9 @@ static int process_vm_rw_single_vec(unsigned long addr,
                                         vm_write);
                len -= bytes;
                start_offset = 0;
-               nr_pages -= pages;
-               pa += pages * PAGE_SIZE;
-               while (pages)
-                       put_page(process_pages[--pages]);
+               nr_pages -= pinned_pages;
+               pa += pinned_pages * PAGE_SIZE;
+               put_user_pages(process_pages, pinned_pages);
        }
 
        return rc;


[1] https://lore.kernel.org/r/1565379497-29266-2-git-send-email-linux.bhar@gmail.com

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-15 13:35                     ` Jan Kara
  2019-08-15 14:51                       ` Jason Gunthorpe
  2019-08-15 17:32                       ` Ira Weiny
@ 2019-08-16  8:47                       ` Vlastimil Babka
  2019-08-16 15:44                         ` Jan Kara
  2 siblings, 1 reply; 31+ messages in thread
From: Vlastimil Babka @ 2019-08-16  8:47 UTC (permalink / raw)
  To: Jan Kara, John Hubbard
  Cc: Ira Weiny, Andrew Morton, Christoph Hellwig, Dan Williams,
	Dave Chinner, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma

On 8/15/19 3:35 PM, Jan Kara wrote:
>> 
>> So when the GUP user uses MMU notifiers to stop writing to pages whenever
>> they are writeprotected with page_mkclean(), they don't really need page
>> pin - their access is then fully equivalent to any other mmap userspace
>> access and filesystem knows how to deal with those. I forgot out this case
>> when I wrote the above sentence.
>> 
>> So to sum up there are three cases:
>> 1) DIO case - GUP references to pages serving as DIO buffers are needed for
>>    relatively short time, no special synchronization with page_mkclean() or
>>    munmap() => needs FOLL_PIN
>> 2) RDMA case - GUP references to pages serving as DMA buffers needed for a
>>    long time, no special synchronization with page_mkclean() or munmap()
>>    => needs FOLL_PIN | FOLL_LONGTERM
>>    This case has also a special case when the pages are actually DAX. Then
>>    the caller additionally needs file lease and additional file_pin
>>    structure is used for tracking this usage.
>> 3) ODP case - GUP references to pages serving as DMA buffers, MMU notifiers
>>    used to synchronize with page_mkclean() and munmap() => normal page
>>    references are fine.

IMHO the munlock lesson told us about another one, that's in the end equivalent
to 3)

4) pinning for struct page manipulation only => normal page references are fine

> I want to add that I'd like to convert users in cases 1) and 2) from using
> GUP to using differently named function. Users in case 3) can stay as they
> are for now although ultimately I'd like to denote such use cases in a
> special way as well...

So after 1/2/3 is renamed/specially denoted, only 4) keeps the current interface?

> 								Honza
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-16  2:14                           ` John Hubbard
@ 2019-08-16 15:41                             ` Jan Kara
  2019-08-16 18:33                               ` Ira Weiny
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kara @ 2019-08-16 15:41 UTC (permalink / raw)
  To: John Hubbard
  Cc: Ira Weiny, Jan Kara, Andrew Morton, Christoph Hellwig,
	Dan Williams, Dave Chinner, Jason Gunthorpe,
	Jérôme Glisse, LKML, linux-mm, linux-fsdevel,
	linux-rdma

On Thu 15-08-19 19:14:08, John Hubbard wrote:
> On 8/15/19 10:41 AM, John Hubbard wrote:
> > On 8/15/19 10:32 AM, Ira Weiny wrote:
> >> On Thu, Aug 15, 2019 at 03:35:10PM +0200, Jan Kara wrote:
> >>> On Thu 15-08-19 15:26:22, Jan Kara wrote:
> >>>> On Wed 14-08-19 20:01:07, John Hubbard wrote:
> >>>>> On 8/14/19 5:02 PM, John Hubbard wrote:
> ...
> >> Ok just to make this clear I threw up my current tree with your patches here:
> >>
> >> https://github.com/weiny2/linux-kernel/commits/mmotm-rdmafsdax-b0-v4
> >>
> >> I'm talking about dropping the final patch:
> >> 05fd2d3afa6b rdma/umem_odp: Use vaddr_pin_pages_remote() in ODP
> >>
> >> The other 2 can stay.  I split out the *_remote() call.  We don't have a user
> >> but I'll keep it around for a bit.
> >>
> >> This tree is still WIP as I work through all the comments.  So I've not changed
> >> names or variable types etc...  Just wanted to settle this.
> >>
> > 
> > Right. And now that ODP is not a user, I'll take a quick look through my other
> > call site conversions and see if I can find an easy one, to include here as
> > the first user of vaddr_pin_pages_remote(). I'll send it your way if that
> > works out.
> > 
> 
> OK, there was only process_vm_access.c, plus (sort of) Bharath's sgi-gru
> patch, maybe eventually [1].  But looking at process_vm_access.c, I think 
> it is one of the patches that is no longer applicable, and I can just
> drop it entirely...I'd welcome a second opinion on that...

I don't think you can drop the patch. process_vm_rw_pages() clearly touches
page contents and does not synchronize with page_mkclean(). So it is case
1) and needs FOLL_PIN semantics.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-16  8:47                       ` Vlastimil Babka
@ 2019-08-16 15:44                         ` Jan Kara
  2019-08-16 15:52                           ` Jerome Glisse
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kara @ 2019-08-16 15:44 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Jan Kara, John Hubbard, Ira Weiny, Andrew Morton,
	Christoph Hellwig, Dan Williams, Dave Chinner, Jason Gunthorpe,
	Jérôme Glisse, LKML, linux-mm, linux-fsdevel,
	linux-rdma

On Fri 16-08-19 10:47:21, Vlastimil Babka wrote:
> On 8/15/19 3:35 PM, Jan Kara wrote:
> >> 
> >> So when the GUP user uses MMU notifiers to stop writing to pages whenever
> >> they are writeprotected with page_mkclean(), they don't really need page
> >> pin - their access is then fully equivalent to any other mmap userspace
> >> access and filesystem knows how to deal with those. I forgot out this case
> >> when I wrote the above sentence.
> >> 
> >> So to sum up there are three cases:
> >> 1) DIO case - GUP references to pages serving as DIO buffers are needed for
> >>    relatively short time, no special synchronization with page_mkclean() or
> >>    munmap() => needs FOLL_PIN
> >> 2) RDMA case - GUP references to pages serving as DMA buffers needed for a
> >>    long time, no special synchronization with page_mkclean() or munmap()
> >>    => needs FOLL_PIN | FOLL_LONGTERM
> >>    This case has also a special case when the pages are actually DAX. Then
> >>    the caller additionally needs file lease and additional file_pin
> >>    structure is used for tracking this usage.
> >> 3) ODP case - GUP references to pages serving as DMA buffers, MMU notifiers
> >>    used to synchronize with page_mkclean() and munmap() => normal page
> >>    references are fine.
> 
> IMHO the munlock lesson told us about another one, that's in the end equivalent
> to 3)
> 
> 4) pinning for struct page manipulation only => normal page references
> are fine

Right, it's good to have this for clarity.

> > I want to add that I'd like to convert users in cases 1) and 2) from using
> > GUP to using differently named function. Users in case 3) can stay as they
> > are for now although ultimately I'd like to denote such use cases in a
> > special way as well...
> 
> So after 1/2/3 is renamed/specially denoted, only 4) keeps the current
> interface?

Well, munlock() code doesn't even use GUP, just follow_page(). I'd wait to
see what's left after handling cases 1), 2), and 3) to decide about the
interface for the remainder.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-16 15:44                         ` Jan Kara
@ 2019-08-16 15:52                           ` Jerome Glisse
  2019-08-16 16:13                             ` Jan Kara
  0 siblings, 1 reply; 31+ messages in thread
From: Jerome Glisse @ 2019-08-16 15:52 UTC (permalink / raw)
  To: Jan Kara
  Cc: Vlastimil Babka, John Hubbard, Ira Weiny, Andrew Morton,
	Christoph Hellwig, Dan Williams, Dave Chinner, Jason Gunthorpe,
	LKML, linux-mm, linux-fsdevel, linux-rdma

On Fri, Aug 16, 2019 at 05:44:04PM +0200, Jan Kara wrote:
> On Fri 16-08-19 10:47:21, Vlastimil Babka wrote:
> > On 8/15/19 3:35 PM, Jan Kara wrote:
> > >> 
> > >> So when the GUP user uses MMU notifiers to stop writing to pages whenever
> > >> they are writeprotected with page_mkclean(), they don't really need page
> > >> pin - their access is then fully equivalent to any other mmap userspace
> > >> access and filesystem knows how to deal with those. I forgot out this case
> > >> when I wrote the above sentence.
> > >> 
> > >> So to sum up there are three cases:
> > >> 1) DIO case - GUP references to pages serving as DIO buffers are needed for
> > >>    relatively short time, no special synchronization with page_mkclean() or
> > >>    munmap() => needs FOLL_PIN
> > >> 2) RDMA case - GUP references to pages serving as DMA buffers needed for a
> > >>    long time, no special synchronization with page_mkclean() or munmap()
> > >>    => needs FOLL_PIN | FOLL_LONGTERM
> > >>    This case has also a special case when the pages are actually DAX. Then
> > >>    the caller additionally needs file lease and additional file_pin
> > >>    structure is used for tracking this usage.
> > >> 3) ODP case - GUP references to pages serving as DMA buffers, MMU notifiers
> > >>    used to synchronize with page_mkclean() and munmap() => normal page
> > >>    references are fine.
> > 
> > IMHO the munlock lesson told us about another one, that's in the end equivalent
> > to 3)
> > 
> > 4) pinning for struct page manipulation only => normal page references
> > are fine
> 
> Right, it's good to have this for clarity.
> 
> > > I want to add that I'd like to convert users in cases 1) and 2) from using
> > > GUP to using differently named function. Users in case 3) can stay as they
> > > are for now although ultimately I'd like to denote such use cases in a
> > > special way as well...
> > 
> > So after 1/2/3 is renamed/specially denoted, only 4) keeps the current
> > interface?
> 
> Well, munlock() code doesn't even use GUP, just follow_page(). I'd wait to
> see what's left after handling cases 1), 2), and 3) to decide about the
> interface for the remainder.
> 

For 3 we do not need to take a reference at all :) So just forget about 3
it does not exist. For 3 the reference is the reference the CPU page table
has on the page and that's it. GUP is no longer involve in ODP or anything
like that.

Cheers,
Jérôme


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-16 15:52                           ` Jerome Glisse
@ 2019-08-16 16:13                             ` Jan Kara
  2019-08-16 16:31                               ` Jason Gunthorpe
  2019-08-16 16:54                               ` Jerome Glisse
  0 siblings, 2 replies; 31+ messages in thread
From: Jan Kara @ 2019-08-16 16:13 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Jan Kara, Vlastimil Babka, John Hubbard, Ira Weiny,
	Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Jason Gunthorpe, LKML, linux-mm, linux-fsdevel, linux-rdma

On Fri 16-08-19 11:52:20, Jerome Glisse wrote:
> On Fri, Aug 16, 2019 at 05:44:04PM +0200, Jan Kara wrote:
> > On Fri 16-08-19 10:47:21, Vlastimil Babka wrote:
> > > On 8/15/19 3:35 PM, Jan Kara wrote:
> > > >> 
> > > >> So when the GUP user uses MMU notifiers to stop writing to pages whenever
> > > >> they are writeprotected with page_mkclean(), they don't really need page
> > > >> pin - their access is then fully equivalent to any other mmap userspace
> > > >> access and filesystem knows how to deal with those. I forgot out this case
> > > >> when I wrote the above sentence.
> > > >> 
> > > >> So to sum up there are three cases:
> > > >> 1) DIO case - GUP references to pages serving as DIO buffers are needed for
> > > >>    relatively short time, no special synchronization with page_mkclean() or
> > > >>    munmap() => needs FOLL_PIN
> > > >> 2) RDMA case - GUP references to pages serving as DMA buffers needed for a
> > > >>    long time, no special synchronization with page_mkclean() or munmap()
> > > >>    => needs FOLL_PIN | FOLL_LONGTERM
> > > >>    This case has also a special case when the pages are actually DAX. Then
> > > >>    the caller additionally needs file lease and additional file_pin
> > > >>    structure is used for tracking this usage.
> > > >> 3) ODP case - GUP references to pages serving as DMA buffers, MMU notifiers
> > > >>    used to synchronize with page_mkclean() and munmap() => normal page
> > > >>    references are fine.
> > > 
> > > IMHO the munlock lesson told us about another one, that's in the end equivalent
> > > to 3)
> > > 
> > > 4) pinning for struct page manipulation only => normal page references
> > > are fine
> > 
> > Right, it's good to have this for clarity.
> > 
> > > > I want to add that I'd like to convert users in cases 1) and 2) from using
> > > > GUP to using differently named function. Users in case 3) can stay as they
> > > > are for now although ultimately I'd like to denote such use cases in a
> > > > special way as well...
> > > 
> > > So after 1/2/3 is renamed/specially denoted, only 4) keeps the current
> > > interface?
> > 
> > Well, munlock() code doesn't even use GUP, just follow_page(). I'd wait to
> > see what's left after handling cases 1), 2), and 3) to decide about the
> > interface for the remainder.
> > 
> 
> For 3 we do not need to take a reference at all :) So just forget about 3
> it does not exist. For 3 the reference is the reference the CPU page table
> has on the page and that's it. GUP is no longer involve in ODP or anything
> like that.

Yes, I understand. But the fact is that GUP calls are currently still there
e.g. in ODP code. If you can make the code work without taking a page
reference at all, I'm only happy :)

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-16 16:13                             ` Jan Kara
@ 2019-08-16 16:31                               ` Jason Gunthorpe
  2019-08-16 16:54                               ` Jerome Glisse
  1 sibling, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2019-08-16 16:31 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jerome Glisse, Vlastimil Babka, John Hubbard, Ira Weiny,
	Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	LKML, linux-mm, linux-fsdevel, linux-rdma

On Fri, Aug 16, 2019 at 06:13:55PM +0200, Jan Kara wrote:

> > For 3 we do not need to take a reference at all :) So just forget about 3
> > it does not exist. For 3 the reference is the reference the CPU page table
> > has on the page and that's it. GUP is no longer involve in ODP or anything
> > like that.
> 
> Yes, I understand. But the fact is that GUP calls are currently still there
> e.g. in ODP code. If you can make the code work without taking a page
> reference at all, I'm only happy :)

We are working on it :)

Jason


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-16 16:13                             ` Jan Kara
  2019-08-16 16:31                               ` Jason Gunthorpe
@ 2019-08-16 16:54                               ` Jerome Glisse
  2019-08-16 17:04                                 ` Jason Gunthorpe
  1 sibling, 1 reply; 31+ messages in thread
From: Jerome Glisse @ 2019-08-16 16:54 UTC (permalink / raw)
  To: Jan Kara
  Cc: Vlastimil Babka, John Hubbard, Ira Weiny, Andrew Morton,
	Christoph Hellwig, Dan Williams, Dave Chinner, Jason Gunthorpe,
	LKML, linux-mm, linux-fsdevel, linux-rdma

On Fri, Aug 16, 2019 at 06:13:55PM +0200, Jan Kara wrote:
> On Fri 16-08-19 11:52:20, Jerome Glisse wrote:
> > On Fri, Aug 16, 2019 at 05:44:04PM +0200, Jan Kara wrote:
> > > On Fri 16-08-19 10:47:21, Vlastimil Babka wrote:
> > > > On 8/15/19 3:35 PM, Jan Kara wrote:
> > > > >> 
> > > > >> So when the GUP user uses MMU notifiers to stop writing to pages whenever
> > > > >> they are writeprotected with page_mkclean(), they don't really need page
> > > > >> pin - their access is then fully equivalent to any other mmap userspace
> > > > >> access and filesystem knows how to deal with those. I forgot out this case
> > > > >> when I wrote the above sentence.
> > > > >> 
> > > > >> So to sum up there are three cases:
> > > > >> 1) DIO case - GUP references to pages serving as DIO buffers are needed for
> > > > >>    relatively short time, no special synchronization with page_mkclean() or
> > > > >>    munmap() => needs FOLL_PIN
> > > > >> 2) RDMA case - GUP references to pages serving as DMA buffers needed for a
> > > > >>    long time, no special synchronization with page_mkclean() or munmap()
> > > > >>    => needs FOLL_PIN | FOLL_LONGTERM
> > > > >>    This case has also a special case when the pages are actually DAX. Then
> > > > >>    the caller additionally needs file lease and additional file_pin
> > > > >>    structure is used for tracking this usage.
> > > > >> 3) ODP case - GUP references to pages serving as DMA buffers, MMU notifiers
> > > > >>    used to synchronize with page_mkclean() and munmap() => normal page
> > > > >>    references are fine.
> > > > 
> > > > IMHO the munlock lesson told us about another one, that's in the end equivalent
> > > > to 3)
> > > > 
> > > > 4) pinning for struct page manipulation only => normal page references
> > > > are fine
> > > 
> > > Right, it's good to have this for clarity.
> > > 
> > > > > I want to add that I'd like to convert users in cases 1) and 2) from using
> > > > > GUP to using differently named function. Users in case 3) can stay as they
> > > > > are for now although ultimately I'd like to denote such use cases in a
> > > > > special way as well...
> > > > 
> > > > So after 1/2/3 is renamed/specially denoted, only 4) keeps the current
> > > > interface?
> > > 
> > > Well, munlock() code doesn't even use GUP, just follow_page(). I'd wait to
> > > see what's left after handling cases 1), 2), and 3) to decide about the
> > > interface for the remainder.
> > > 
> > 
> > For 3 we do not need to take a reference at all :) So just forget about 3
> > it does not exist. For 3 the reference is the reference the CPU page table
> > has on the page and that's it. GUP is no longer involve in ODP or anything
> > like that.
> 
> Yes, I understand. But the fact is that GUP calls are currently still there
> e.g. in ODP code. If you can make the code work without taking a page
> reference at all, I'm only happy :)

Already in rdma next AFAIK so in 5.4 it will be gone :) i have been
removing all GUP users that do not need reference. Intel i915 driver
is a left over i will work some more with them to get rid of it too.

Cheers,
Jérôme


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-16 16:54                               ` Jerome Glisse
@ 2019-08-16 17:04                                 ` Jason Gunthorpe
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2019-08-16 17:04 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Jan Kara, Vlastimil Babka, John Hubbard, Ira Weiny,
	Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	LKML, linux-mm, linux-fsdevel, linux-rdma

On Fri, Aug 16, 2019 at 12:54:45PM -0400, Jerome Glisse wrote:

> > Yes, I understand. But the fact is that GUP calls are currently still there
> > e.g. in ODP code. If you can make the code work without taking a page
> > reference at all, I'm only happy :)
> 
> Already in rdma next AFAIK so in 5.4 it will be gone :)

Unfortunately no.. only a lot of patches supporting this change will
be in 5.4. The notifiers are still a problem, and I need to figure out
if the edge cases in hmm_range_fault are OK for ODP or not. :(

This is taking a long time in part because ODP itself has all sorts of
problems that make it hard to tell if the other changes are safe or
not..

Lots of effort is being spent to get there though.

Jason


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-16 15:41                             ` Jan Kara
@ 2019-08-16 18:33                               ` Ira Weiny
  2019-08-16 18:50                                 ` John Hubbard
  0 siblings, 1 reply; 31+ messages in thread
From: Ira Weiny @ 2019-08-16 18:33 UTC (permalink / raw)
  To: Jan Kara
  Cc: John Hubbard, Andrew Morton, Christoph Hellwig, Dan Williams,
	Dave Chinner, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma

On Fri, Aug 16, 2019 at 05:41:08PM +0200, Jan Kara wrote:
> On Thu 15-08-19 19:14:08, John Hubbard wrote:
> > On 8/15/19 10:41 AM, John Hubbard wrote:
> > > On 8/15/19 10:32 AM, Ira Weiny wrote:
> > >> On Thu, Aug 15, 2019 at 03:35:10PM +0200, Jan Kara wrote:
> > >>> On Thu 15-08-19 15:26:22, Jan Kara wrote:
> > >>>> On Wed 14-08-19 20:01:07, John Hubbard wrote:
> > >>>>> On 8/14/19 5:02 PM, John Hubbard wrote:
> > ...
> > >> Ok just to make this clear I threw up my current tree with your patches here:
> > >>
> > >> https://github.com/weiny2/linux-kernel/commits/mmotm-rdmafsdax-b0-v4
> > >>
> > >> I'm talking about dropping the final patch:
> > >> 05fd2d3afa6b rdma/umem_odp: Use vaddr_pin_pages_remote() in ODP
> > >>
> > >> The other 2 can stay.  I split out the *_remote() call.  We don't have a user
> > >> but I'll keep it around for a bit.
> > >>
> > >> This tree is still WIP as I work through all the comments.  So I've not changed
> > >> names or variable types etc...  Just wanted to settle this.
> > >>
> > > 
> > > Right. And now that ODP is not a user, I'll take a quick look through my other
> > > call site conversions and see if I can find an easy one, to include here as
> > > the first user of vaddr_pin_pages_remote(). I'll send it your way if that
> > > works out.
> > > 
> > 
> > OK, there was only process_vm_access.c, plus (sort of) Bharath's sgi-gru
> > patch, maybe eventually [1].  But looking at process_vm_access.c, I think 
> > it is one of the patches that is no longer applicable, and I can just
> > drop it entirely...I'd welcome a second opinion on that...
> 
> I don't think you can drop the patch. process_vm_rw_pages() clearly touches
> page contents and does not synchronize with page_mkclean(). So it is case
> 1) and needs FOLL_PIN semantics.

John could you send a formal patch using vaddr_pin* and I'll add it to the
tree?

Ira

> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-16 18:33                               ` Ira Weiny
@ 2019-08-16 18:50                                 ` John Hubbard
  2019-08-16 21:59                                   ` Ira Weiny
  0 siblings, 1 reply; 31+ messages in thread
From: John Hubbard @ 2019-08-16 18:50 UTC (permalink / raw)
  To: Ira Weiny, Jan Kara
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Jason Gunthorpe, Jérôme Glisse, LKML, linux-mm,
	linux-fsdevel, linux-rdma

On 8/16/19 11:33 AM, Ira Weiny wrote:
> On Fri, Aug 16, 2019 at 05:41:08PM +0200, Jan Kara wrote:
>> On Thu 15-08-19 19:14:08, John Hubbard wrote:
>>> On 8/15/19 10:41 AM, John Hubbard wrote:
>>>> On 8/15/19 10:32 AM, Ira Weiny wrote:
>>>>> On Thu, Aug 15, 2019 at 03:35:10PM +0200, Jan Kara wrote:
>>>>>> On Thu 15-08-19 15:26:22, Jan Kara wrote:
>>>>>>> On Wed 14-08-19 20:01:07, John Hubbard wrote:
>>>>>>>> On 8/14/19 5:02 PM, John Hubbard wrote:
>>> ...
>>>
>>> OK, there was only process_vm_access.c, plus (sort of) Bharath's sgi-gru
>>> patch, maybe eventually [1].  But looking at process_vm_access.c, I think
>>> it is one of the patches that is no longer applicable, and I can just
>>> drop it entirely...I'd welcome a second opinion on that...
>>
>> I don't think you can drop the patch. process_vm_rw_pages() clearly touches
>> page contents and does not synchronize with page_mkclean(). So it is case
>> 1) and needs FOLL_PIN semantics.
> 
> John could you send a formal patch using vaddr_pin* and I'll add it to the
> tree?
> 

Yes...hints about which struct file to use here are very welcome, btw. This part
of mm is fairly new to me.

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-16 18:50                                 ` John Hubbard
@ 2019-08-16 21:59                                   ` Ira Weiny
  2019-08-16 22:36                                     ` John Hubbard
  0 siblings, 1 reply; 31+ messages in thread
From: Ira Weiny @ 2019-08-16 21:59 UTC (permalink / raw)
  To: John Hubbard
  Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Dan Williams,
	Dave Chinner, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma

On Fri, Aug 16, 2019 at 11:50:09AM -0700, John Hubbard wrote:
> On 8/16/19 11:33 AM, Ira Weiny wrote:
> > On Fri, Aug 16, 2019 at 05:41:08PM +0200, Jan Kara wrote:
> > > On Thu 15-08-19 19:14:08, John Hubbard wrote:
> > > > On 8/15/19 10:41 AM, John Hubbard wrote:
> > > > > On 8/15/19 10:32 AM, Ira Weiny wrote:
> > > > > > On Thu, Aug 15, 2019 at 03:35:10PM +0200, Jan Kara wrote:
> > > > > > > On Thu 15-08-19 15:26:22, Jan Kara wrote:
> > > > > > > > On Wed 14-08-19 20:01:07, John Hubbard wrote:
> > > > > > > > > On 8/14/19 5:02 PM, John Hubbard wrote:
> > > > ...
> > > > 
> > > > OK, there was only process_vm_access.c, plus (sort of) Bharath's sgi-gru
> > > > patch, maybe eventually [1].  But looking at process_vm_access.c, I think
> > > > it is one of the patches that is no longer applicable, and I can just
> > > > drop it entirely...I'd welcome a second opinion on that...
> > > 
> > > I don't think you can drop the patch. process_vm_rw_pages() clearly touches
> > > page contents and does not synchronize with page_mkclean(). So it is case
> > > 1) and needs FOLL_PIN semantics.
> > 
> > John could you send a formal patch using vaddr_pin* and I'll add it to the
> > tree?
> > 
> 
> Yes...hints about which struct file to use here are very welcome, btw. This part
> of mm is fairly new to me.

I'm still working out the final semantics of vaddr_pin*.  But right now you
don't need a vaddr_pin if you don't specify FOLL_LONGTERM.

Since case 1, this case, does not need FOLL_LONGTERM I think it is safe to
simply pass NULL here.

OTOH we could just track this against the mm_struct.  But I don't think we need
to because this pin should be transient.

And this is why I keep leaning toward _not_ putting these flags in the
vaddr_pin*() calls.  I know this is what I did but I think I'm wrong.  It should
be the caller specifying what they want and the vaddr_pin*() calls check that
what they are asking for is correct.

Ira

> 
> thanks,
> -- 
> John Hubbard
> NVIDIA


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
  2019-08-16 21:59                                   ` Ira Weiny
@ 2019-08-16 22:36                                     ` John Hubbard
  0 siblings, 0 replies; 31+ messages in thread
From: John Hubbard @ 2019-08-16 22:36 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Dan Williams,
	Dave Chinner, Jason Gunthorpe, Jérôme Glisse, LKML,
	linux-mm, linux-fsdevel, linux-rdma

On 8/16/19 2:59 PM, Ira Weiny wrote:
> On Fri, Aug 16, 2019 at 11:50:09AM -0700, John Hubbard wrote:
...
>>> John could you send a formal patch using vaddr_pin* and I'll add it to the
>>> tree?
>>>
>>
>> Yes...hints about which struct file to use here are very welcome, btw. This part
>> of mm is fairly new to me.
> 
> I'm still working out the final semantics of vaddr_pin*.  But right now you
> don't need a vaddr_pin if you don't specify FOLL_LONGTERM.
> 

ah OK.

> Since case 1, this case, does not need FOLL_LONGTERM I think it is safe to
> simply pass NULL here.
> 
> OTOH we could just track this against the mm_struct.  But I don't think we need
> to because this pin should be transient.
> 

Thanks for looking at that, I'm definitely in learning mode here.

> And this is why I keep leaning toward _not_ putting these flags in the
> vaddr_pin*() calls.  I know this is what I did but I think I'm wrong.  It should
> be the caller specifying what they want and the vaddr_pin*() calls check that
> what they are asking for is correct.
> 

Yes. I think we're nearly done finding the right balance of wrapper calls and
FOLL_* flags. I've seen Jan and others asking that the call sites do *not*
set the flags, but we also know that FOLL_PIN and FOLL_LONGTERM need to vary
independently.

That means either:

a) another trivial wrapper calls, on top of vaddr_pin_*(), for each supported 
combination of FOLL_PIN and FOLL_LONGTERM, or

b) just setting FOLL_PIN and FOLL_LONGTERM at each callsite.

I think either way is easy to grep for, so it's hard to get too excited
(fortunately) about which one to pick. Let's start simple with (b) and it's 
easy to convert later if someone wants that.

Meanwhile, we do need to pull the flag setting out of vaddr_pin_pages().

So I will post these small patches for your mmotm-rdmafsdax-b0-v4 branch,
shortly:

1) Add FOLL_PIN 

   --also I guess it's time to add comments documenting FOLL_PIN and
FOLL_LONGTERM use, stealing Jan's and others' wording for the 4 cases,
from earlier. :)

2) Add vaddr_pin_user_pages_remote(), which will not set FOLL_PIN or FOLL_LONGTERM
itself. And add the caller, which will.

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, back to index

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-12  1:50 [RFC PATCH 0/2] mm/gup: introduce vaddr_pin_pages_remote(), FOLL_PIN john.hubbard
2019-08-12  1:50 ` [RFC PATCH 1/2] mm/gup: introduce FOLL_PIN flag for get_user_pages() john.hubbard
2019-08-12  1:50 ` [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote() john.hubbard
2019-08-12 22:03   ` Ira Weiny
2019-08-12 22:21     ` John Hubbard
2019-08-12 23:49   ` Ira Weiny
2019-08-13  0:07     ` John Hubbard
2019-08-13 21:08       ` Ira Weiny
2019-08-14  0:51         ` John Hubbard
2019-08-14  0:56           ` John Hubbard
2019-08-14 23:50             ` Ira Weiny
2019-08-15  0:02               ` John Hubbard
2019-08-15  3:01                 ` John Hubbard
2019-08-15 13:26                   ` Jan Kara
2019-08-15 13:35                     ` Jan Kara
2019-08-15 14:51                       ` Jason Gunthorpe
2019-08-15 17:32                       ` Ira Weiny
2019-08-15 17:41                         ` John Hubbard
2019-08-16  2:14                           ` John Hubbard
2019-08-16 15:41                             ` Jan Kara
2019-08-16 18:33                               ` Ira Weiny
2019-08-16 18:50                                 ` John Hubbard
2019-08-16 21:59                                   ` Ira Weiny
2019-08-16 22:36                                     ` John Hubbard
2019-08-16  8:47                       ` Vlastimil Babka
2019-08-16 15:44                         ` Jan Kara
2019-08-16 15:52                           ` Jerome Glisse
2019-08-16 16:13                             ` Jan Kara
2019-08-16 16:31                               ` Jason Gunthorpe
2019-08-16 16:54                               ` Jerome Glisse
2019-08-16 17:04                                 ` Jason Gunthorpe

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org linux-mm@archiver.kernel.org
	public-inbox-index linux-mm


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/ public-inbox