* [PATCH 0/1] infiniband/mm: convert put_page() to put_user_page*()
@ 2019-05-23  7:25 john.hubbard
  2019-05-23  7:25 ` [PATCH 1/1] " john.hubbard
  0 siblings, 1 reply; 13+ messages in thread
From: john.hubbard @ 2019-05-23  7:25 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Jason Gunthorpe, LKML, linux-rdma, linux-fsdevel, John Hubbard,
	Doug Ledford, Mike Marciniszyn, Dennis Dalessandro,
	Christian Benvenuti, Jan Kara, Jason Gunthorpe, Ira Weiny

From: John Hubbard <jhubbard@nvidia.com>

Hi Jason and all,

IIUC, now that we have put_user_pages() merged into linux.git, we can
start sending the callsite conversions up via the various subsystem
maintainer trees. Here's one for linux-rdma.

I've left the various Reviewed-by: and Tested-by: tags on here, even
though it's been through a few rebases.

If anyone has hardware, it would be good to get a real test of this.

thanks,
--
John Hubbard
NVIDIA

Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Christian Benvenuti <benve@cisco.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Ira Weiny <ira.weiny@intel.com>

John Hubbard (1):
  infiniband/mm: convert put_page() to put_user_page*()

 drivers/infiniband/core/umem.c              |  7 ++++---
 drivers/infiniband/core/umem_odp.c          | 10 +++++-----
 drivers/infiniband/hw/hfi1/user_pages.c     | 11 ++++-------
 drivers/infiniband/hw/mthca/mthca_memfree.c |  6 +++---
 drivers/infiniband/hw/qib/qib_user_pages.c  | 11 ++++-------
 drivers/infiniband/hw/qib/qib_user_sdma.c   |  6 +++---
 drivers/infiniband/hw/usnic/usnic_uiom.c    |  7 ++++---
 7 files changed, 27 insertions(+), 31 deletions(-)

-- 
2.21.0



* [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
  2019-05-23  7:25 [PATCH 0/1] infiniband/mm: convert put_page() to put_user_page*() john.hubbard
@ 2019-05-23  7:25 ` john.hubbard
  2019-05-23 15:31   ` Jerome Glisse
  2019-05-23 17:28   ` Ira Weiny
  0 siblings, 2 replies; 13+ messages in thread
From: john.hubbard @ 2019-05-23  7:25 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Jason Gunthorpe, LKML, linux-rdma, linux-fsdevel, John Hubbard,
	Doug Ledford, Mike Marciniszyn, Dennis Dalessandro,
	Christian Benvenuti, Jan Kara, Jason Gunthorpe, Ira Weiny

From: John Hubbard <jhubbard@nvidia.com>

For infiniband code that retains pages via get_user_pages*(), release
those pages via the new put_user_page() or put_user_pages*() calls,
instead of put_page().

This is a tiny part of the second step of fixing the problem described
in [1]. The steps are:

1) Provide put_user_page*() routines, intended to be used
   for releasing pages that were pinned via get_user_pages*().

2) Convert all of the call sites for get_user_pages*() to invoke
   put_user_page*() instead of put_page(). This involves dozens of call
   sites, and will take some time.

3) After (2) is complete, use get_user_pages*() and put_user_page*() to
   implement tracking of these pages. This tracking will be separate from
   the existing struct page refcounting.

4) Use the tracking and identification of these pages, to implement
   special handling (especially in writeback paths) when the pages are
   backed by a filesystem. Again, [1] provides details as to why that is
   desirable.

[1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
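
For reference, the put_user_page*() routines used below are currently just
thin wrappers around the existing calls. Roughly (a simplified sketch of the
placeholder versions; see include/linux/mm.h and mm/gup.c for the real
definitions):

/* Simplified sketch of the placeholder implementations: */
static inline void put_user_page(struct page *page)
{
	put_page(page);
}

void put_user_pages_dirty_lock(struct page **pages, unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++) {
		struct page *page = compound_head(pages[i]);

		if (!PageDirty(page))
			set_page_dirty_lock(page);
		put_user_page(page);
	}
}

void put_user_pages(struct page **pages, unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++)
		put_user_page(pages[i]);
}

Until step (3) lands, these behave essentially the same as the open-coded
loops they replace.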

Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Christian Benvenuti <benve@cisco.com>

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/infiniband/core/umem.c              |  7 ++++---
 drivers/infiniband/core/umem_odp.c          | 10 +++++-----
 drivers/infiniband/hw/hfi1/user_pages.c     | 11 ++++-------
 drivers/infiniband/hw/mthca/mthca_memfree.c |  6 +++---
 drivers/infiniband/hw/qib/qib_user_pages.c  | 11 ++++-------
 drivers/infiniband/hw/qib/qib_user_sdma.c   |  6 +++---
 drivers/infiniband/hw/usnic/usnic_uiom.c    |  7 ++++---
 7 files changed, 27 insertions(+), 31 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index e7ea819fcb11..673f0d240b3e 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -54,9 +54,10 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
 
 	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
 		page = sg_page_iter_page(&sg_iter);
-		if (!PageDirty(page) && umem->writable && dirty)
-			set_page_dirty_lock(page);
-		put_page(page);
+		if (umem->writable && dirty)
+			put_user_pages_dirty_lock(&page, 1);
+		else
+			put_user_page(page);
 	}
 
 	sg_free_table(&umem->sg_head);
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index f962b5bbfa40..17e46df3990a 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -487,7 +487,7 @@ void ib_umem_odp_release(struct ib_umem_odp *umem_odp)
  * The function returns -EFAULT if the DMA mapping operation fails. It returns
  * -EAGAIN if a concurrent invalidation prevents us from updating the page.
  *
- * The page is released via put_page even if the operation failed. For
+ * The page is released via put_user_page even if the operation failed. For
  * on-demand pinning, the page is released whenever it isn't stored in the
  * umem.
  */
@@ -536,7 +536,7 @@ static int ib_umem_odp_map_dma_single_page(
 	}
 
 out:
-	put_page(page);
+	put_user_page(page);
 
 	if (remove_existing_mapping) {
 		ib_umem_notifier_start_account(umem_odp);
@@ -659,7 +659,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 					ret = -EFAULT;
 					break;
 				}
-				put_page(local_page_list[j]);
+				put_user_page(local_page_list[j]);
 				continue;
 			}
 
@@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 			 * ib_umem_odp_map_dma_single_page().
 			 */
 			if (npages - (j + 1) > 0)
-				release_pages(&local_page_list[j+1],
-					      npages - (j + 1));
+				put_user_pages(&local_page_list[j+1],
+					       npages - (j + 1));
 			break;
 		}
 	}
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
index 02eee8eff1db..b89a9b9aef7a 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -118,13 +118,10 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np
 void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
 			     size_t npages, bool dirty)
 {
-	size_t i;
-
-	for (i = 0; i < npages; i++) {
-		if (dirty)
-			set_page_dirty_lock(p[i]);
-		put_page(p[i]);
-	}
+	if (dirty)
+		put_user_pages_dirty_lock(p, npages);
+	else
+		put_user_pages(p, npages);
 
 	if (mm) { /* during close after signal, mm can be NULL */
 		atomic64_sub(npages, &mm->pinned_vm);
diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c
index 8ff0e90d7564..edccfd6e178f 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -482,7 +482,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar,
 
 	ret = pci_map_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
 	if (ret < 0) {
-		put_page(pages[0]);
+		put_user_page(pages[0]);
 		goto out;
 	}
 
@@ -490,7 +490,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar,
 				 mthca_uarc_virt(dev, uar, i));
 	if (ret) {
 		pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
-		put_page(sg_page(&db_tab->page[i].mem));
+		put_user_page(sg_page(&db_tab->page[i].mem));
 		goto out;
 	}
 
@@ -556,7 +556,7 @@ void mthca_cleanup_user_db_tab(struct mthca_dev *dev, struct mthca_uar *uar,
 		if (db_tab->page[i].uvirt) {
 			mthca_UNMAP_ICM(dev, mthca_uarc_virt(dev, uar, i), 1);
 			pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
-			put_page(sg_page(&db_tab->page[i].mem));
+			put_user_page(sg_page(&db_tab->page[i].mem));
 		}
 	}
 
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index f712fb7fa82f..bfbfbb7e0ff4 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -40,13 +40,10 @@
 static void __qib_release_user_pages(struct page **p, size_t num_pages,
 				     int dirty)
 {
-	size_t i;
-
-	for (i = 0; i < num_pages; i++) {
-		if (dirty)
-			set_page_dirty_lock(p[i]);
-		put_page(p[i]);
-	}
+	if (dirty)
+		put_user_pages_dirty_lock(p, num_pages);
+	else
+		put_user_pages(p, num_pages);
 }
 
 /**
diff --git a/drivers/infiniband/hw/qib/qib_user_sdma.c b/drivers/infiniband/hw/qib/qib_user_sdma.c
index 0c204776263f..ac5bdb02144f 100644
--- a/drivers/infiniband/hw/qib/qib_user_sdma.c
+++ b/drivers/infiniband/hw/qib/qib_user_sdma.c
@@ -317,7 +317,7 @@ static int qib_user_sdma_page_to_frags(const struct qib_devdata *dd,
 		 * the caller can ignore this page.
 		 */
 		if (put) {
-			put_page(page);
+			put_user_page(page);
 		} else {
 			/* coalesce case */
 			kunmap(page);
@@ -631,7 +631,7 @@ static void qib_user_sdma_free_pkt_frag(struct device *dev,
 			kunmap(pkt->addr[i].page);
 
 		if (pkt->addr[i].put_page)
-			put_page(pkt->addr[i].page);
+			put_user_page(pkt->addr[i].page);
 		else
 			__free_page(pkt->addr[i].page);
 	} else if (pkt->addr[i].kvaddr) {
@@ -706,7 +706,7 @@ static int qib_user_sdma_pin_pages(const struct qib_devdata *dd,
 	/* if error, return all pages not managed by pkt */
 free_pages:
 	while (i < j)
-		put_page(pages[i++]);
+		put_user_page(pages[i++]);
 
 done:
 	return ret;
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index e312f522a66d..0b0237d41613 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -75,9 +75,10 @@ static void usnic_uiom_put_pages(struct list_head *chunk_list, int dirty)
 		for_each_sg(chunk->page_list, sg, chunk->nents, i) {
 			page = sg_page(sg);
 			pa = sg_phys(sg);
-			if (!PageDirty(page) && dirty)
-				set_page_dirty_lock(page);
-			put_page(page);
+			if (dirty)
+				put_user_pages_dirty_lock(&page, 1);
+			else
+				put_user_page(page);
 			usnic_dbg("pa: %pa\n", &pa);
 		}
 		kfree(chunk);
-- 
2.21.0



* Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
  2019-05-23  7:25 ` [PATCH 1/1] " john.hubbard
@ 2019-05-23 15:31   ` Jerome Glisse
  2019-05-23 17:56     ` John Hubbard
  2019-05-23 17:28   ` Ira Weiny
  1 sibling, 1 reply; 13+ messages in thread
From: Jerome Glisse @ 2019-05-23 15:31 UTC (permalink / raw)
  To: john.hubbard
  Cc: Andrew Morton, linux-mm, Jason Gunthorpe, LKML, linux-rdma,
	linux-fsdevel, John Hubbard, Doug Ledford, Mike Marciniszyn,
	Dennis Dalessandro, Christian Benvenuti, Jan Kara,
	Jason Gunthorpe, Ira Weiny

On Thu, May 23, 2019 at 12:25:37AM -0700, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
> 
> For infiniband code that retains pages via get_user_pages*(),
> release those pages via the new put_user_page(), or
> put_user_pages*(), instead of put_page()
> 
> This is a tiny part of the second step of fixing the problem described
> in [1]. The steps are:
> 
> 1) Provide put_user_page*() routines, intended to be used
>    for releasing pages that were pinned via get_user_pages*().
> 
> 2) Convert all of the call sites for get_user_pages*(), to
>    invoke put_user_page*(), instead of put_page(). This involves dozens of
>    call sites, and will take some time.
> 
> 3) After (2) is complete, use get_user_pages*() and put_user_page*() to
>    implement tracking of these pages. This tracking will be separate from
>    the existing struct page refcounting.
> 
> 4) Use the tracking and identification of these pages, to implement
>    special handling (especially in writeback paths) when the pages are
>    backed by a filesystem. Again, [1] provides details as to why that is
>    desirable.
> 
> [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
> 
> Cc: Doug Ledford <dledford@redhat.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
> Cc: Christian Benvenuti <benve@cisco.com>
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
> Acked-by: Jason Gunthorpe <jgg@mellanox.com>
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>

Reviewed-by: Jérôme Glisse <jglisse@redhat.com>

By the way, I have a wishlist; see below.


> ---
>  drivers/infiniband/core/umem.c              |  7 ++++---
>  drivers/infiniband/core/umem_odp.c          | 10 +++++-----
>  drivers/infiniband/hw/hfi1/user_pages.c     | 11 ++++-------
>  drivers/infiniband/hw/mthca/mthca_memfree.c |  6 +++---
>  drivers/infiniband/hw/qib/qib_user_pages.c  | 11 ++++-------
>  drivers/infiniband/hw/qib/qib_user_sdma.c   |  6 +++---
>  drivers/infiniband/hw/usnic/usnic_uiom.c    |  7 ++++---
>  7 files changed, 27 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index e7ea819fcb11..673f0d240b3e 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -54,9 +54,10 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>  
>  	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
>  		page = sg_page_iter_page(&sg_iter);
> -		if (!PageDirty(page) && umem->writable && dirty)
> -			set_page_dirty_lock(page);
> -		put_page(page);
> +		if (umem->writable && dirty)
> +			put_user_pages_dirty_lock(&page, 1);
> +		else
> +			put_user_page(page);

Can we get a put_user_page_dirty(struct page **pages, bool dirty, npages)?

It is a common pattern that we might have to conditionally dirty the pages,
and I feel it would look cleaner if we could move the branch inside the
put_user_page*() function.
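
I.e., something along these lines (just a sketch; the name and exact
signature are up for debate, nothing like this exists yet):

/* Sketch only: fold the conditional-dirty branch into one helper. */
void put_user_pages_maybe_dirty(struct page **pages, bool dirty,
				unsigned long npages)
{
	if (dirty)
		put_user_pages_dirty_lock(pages, npages);
	else
		put_user_pages(pages, npages);
}

so that the callsite above could just be:

	put_user_pages_maybe_dirty(&page, umem->writable && dirty, 1);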

Cheers,
Jérôme


* Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
  2019-05-23  7:25 ` [PATCH 1/1] " john.hubbard
  2019-05-23 15:31   ` Jerome Glisse
@ 2019-05-23 17:28   ` Ira Weiny
  2019-05-23 17:32     ` Jason Gunthorpe
  1 sibling, 1 reply; 13+ messages in thread
From: Ira Weiny @ 2019-05-23 17:28 UTC (permalink / raw)
  To: john.hubbard
  Cc: Andrew Morton, linux-mm, Jason Gunthorpe, LKML, linux-rdma,
	linux-fsdevel, John Hubbard, Doug Ledford, Mike Marciniszyn,
	Dennis Dalessandro, Christian Benvenuti, Jan Kara,
	Jason Gunthorpe

On Thu, May 23, 2019 at 12:25:37AM -0700, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
> 
> For infiniband code that retains pages via get_user_pages*(),
> release those pages via the new put_user_page(), or
> put_user_pages*(), instead of put_page()
> 
> This is a tiny part of the second step of fixing the problem described
> in [1]. The steps are:
> 
> 1) Provide put_user_page*() routines, intended to be used
>    for releasing pages that were pinned via get_user_pages*().
> 
> 2) Convert all of the call sites for get_user_pages*(), to
>    invoke put_user_page*(), instead of put_page(). This involves dozens of
>    call sites, and will take some time.
> 
> 3) After (2) is complete, use get_user_pages*() and put_user_page*() to
>    implement tracking of these pages. This tracking will be separate from
>    the existing struct page refcounting.
> 
> 4) Use the tracking and identification of these pages, to implement
>    special handling (especially in writeback paths) when the pages are
>    backed by a filesystem. Again, [1] provides details as to why that is
>    desirable.
> 
> [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
> 
> Cc: Doug Ledford <dledford@redhat.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
> Cc: Christian Benvenuti <benve@cisco.com>
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
> Acked-by: Jason Gunthorpe <jgg@mellanox.com>
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/infiniband/core/umem.c              |  7 ++++---
>  drivers/infiniband/core/umem_odp.c          | 10 +++++-----
>  drivers/infiniband/hw/hfi1/user_pages.c     | 11 ++++-------
>  drivers/infiniband/hw/mthca/mthca_memfree.c |  6 +++---
>  drivers/infiniband/hw/qib/qib_user_pages.c  | 11 ++++-------
>  drivers/infiniband/hw/qib/qib_user_sdma.c   |  6 +++---
>  drivers/infiniband/hw/usnic/usnic_uiom.c    |  7 ++++---
>  7 files changed, 27 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index e7ea819fcb11..673f0d240b3e 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -54,9 +54,10 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>  
>  	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
>  		page = sg_page_iter_page(&sg_iter);
> -		if (!PageDirty(page) && umem->writable && dirty)
> -			set_page_dirty_lock(page);
> -		put_page(page);
> +		if (umem->writable && dirty)
> +			put_user_pages_dirty_lock(&page, 1);
> +		else
> +			put_user_page(page);
>  	}
>  
>  	sg_free_table(&umem->sg_head);
> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> index f962b5bbfa40..17e46df3990a 100644
> --- a/drivers/infiniband/core/umem_odp.c
> +++ b/drivers/infiniband/core/umem_odp.c
> @@ -487,7 +487,7 @@ void ib_umem_odp_release(struct ib_umem_odp *umem_odp)
>   * The function returns -EFAULT if the DMA mapping operation fails. It returns
>   * -EAGAIN if a concurrent invalidation prevents us from updating the page.
>   *
> - * The page is released via put_page even if the operation failed. For
> + * The page is released via put_user_page even if the operation failed. For
>   * on-demand pinning, the page is released whenever it isn't stored in the
>   * umem.
>   */
> @@ -536,7 +536,7 @@ static int ib_umem_odp_map_dma_single_page(
>  	}
>  
>  out:
> -	put_page(page);
> +	put_user_page(page);
>  
>  	if (remove_existing_mapping) {
>  		ib_umem_notifier_start_account(umem_odp);
> @@ -659,7 +659,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>  					ret = -EFAULT;
>  					break;
>  				}
> -				put_page(local_page_list[j]);
> +				put_user_page(local_page_list[j]);
>  				continue;
>  			}
>  
> @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>  			 * ib_umem_odp_map_dma_single_page().
>  			 */
>  			if (npages - (j + 1) > 0)
> -				release_pages(&local_page_list[j+1],
> -					      npages - (j + 1));
> +				put_user_pages(&local_page_list[j+1],
> +					       npages - (j + 1));

I don't know if we discussed this before but it looks like the use of
release_pages() was not entirely correct (or at least not necessary) here.  So
I think this is ok.

As for testing, I have been running with this patch for a while but I don't
have ODP hardware so that testing would not cover this code path.  So you can
add my:

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

>  			break;
>  		}
>  	}
> diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
> index 02eee8eff1db..b89a9b9aef7a 100644
> --- a/drivers/infiniband/hw/hfi1/user_pages.c
> +++ b/drivers/infiniband/hw/hfi1/user_pages.c
> @@ -118,13 +118,10 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np
>  void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
>  			     size_t npages, bool dirty)
>  {
> -	size_t i;
> -
> -	for (i = 0; i < npages; i++) {
> -		if (dirty)
> -			set_page_dirty_lock(p[i]);
> -		put_page(p[i]);
> -	}
> +	if (dirty)
> +		put_user_pages_dirty_lock(p, npages);
> +	else
> +		put_user_pages(p, npages);
>  
>  	if (mm) { /* during close after signal, mm can be NULL */
>  		atomic64_sub(npages, &mm->pinned_vm);
> diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c
> index 8ff0e90d7564..edccfd6e178f 100644
> --- a/drivers/infiniband/hw/mthca/mthca_memfree.c
> +++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
> @@ -482,7 +482,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar,
>  
>  	ret = pci_map_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
>  	if (ret < 0) {
> -		put_page(pages[0]);
> +		put_user_page(pages[0]);
>  		goto out;
>  	}
>  
> @@ -490,7 +490,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar,
>  				 mthca_uarc_virt(dev, uar, i));
>  	if (ret) {
>  		pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
> -		put_page(sg_page(&db_tab->page[i].mem));
> +		put_user_page(sg_page(&db_tab->page[i].mem));
>  		goto out;
>  	}
>  
> @@ -556,7 +556,7 @@ void mthca_cleanup_user_db_tab(struct mthca_dev *dev, struct mthca_uar *uar,
>  		if (db_tab->page[i].uvirt) {
>  			mthca_UNMAP_ICM(dev, mthca_uarc_virt(dev, uar, i), 1);
>  			pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
> -			put_page(sg_page(&db_tab->page[i].mem));
> +			put_user_page(sg_page(&db_tab->page[i].mem));
>  		}
>  	}
>  
> diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
> index f712fb7fa82f..bfbfbb7e0ff4 100644
> --- a/drivers/infiniband/hw/qib/qib_user_pages.c
> +++ b/drivers/infiniband/hw/qib/qib_user_pages.c
> @@ -40,13 +40,10 @@
>  static void __qib_release_user_pages(struct page **p, size_t num_pages,
>  				     int dirty)
>  {
> -	size_t i;
> -
> -	for (i = 0; i < num_pages; i++) {
> -		if (dirty)
> -			set_page_dirty_lock(p[i]);
> -		put_page(p[i]);
> -	}
> +	if (dirty)
> +		put_user_pages_dirty_lock(p, num_pages);
> +	else
> +		put_user_pages(p, num_pages);
>  }
>  
>  /**
> diff --git a/drivers/infiniband/hw/qib/qib_user_sdma.c b/drivers/infiniband/hw/qib/qib_user_sdma.c
> index 0c204776263f..ac5bdb02144f 100644
> --- a/drivers/infiniband/hw/qib/qib_user_sdma.c
> +++ b/drivers/infiniband/hw/qib/qib_user_sdma.c
> @@ -317,7 +317,7 @@ static int qib_user_sdma_page_to_frags(const struct qib_devdata *dd,
>  		 * the caller can ignore this page.
>  		 */
>  		if (put) {
> -			put_page(page);
> +			put_user_page(page);
>  		} else {
>  			/* coalesce case */
>  			kunmap(page);
> @@ -631,7 +631,7 @@ static void qib_user_sdma_free_pkt_frag(struct device *dev,
>  			kunmap(pkt->addr[i].page);
>  
>  		if (pkt->addr[i].put_page)
> -			put_page(pkt->addr[i].page);
> +			put_user_page(pkt->addr[i].page);
>  		else
>  			__free_page(pkt->addr[i].page);
>  	} else if (pkt->addr[i].kvaddr) {
> @@ -706,7 +706,7 @@ static int qib_user_sdma_pin_pages(const struct qib_devdata *dd,
>  	/* if error, return all pages not managed by pkt */
>  free_pages:
>  	while (i < j)
> -		put_page(pages[i++]);
> +		put_user_page(pages[i++]);
>  
>  done:
>  	return ret;
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
> index e312f522a66d..0b0237d41613 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom.c
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
> @@ -75,9 +75,10 @@ static void usnic_uiom_put_pages(struct list_head *chunk_list, int dirty)
>  		for_each_sg(chunk->page_list, sg, chunk->nents, i) {
>  			page = sg_page(sg);
>  			pa = sg_phys(sg);
> -			if (!PageDirty(page) && dirty)
> -				set_page_dirty_lock(page);
> -			put_page(page);
> +			if (dirty)
> +				put_user_pages_dirty_lock(&page, 1);
> +			else
> +				put_user_page(page);
>  			usnic_dbg("pa: %pa\n", &pa);
>  		}
>  		kfree(chunk);
> -- 
> 2.21.0
> 


* Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
  2019-05-23 17:28   ` Ira Weiny
@ 2019-05-23 17:32     ` Jason Gunthorpe
  2019-05-23 17:46       ` John Hubbard
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2019-05-23 17:32 UTC (permalink / raw)
  To: Ira Weiny
  Cc: john.hubbard, Andrew Morton, linux-mm, LKML, linux-rdma,
	linux-fsdevel, John Hubbard, Doug Ledford, Mike Marciniszyn,
	Dennis Dalessandro, Christian Benvenuti, Jan Kara

On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote:
> >  
> > @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
> >  			 * ib_umem_odp_map_dma_single_page().
> >  			 */
> >  			if (npages - (j + 1) > 0)
> > -				release_pages(&local_page_list[j+1],
> > -					      npages - (j + 1));
> > +				put_user_pages(&local_page_list[j+1],
> > +					       npages - (j + 1));
> 
> I don't know if we discussed this before but it looks like the use of
> release_pages() was not entirely correct (or at least not necessary) here.  So
> I think this is ok.

Oh? John switched it from a put_page() loop to release_pages() here:

commit 75a3e6a3c129cddcc683538d8702c6ef998ec589
Author: John Hubbard <jhubbard@nvidia.com>
Date:   Mon Mar 4 11:46:45 2019 -0800

    RDMA/umem: minor bug fix in error handling path
    
    1. Bug fix: fix an off by one error in the code that cleans up if it fails
       to dma-map a page, after having done a get_user_pages_remote() on a
       range of pages.
    
    2. Refinement: for that same cleanup code, release_pages() is better than
       put_page() in a loop.
    

And now we are going back to something called put_user_pages() that
implements the same for loop the above removed?

Seems like we are going in circles?? John?

Jason


* Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
  2019-05-23 17:32     ` Jason Gunthorpe
@ 2019-05-23 17:46       ` John Hubbard
  2019-05-23 19:04         ` Ira Weiny
  2019-05-23 19:17         ` Jason Gunthorpe
  0 siblings, 2 replies; 13+ messages in thread
From: John Hubbard @ 2019-05-23 17:46 UTC (permalink / raw)
  To: Jason Gunthorpe, Ira Weiny
  Cc: john.hubbard, Andrew Morton, linux-mm, LKML, linux-rdma,
	linux-fsdevel, Doug Ledford, Mike Marciniszyn,
	Dennis Dalessandro, Christian Benvenuti, Jan Kara

On 5/23/19 10:32 AM, Jason Gunthorpe wrote:
> On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote:
>>>   
>>> @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>>>   			 * ib_umem_odp_map_dma_single_page().
>>>   			 */
>>>   			if (npages - (j + 1) > 0)
>>> -				release_pages(&local_page_list[j+1],
>>> -					      npages - (j + 1));
>>> +				put_user_pages(&local_page_list[j+1],
>>> +					       npages - (j + 1));
>>
>> I don't know if we discussed this before but it looks like the use of
>> release_pages() was not entirely correct (or at least not necessary) here.  So
>> I think this is ok.
> 
> Oh? John switched it from a put_pages loop to release_pages() here:
> 
> commit 75a3e6a3c129cddcc683538d8702c6ef998ec589
> Author: John Hubbard <jhubbard@nvidia.com>
> Date:   Mon Mar 4 11:46:45 2019 -0800
> 
>      RDMA/umem: minor bug fix in error handling path
>      
>      1. Bug fix: fix an off by one error in the code that cleans up if it fails
>         to dma-map a page, after having done a get_user_pages_remote() on a
>         range of pages.
>      
>      2. Refinement: for that same cleanup code, release_pages() is better than
>         put_page() in a loop.
>      
> 
> And now we are going to back something called put_pages() that
> implements the same for loop the above removed?
> 
> Seems like we are going in circles?? John?
> 

put_user_pages() is meant to be a drop-in replacement for release_pages(),
so I made the above change as an interim step in moving the callsite from
a loop to a single call.

And at some point, it may be possible to optimize put_user_pages() with
batching similar to what release_pages() does; that was part of the plan
for this.

But I do see what you mean: in the interim, maybe put_user_pages() should
just call release_pages(). How does that change sound?
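
I.e., something along these lines (interim-only sketch, untested):

void put_user_pages(struct page **pages, unsigned long npages)
{
	/*
	 * Interim only: reuse the batched release_pages() path. This is
	 * only valid while put_user_page() is still a plain put_page();
	 * it has to change once gup-pinned pages get their own counting.
	 */
	release_pages(pages, npages);
}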


thanks,
-- 
John Hubbard
NVIDIA


* Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
  2019-05-23 15:31   ` Jerome Glisse
@ 2019-05-23 17:56     ` John Hubbard
  0 siblings, 0 replies; 13+ messages in thread
From: John Hubbard @ 2019-05-23 17:56 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Andrew Morton, linux-mm, Jason Gunthorpe, LKML, linux-rdma,
	linux-fsdevel, Doug Ledford, Mike Marciniszyn,
	Dennis Dalessandro, Christian Benvenuti, Jan Kara,
	Jason Gunthorpe, Ira Weiny

On 5/23/19 8:31 AM, Jerome Glisse wrote:
[...]
> 
> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> 

Thanks for the review!

> Between i have a wishlist see below
[...]
>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>> index e7ea819fcb11..673f0d240b3e 100644
>> --- a/drivers/infiniband/core/umem.c
>> +++ b/drivers/infiniband/core/umem.c
>> @@ -54,9 +54,10 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>>   
>>   	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
>>   		page = sg_page_iter_page(&sg_iter);
>> -		if (!PageDirty(page) && umem->writable && dirty)
>> -			set_page_dirty_lock(page);
>> -		put_page(page);
>> +		if (umem->writable && dirty)
>> +			put_user_pages_dirty_lock(&page, 1);
>> +		else
>> +			put_user_page(page);
> 
> Can we get a put_user_page_dirty(struct page 8*pages, bool dirty, npages) ?
> 
> It is a common pattern that we might have to conditionaly dirty the pages
> and i feel it would look cleaner if we could move the branch within the
> put_user_page*() function.
> 

This sounds reasonable to me; do others have a preference on this? Last time
we discussed it, I recall there was interest in trying to handle the sg lists,
which is where a lot of the focus was. I'm not sure whether there was a
preference one way or the other on adding more of these helpers.
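
(Purely as an illustration of that sg-list idea, a hypothetical helper might
look like the following, mirroring what __ib_umem_release() does in this
patch; nothing like this exists yet:)

/* Hypothetical sketch: release user pages held in an sg_table. */
static void put_user_pages_sgt(struct sg_table *sgt, bool make_dirty)
{
	struct sg_page_iter sg_iter;
	struct page *page;

	for_each_sg_page(sgt->sgl, &sg_iter, sgt->nents, 0) {
		page = sg_page_iter_page(&sg_iter);
		if (make_dirty)
			put_user_pages_dirty_lock(&page, 1);
		else
			put_user_page(page);
	}
}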


thanks,
-- 
John Hubbard
NVIDIA


* Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
  2019-05-23 17:46       ` John Hubbard
@ 2019-05-23 19:04         ` Ira Weiny
  2019-05-23 19:13           ` John Hubbard
  2019-05-23 19:17         ` Jason Gunthorpe
  1 sibling, 1 reply; 13+ messages in thread
From: Ira Weiny @ 2019-05-23 19:04 UTC (permalink / raw)
  To: John Hubbard
  Cc: Jason Gunthorpe, john.hubbard, Andrew Morton, linux-mm, LKML,
	linux-rdma, linux-fsdevel, Doug Ledford, Mike Marciniszyn,
	Dennis Dalessandro, Christian Benvenuti, Jan Kara

On Thu, May 23, 2019 at 10:46:38AM -0700, John Hubbard wrote:
> On 5/23/19 10:32 AM, Jason Gunthorpe wrote:
> > On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote:
> > > > @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
> > > >   			 * ib_umem_odp_map_dma_single_page().
> > > >   			 */
> > > >   			if (npages - (j + 1) > 0)
> > > > -				release_pages(&local_page_list[j+1],
> > > > -					      npages - (j + 1));
> > > > +				put_user_pages(&local_page_list[j+1],
> > > > +					       npages - (j + 1));
> > > 
> > > I don't know if we discussed this before but it looks like the use of
> > > release_pages() was not entirely correct (or at least not necessary) here.  So
> > > I think this is ok.
> > 
> > Oh? John switched it from a put_pages loop to release_pages() here:
> > 
> > commit 75a3e6a3c129cddcc683538d8702c6ef998ec589
> > Author: John Hubbard <jhubbard@nvidia.com>
> > Date:   Mon Mar 4 11:46:45 2019 -0800
> > 
> >      RDMA/umem: minor bug fix in error handling path
> >      1. Bug fix: fix an off by one error in the code that cleans up if it fails
> >         to dma-map a page, after having done a get_user_pages_remote() on a
> >         range of pages.
> >      2. Refinement: for that same cleanup code, release_pages() is better than
> >         put_page() in a loop.
> > 
> > And now we are going to back something called put_pages() that
> > implements the same for loop the above removed?
> > 
> > Seems like we are going in circles?? John?
> > 
> 
> put_user_pages() is meant to be a drop-in replacement for release_pages(),
> so I made the above change as an interim step in moving the callsite from
> a loop, to a single call.
> 
> And at some point, it may be possible to find a way to optimize put_user_pages()
> in a similar way to the batching that release_pages() does, that was part
> of the plan for this.
> 
> But I do see what you mean: in the interim, maybe put_user_pages() should
> just be calling release_pages(), how does that change sound?

I'm certainly not the expert here but FWICT release_pages() was originally
designed to work with the page cache.

aabfb57296e3  mm: memcontrol: do not kill uncharge batching in free_pages_and_swap_cache

But at some point it was changed to be more general?

ea1754a08476 mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage

... and it is exported and used outside of the swapping code... and used in at
least one place to directly "put" pages gotten from get_user_pages_fast()
[arch/x86/kvm/svm.c].

From that it seems like it is safe.

But I don't see where release_pages() actually calls put_page() anywhere. What
am I missing?

Ira



* Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
  2019-05-23 19:04         ` Ira Weiny
@ 2019-05-23 19:13           ` John Hubbard
  2019-05-23 22:37             ` Ira Weiny
  0 siblings, 1 reply; 13+ messages in thread
From: John Hubbard @ 2019-05-23 19:13 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Jason Gunthorpe, Andrew Morton, linux-mm, LKML, linux-rdma,
	linux-fsdevel, Doug Ledford, Mike Marciniszyn,
	Dennis Dalessandro, Christian Benvenuti, Jan Kara

On 5/23/19 12:04 PM, Ira Weiny wrote:
> On Thu, May 23, 2019 at 10:46:38AM -0700, John Hubbard wrote:
>> On 5/23/19 10:32 AM, Jason Gunthorpe wrote:
>>> On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote:
>>>>> @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>>>>>    			 * ib_umem_odp_map_dma_single_page().
>>>>>    			 */
>>>>>    			if (npages - (j + 1) > 0)
>>>>> -				release_pages(&local_page_list[j+1],
>>>>> -					      npages - (j + 1));
>>>>> +				put_user_pages(&local_page_list[j+1],
>>>>> +					       npages - (j + 1));
>>>>
>>>> I don't know if we discussed this before but it looks like the use of
>>>> release_pages() was not entirely correct (or at least not necessary) here.  So
>>>> I think this is ok.
>>>
>>> Oh? John switched it from a put_pages loop to release_pages() here:
>>>
>>> commit 75a3e6a3c129cddcc683538d8702c6ef998ec589
>>> Author: John Hubbard <jhubbard@nvidia.com>
>>> Date:   Mon Mar 4 11:46:45 2019 -0800
>>>
>>>       RDMA/umem: minor bug fix in error handling path
>>>       1. Bug fix: fix an off by one error in the code that cleans up if it fails
>>>          to dma-map a page, after having done a get_user_pages_remote() on a
>>>          range of pages.
>>>       2. Refinement: for that same cleanup code, release_pages() is better than
>>>          put_page() in a loop.
>>>
>>> And now we are going to back something called put_pages() that
>>> implements the same for loop the above removed?
>>>
>>> Seems like we are going in circles?? John?
>>>
>>
>> put_user_pages() is meant to be a drop-in replacement for release_pages(),
>> so I made the above change as an interim step in moving the callsite from
>> a loop, to a single call.
>>
>> And at some point, it may be possible to find a way to optimize put_user_pages()
>> in a similar way to the batching that release_pages() does, that was part
>> of the plan for this.
>>
>> But I do see what you mean: in the interim, maybe put_user_pages() should
>> just be calling release_pages(), how does that change sound?
> 
> I'm certainly not the expert here but FWICT release_pages() was originally
> designed to work with the page cache.
> 
> aabfb57296e3  mm: memcontrol: do not kill uncharge batching in free_pages_and_swap_cache
> 
> But at some point it was changed to be more general?
> 
> ea1754a08476 mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage
> 
> ... and it is exported and used outside of the swapping code... and used at
> lease 1 place to directly "put" pages gotten from get_user_pages_fast()
> [arch/x86/kvm/svm.c]
> 
>  From that it seems like it is safe.
> 
> But I don't see where release_page() actually calls put_page() anywhere?  What
> am I missing?
> 

For that question, I recall having to look closely at this function, as well:

void release_pages(struct page **pages, int nr)
{
	int i;
	LIST_HEAD(pages_to_free);
	struct pglist_data *locked_pgdat = NULL;
	struct lruvec *lruvec;
	unsigned long uninitialized_var(flags);
	unsigned int uninitialized_var(lock_batch);

	for (i = 0; i < nr; i++) {
		struct page *page = pages[i];

		/*
		 * Make sure the IRQ-safe lock-holding time does not get
		 * excessive with a continuous string of pages from the
		 * same pgdat. The lock is held only if pgdat != NULL.
		 */
		if (locked_pgdat && ++lock_batch == SWAP_CLUSTER_MAX) {
			spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags);
			locked_pgdat = NULL;
		}

		if (is_huge_zero_page(page))
			continue;

		/* Device public page can not be huge page */
		if (is_device_public_page(page)) {
			if (locked_pgdat) {
				spin_unlock_irqrestore(&locked_pgdat->lru_lock,
						       flags);
				locked_pgdat = NULL;
			}
			put_devmap_managed_page(page);
			continue;
		}

		page = compound_head(page);
		if (!put_page_testzero(page))

		     ^here is where it does the put_page() call, is that what
			you were looking for?



thanks,
-- 
John Hubbard
NVIDIA


* Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
  2019-05-23 17:46       ` John Hubbard
  2019-05-23 19:04         ` Ira Weiny
@ 2019-05-23 19:17         ` Jason Gunthorpe
  1 sibling, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2019-05-23 19:17 UTC (permalink / raw)
  To: John Hubbard
  Cc: Ira Weiny, john.hubbard, Andrew Morton, linux-mm, LKML,
	linux-rdma, linux-fsdevel, Doug Ledford, Mike Marciniszyn,
	Dennis Dalessandro, Christian Benvenuti, Jan Kara

On Thu, May 23, 2019 at 10:46:38AM -0700, John Hubbard wrote:
> On 5/23/19 10:32 AM, Jason Gunthorpe wrote:
> > On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote:
> > > > @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
> > > >   			 * ib_umem_odp_map_dma_single_page().
> > > >   			 */
> > > >   			if (npages - (j + 1) > 0)
> > > > -				release_pages(&local_page_list[j+1],
> > > > -					      npages - (j + 1));
> > > > +				put_user_pages(&local_page_list[j+1],
> > > > +					       npages - (j + 1));
> > > 
> > > I don't know if we discussed this before but it looks like the use of
> > > release_pages() was not entirely correct (or at least not necessary) here.  So
> > > I think this is ok.
> > 
> > Oh? John switched it from a put_pages loop to release_pages() here:
> > 
> > commit 75a3e6a3c129cddcc683538d8702c6ef998ec589
> > Author: John Hubbard <jhubbard@nvidia.com>
> > Date:   Mon Mar 4 11:46:45 2019 -0800
> > 
> >      RDMA/umem: minor bug fix in error handling path
> >      1. Bug fix: fix an off by one error in the code that cleans up if it fails
> >         to dma-map a page, after having done a get_user_pages_remote() on a
> >         range of pages.
> >      2. Refinement: for that same cleanup code, release_pages() is better than
> >         put_page() in a loop.
> > 
> > And now we are going to back something called put_pages() that
> > implements the same for loop the above removed?
> > 
> > Seems like we are going in circles?? John?
> > 
> 
> put_user_pages() is meant to be a drop-in replacement for release_pages(),
> so I made the above change as an interim step in moving the callsite from
> a loop, to a single call.
> 
> And at some point, it may be possible to find a way to optimize put_user_pages()
> in a similar way to the batching that release_pages() does, that was part
> of the plan for this.
> 
> But I do see what you mean: in the interim, maybe put_user_pages() should
> just be calling release_pages(), how does that change sound?

It would have made it more consistent... but it seems this isn't a
functional problem in this patch.

Jason


* Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
  2019-05-23 19:13           ` John Hubbard
@ 2019-05-23 22:37             ` Ira Weiny
  2019-05-23 22:50               ` John Hubbard
  0 siblings, 1 reply; 13+ messages in thread
From: Ira Weiny @ 2019-05-23 22:37 UTC (permalink / raw)
  To: John Hubbard
  Cc: Jason Gunthorpe, Andrew Morton, linux-mm, LKML, linux-rdma,
	linux-fsdevel, Doug Ledford, Mike Marciniszyn,
	Dennis Dalessandro, Christian Benvenuti, Jan Kara

On Thu, May 23, 2019 at 12:13:59PM -0700, John Hubbard wrote:
> On 5/23/19 12:04 PM, Ira Weiny wrote:
> > On Thu, May 23, 2019 at 10:46:38AM -0700, John Hubbard wrote:
> > > On 5/23/19 10:32 AM, Jason Gunthorpe wrote:
> > > > On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote:
> > > > > > @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
> > > > > >    			 * ib_umem_odp_map_dma_single_page().
> > > > > >    			 */
> > > > > >    			if (npages - (j + 1) > 0)
> > > > > > -				release_pages(&local_page_list[j+1],
> > > > > > -					      npages - (j + 1));
> > > > > > +				put_user_pages(&local_page_list[j+1],
> > > > > > +					       npages - (j + 1));
> > > > > 
> > > > > I don't know if we discussed this before but it looks like the use of
> > > > > release_pages() was not entirely correct (or at least not necessary) here.  So
> > > > > I think this is ok.
> > > > 
> > > > Oh? John switched it from a put_pages loop to release_pages() here:
> > > > 
> > > > commit 75a3e6a3c129cddcc683538d8702c6ef998ec589
> > > > Author: John Hubbard <jhubbard@nvidia.com>
> > > > Date:   Mon Mar 4 11:46:45 2019 -0800
> > > > 
> > > >       RDMA/umem: minor bug fix in error handling path
> > > >       1. Bug fix: fix an off by one error in the code that cleans up if it fails
> > > >          to dma-map a page, after having done a get_user_pages_remote() on a
> > > >          range of pages.
> > > >       2. Refinement: for that same cleanup code, release_pages() is better than
> > > >          put_page() in a loop.
> > > > 
> > > > And now we are going to back something called put_pages() that
> > > > implements the same for loop the above removed?
> > > > 
> > > > Seems like we are going in circles?? John?
> > > > 
> > > 
> > > put_user_pages() is meant to be a drop-in replacement for release_pages(),
> > > so I made the above change as an interim step in moving the callsite from
> > > a loop, to a single call.
> > > 
> > > And at some point, it may be possible to find a way to optimize put_user_pages()
> > > in a similar way to the batching that release_pages() does, that was part
> > > of the plan for this.
> > > 
> > > But I do see what you mean: in the interim, maybe put_user_pages() should
> > > just be calling release_pages(), how does that change sound?
> > 
> > I'm certainly not the expert here but FWICT release_pages() was originally
> > designed to work with the page cache.
> > 
> > aabfb57296e3  mm: memcontrol: do not kill uncharge batching in free_pages_and_swap_cache
> > 
> > But at some point it was changed to be more general?
> > 
> > ea1754a08476 mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage
> > 
> > ... and it is exported and used outside of the swapping code... and used at
> > lease 1 place to directly "put" pages gotten from get_user_pages_fast()
> > [arch/x86/kvm/svm.c]
> > 
> >  From that it seems like it is safe.
> > 
> > But I don't see where release_page() actually calls put_page() anywhere?  What
> > am I missing?
> > 
> 
> For that question, I recall having to look closely at this function, as well:
> 
> void release_pages(struct page **pages, int nr)
> {
> 	int i;
> 	LIST_HEAD(pages_to_free);
> 	struct pglist_data *locked_pgdat = NULL;
> 	struct lruvec *lruvec;
> 	unsigned long uninitialized_var(flags);
> 	unsigned int uninitialized_var(lock_batch);
> 
> 	for (i = 0; i < nr; i++) {
> 		struct page *page = pages[i];
> 
> 		/*
> 		 * Make sure the IRQ-safe lock-holding time does not get
> 		 * excessive with a continuous string of pages from the
> 		 * same pgdat. The lock is held only if pgdat != NULL.
> 		 */
> 		if (locked_pgdat && ++lock_batch == SWAP_CLUSTER_MAX) {
> 			spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags);
> 			locked_pgdat = NULL;
> 		}
> 
> 		if (is_huge_zero_page(page))
> 			continue;
> 
> 		/* Device public page can not be huge page */
> 		if (is_device_public_page(page)) {
> 			if (locked_pgdat) {
> 				spin_unlock_irqrestore(&locked_pgdat->lru_lock,
> 						       flags);
> 				locked_pgdat = NULL;
> 			}
> 			put_devmap_managed_page(page);
> 			continue;
> 		}
> 
> 		page = compound_head(page);
> 		if (!put_page_testzero(page))
> 
> 		     ^here is where it does the put_page() call, is that what
> 			you were looking for?

Yes I saw that...

I've dug in further, and I see now that release_pages() implements almost the
same thing as put_page() (see below).

However, I think we need to be careful here because put_page_testzero() calls

	page_ref_dec_and_test(page);

... and after your changes it will need to call ...

	page_ref_sub_return(page, GUP_PIN_COUNTING_BIAS);

... on a GUP page.

So how do you propose calling release_pages() from within put_user_pages()?  Or
were you thinking this would be temporary?

That said, there are 2 differences I see between release_pages() and put_page()

1) release_pages() will only work for a MEMORY_DEVICE_PUBLIC page and not all
   devmem pages...
   I think this is a bug, patch to follow shortly.

2) release_pages() calls __ClearPageActive() while put_page() does not

I have no idea if the second difference is a bug or not.  But it smells of
one...

It would be nice to know whether the open coding of put_page() is really a
performance benefit or not. It seems like an attempt to optimize the taking
of the pgdat->lru_lock.

Does anyone have any information about the performance advantage here?

Given the changes above it seems like it would be a benefit to merge the 2 call
paths more closely to make sure we do the right thing.

Ira



* Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
  2019-05-23 22:37             ` Ira Weiny
@ 2019-05-23 22:50               ` John Hubbard
  2019-05-23 22:54                 ` John Hubbard
  0 siblings, 1 reply; 13+ messages in thread
From: John Hubbard @ 2019-05-23 22:50 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Jason Gunthorpe, Andrew Morton, linux-mm, LKML, linux-rdma,
	linux-fsdevel, Doug Ledford, Mike Marciniszyn,
	Dennis Dalessandro, Christian Benvenuti, Jan Kara

On 5/23/19 3:37 PM, Ira Weiny wrote:
[...] 
> I've dug in further and I see now that release_pages() implements (almost the
> same thing, see below) as put_page().
> 
> However, I think we need to be careful here because put_page_testzero() calls
> 
> 	page_ref_dec_and_test(page);
> 
> ... and after your changes it will need to call ...
> 
> 	page_ref_sub_return(page, GUP_PIN_COUNTING_BIAS);
> 
> ... on a GUP page:
> 
> So how do you propose calling release_pages() from within put_user_pages()?  Or
> were you thinking this would be temporary?

I was thinking of it as a temporary measure, only up until (but not including) the
point where put_user_pages() becomes active. That is, the point when put_user_page()
starts decrementing GUP_PIN_COUNTING_BIAS, instead of just forwarding to put_page().

(For other readers, that's this patch:

    "mm/gup: debug tracking of get_user_pages() references"

...in https://github.com/johnhubbard/linux/tree/gup_dma_core )
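
To spell out the difference (rough sketch only; GUP_PIN_COUNTING_BIAS and the
exact accounting come from that tree and are not in mainline yet):

/* What put_user_page() effectively is today: */
static inline void put_user_page_today(struct page *page)
{
	put_page(page);		/* plain refcount decrement */
}

/* Roughly what it becomes once gup-pinned pages are tracked (hypothetical): */
static inline void put_user_page_tracked(struct page *page)
{
	if (page_ref_sub_and_test(page, GUP_PIN_COUNTING_BIAS))
		__put_page(page);
}

Once the second form is in place, forwarding to release_pages() (which drops
a single reference per page) would obviously be wrong.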

> 
> That said, there are 2 differences I see between release_pages() and put_page()
> 
> 1) release_pages() will only work for a MEMORY_DEVICE_PUBLIC page and not all
>    devmem pages...
>    I think this is a bug, patch to follow shortly.
> 
> 2) release_pages() calls __ClearPageActive() while put_page() does not
> 
> I have no idea if the second difference is a bug or not.  But it smells of
> one...
> 
> It would be nice to know if the open coding of put_page is really a performance
> benefit or not.  It seems like an attempt to optimize the taking of the page
> data lock.
> 
> Does anyone have any information about the performance advantage here?
> 
> Given the changes above it seems like it would be a benefit to merge the 2 call
> paths more closely to make sure we do the right thing.
> 

Yes, it does. Maybe best to not do the temporary measure, then, while this stuff
gets improved. I'll look at your other patch...


thanks,
-- 
John Hubbard
NVIDIA


* Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
  2019-05-23 22:50               ` John Hubbard
@ 2019-05-23 22:54                 ` John Hubbard
  0 siblings, 0 replies; 13+ messages in thread
From: John Hubbard @ 2019-05-23 22:54 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Jason Gunthorpe, Andrew Morton, linux-mm, LKML, linux-rdma,
	linux-fsdevel, Doug Ledford, Mike Marciniszyn,
	Dennis Dalessandro, Christian Benvenuti, Jan Kara

On 5/23/19 3:50 PM, John Hubbard wrote:
> [...] 
> I was thinking of it as a temporary measure, only up until, but not including the
> point where put_user_pages() becomes active. That is, the point when put_user_pages
> starts decrementing GUP_PIN_COUNTING_BIAS, instead of just forwarding to put_page().
> 
> (For other readers, that's this patch:
> 
>     "mm/gup: debug tracking of get_user_pages() references"
> 
> ...in https://github.com/johnhubbard/linux/tree/gup_dma_core )
> 

Arggh, correction, I meant this patch:

    "mm/gup: track gup-pinned pages"

...sorry for any confusion there.

thanks,
-- 
John Hubbard
NVIDIA



Thread overview: 13+ messages
2019-05-23  7:25 [PATCH 0/1] infiniband/mm: convert put_page() to put_user_page*() john.hubbard
2019-05-23  7:25 ` [PATCH 1/1] " john.hubbard
2019-05-23 15:31   ` Jerome Glisse
2019-05-23 17:56     ` John Hubbard
2019-05-23 17:28   ` Ira Weiny
2019-05-23 17:32     ` Jason Gunthorpe
2019-05-23 17:46       ` John Hubbard
2019-05-23 19:04         ` Ira Weiny
2019-05-23 19:13           ` John Hubbard
2019-05-23 22:37             ` Ira Weiny
2019-05-23 22:50               ` John Hubbard
2019-05-23 22:54                 ` John Hubbard
2019-05-23 19:17         ` Jason Gunthorpe
