linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/1] RDMA/umem: minor bug fix and cleanup in error handling paths
@ 2019-03-02  3:27 john.hubbard
  2019-03-02  3:27 ` [PATCH] " john.hubbard
  0 siblings, 1 reply; 21+ messages in thread
From: john.hubbard @ 2019-03-02  3:27 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, LKML, John Hubbard, Ira Weiny, Jason Gunthorpe,
	Doug Ledford, linux-rdma

From: John Hubbard <jhubbard@nvidia.com>

Hi,

Ira Weiny alerted me to a couple of places where I'd missed a change from
put_page() to put_user_page(), in my pending patchsets. But when I
attempted to dive more deeply into that code, I ran into things that I
*think* should be fixed up a bit.

I hope I didn't completely miss something. I am not set up to test this
(no Infiniband hardware) so I'm not even sure I should send this out, but
it seems like the best way to ask "is this code really working the way I
think it does"?

This applies to the latest linux.git tree.

Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: linux-rdma@vger.kernel.org
Cc: linux-mm@kvack.org

John Hubbard (1):
  RDMA/umem: minor bug fix and cleanup in error handling paths

 drivers/infiniband/core/umem_odp.c | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-02  3:27 [PATCH 0/1] RDMA/umem: minor bug fix and cleanup in error handling paths john.hubbard
@ 2019-03-02  3:27 ` john.hubbard
  2019-03-02 16:03   ` kbuild test robot
                     ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: john.hubbard @ 2019-03-02  3:27 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, LKML, John Hubbard, Ira Weiny, Jason Gunthorpe,
	Doug Ledford, linux-rdma

From: John Hubbard <jhubbard@nvidia.com>

1. Bug fix: the error handling now releases pages starting
at the first page that experienced an error.

2. Refinement: release_pages() is better than put_page()
in a loop.

3. Dead code removal: the check for (user_virt & ~page_mask)
is checking for a condition that can never happen,
because earlier:

    user_virt = user_virt & page_mask;

...so, remove that entire phrase.

4. Minor: As long as I'm here, shorten up a couple of long lines
in the same function, without harming the ability to
grep for the printed error message.

Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: linux-rdma@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/infiniband/core/umem_odp.c | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index acb882f279cb..294bf6676947 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -648,25 +648,17 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 
 		if (npages < 0) {
 			if (npages != -EAGAIN)
-				pr_warn("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
+				pr_warn("fail to get %zu user pages with error %d\n",
+					gup_num_pages, npages);
 			else
-				pr_debug("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
+				pr_debug("fail to get %zu user pages with error %d\n",
+					 gup_num_pages, npages);
 			break;
 		}
 
 		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
 		mutex_lock(&umem_odp->umem_mutex);
 		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
-			if (user_virt & ~page_mask) {
-				p += PAGE_SIZE;
-				if (page_to_phys(local_page_list[j]) != p) {
-					ret = -EFAULT;
-					break;
-				}
-				put_page(local_page_list[j]);
-				continue;
-			}
-
 			ret = ib_umem_odp_map_dma_single_page(
 					umem_odp, k, local_page_list[j],
 					access_mask, current_seq);
@@ -684,9 +676,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 		mutex_unlock(&umem_odp->umem_mutex);
 
 		if (ret < 0) {
-			/* Release left over pages when handling errors. */
-			for (++j; j < npages; ++j)
-				put_page(local_page_list[j]);
+			/*
+			 * Release pages, starting at the first page
+			 * that experienced an error.
+			 */
+			release_pages(&local_page_list[j], npages - j);
 			break;
 		}
 	}
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-02  3:27 ` [PATCH] " john.hubbard
@ 2019-03-02 16:03   ` kbuild test robot
  2019-03-02 16:14   ` kbuild test robot
  2019-03-02 20:24   ` [PATCH v2] " john.hubbard
  2 siblings, 0 replies; 21+ messages in thread
From: kbuild test robot @ 2019-03-02 16:03 UTC (permalink / raw)
  To: john.hubbard
  Cc: kbuild-all, linux-mm, Andrew Morton, LKML, John Hubbard,
	Ira Weiny, Jason Gunthorpe, Doug Ledford, linux-rdma

[-- Attachment #1: Type: text/plain, Size: 6905 bytes --]

Hi John,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on rdma/for-next]
[also build test ERROR on v5.0-rc8 next-20190301]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/john-hubbard-gmail-com/RDMA-umem-minor-bug-fix-and-cleanup-in-error-handling-paths/20190302-233314
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next
config: i386-randconfig-x002-201908 (attached as .config)
compiler: gcc-8 (Debian 8.2.0-20) 8.2.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/infiniband/core/umem_odp.c: In function 'ib_umem_odp_map_dma_pages':
>> drivers/infiniband/core/umem_odp.c:684:4: error: implicit declaration of function 'release_pages'; did you mean 'release_task'? [-Werror=implicit-function-declaration]
       release_pages(&local_page_list[j], npages - j);
       ^~~~~~~~~~~~~
       release_task
   cc1: some warnings being treated as errors

vim +684 drivers/infiniband/core/umem_odp.c

   559	
   560	/**
   561	 * ib_umem_odp_map_dma_pages - Pin and DMA map userspace memory in an ODP MR.
   562	 *
   563	 * Pins the range of pages passed in the argument, and maps them to
   564	 * DMA addresses. The DMA addresses of the mapped pages is updated in
   565	 * umem_odp->dma_list.
   566	 *
   567	 * Returns the number of pages mapped in success, negative error code
   568	 * for failure.
   569	 * An -EAGAIN error code is returned when a concurrent mmu notifier prevents
   570	 * the function from completing its task.
   571	 * An -ENOENT error code indicates that userspace process is being terminated
   572	 * and mm was already destroyed.
   573	 * @umem_odp: the umem to map and pin
   574	 * @user_virt: the address from which we need to map.
   575	 * @bcnt: the minimal number of bytes to pin and map. The mapping might be
   576	 *        bigger due to alignment, and may also be smaller in case of an error
   577	 *        pinning or mapping a page. The actual pages mapped is returned in
   578	 *        the return value.
   579	 * @access_mask: bit mask of the requested access permissions for the given
   580	 *               range.
   581	 * @current_seq: the MMU notifiers sequance value for synchronization with
   582	 *               invalidations. the sequance number is read from
   583	 *               umem_odp->notifiers_seq before calling this function
   584	 */
   585	int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
   586				      u64 bcnt, u64 access_mask,
   587				      unsigned long current_seq)
   588	{
   589		struct ib_umem *umem = &umem_odp->umem;
   590		struct task_struct *owning_process  = NULL;
   591		struct mm_struct *owning_mm = umem_odp->umem.owning_mm;
   592		struct page       **local_page_list = NULL;
   593		u64 page_mask, off;
   594		int j, k, ret = 0, start_idx, npages = 0, page_shift;
   595		unsigned int flags = 0;
   596		phys_addr_t p = 0;
   597	
   598		if (access_mask == 0)
   599			return -EINVAL;
   600	
   601		if (user_virt < ib_umem_start(umem) ||
   602		    user_virt + bcnt > ib_umem_end(umem))
   603			return -EFAULT;
   604	
   605		local_page_list = (struct page **)__get_free_page(GFP_KERNEL);
   606		if (!local_page_list)
   607			return -ENOMEM;
   608	
   609		page_shift = umem->page_shift;
   610		page_mask = ~(BIT(page_shift) - 1);
   611		off = user_virt & (~page_mask);
   612		user_virt = user_virt & page_mask;
   613		bcnt += off; /* Charge for the first page offset as well. */
   614	
   615		/*
   616		 * owning_process is allowed to be NULL, this means somehow the mm is
   617		 * existing beyond the lifetime of the originating process.. Presumably
   618		 * mmget_not_zero will fail in this case.
   619		 */
   620		owning_process = get_pid_task(umem_odp->per_mm->tgid, PIDTYPE_PID);
   621		if (WARN_ON(!mmget_not_zero(umem_odp->umem.owning_mm))) {
   622			ret = -EINVAL;
   623			goto out_put_task;
   624		}
   625	
   626		if (access_mask & ODP_WRITE_ALLOWED_BIT)
   627			flags |= FOLL_WRITE;
   628	
   629		start_idx = (user_virt - ib_umem_start(umem)) >> page_shift;
   630		k = start_idx;
   631	
   632		while (bcnt > 0) {
   633			const size_t gup_num_pages = min_t(size_t,
   634					(bcnt + BIT(page_shift) - 1) >> page_shift,
   635					PAGE_SIZE / sizeof(struct page *));
   636	
   637			down_read(&owning_mm->mmap_sem);
   638			/*
   639			 * Note: this might result in redundent page getting. We can
   640			 * avoid this by checking dma_list to be 0 before calling
   641			 * get_user_pages. However, this make the code much more
   642			 * complex (and doesn't gain us much performance in most use
   643			 * cases).
   644			 */
   645			npages = get_user_pages_remote(owning_process, owning_mm,
   646					user_virt, gup_num_pages,
   647					flags, local_page_list, NULL, NULL);
   648			up_read(&owning_mm->mmap_sem);
   649	
   650			if (npages < 0) {
   651				if (npages != -EAGAIN)
   652					pr_warn("fail to get %zu user pages with error %d\n",
   653						gup_num_pages, npages);
   654				else
   655					pr_debug("fail to get %zu user pages with error %d\n",
   656						 gup_num_pages, npages);
   657				break;
   658			}
   659	
   660			bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
   661			mutex_lock(&umem_odp->umem_mutex);
   662			for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
   663				ret = ib_umem_odp_map_dma_single_page(
   664						umem_odp, k, local_page_list[j],
   665						access_mask, current_seq);
   666				if (ret < 0) {
   667					if (ret != -EAGAIN)
   668						pr_warn("ib_umem_odp_map_dma_single_page failed with error %d\n", ret);
   669					else
   670						pr_debug("ib_umem_odp_map_dma_single_page failed with error %d\n", ret);
   671					break;
   672				}
   673	
   674				p = page_to_phys(local_page_list[j]);
   675				k++;
   676			}
   677			mutex_unlock(&umem_odp->umem_mutex);
   678	
   679			if (ret < 0) {
   680				/*
   681				 * Release pages, starting at the the first page
   682				 * that experienced an error.
   683				 */
 > 684				release_pages(&local_page_list[j], npages - j);
   685				break;
   686			}
   687		}
   688	
   689		if (ret >= 0) {
   690			if (npages < 0 && k == start_idx)
   691				ret = npages;
   692			else
   693				ret = k - start_idx;
   694		}
   695	
   696		mmput(owning_mm);
   697	out_put_task:
   698		if (owning_process)
   699			put_task_struct(owning_process);
   700		free_page((unsigned long)local_page_list);
   701		return ret;
   702	}
   703	EXPORT_SYMBOL(ib_umem_odp_map_dma_pages);
   704	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 36427 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-02  3:27 ` [PATCH] " john.hubbard
  2019-03-02 16:03   ` kbuild test robot
@ 2019-03-02 16:14   ` kbuild test robot
  2019-03-02 20:24   ` [PATCH v2] " john.hubbard
  2 siblings, 0 replies; 21+ messages in thread
From: kbuild test robot @ 2019-03-02 16:14 UTC (permalink / raw)
  To: john.hubbard
  Cc: kbuild-all, linux-mm, Andrew Morton, LKML, John Hubbard,
	Ira Weiny, Jason Gunthorpe, Doug Ledford, linux-rdma

[-- Attachment #1: Type: text/plain, Size: 7027 bytes --]

Hi John,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on rdma/for-next]
[also build test ERROR on v5.0-rc8 next-20190301]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/john-hubbard-gmail-com/RDMA-umem-minor-bug-fix-and-cleanup-in-error-handling-paths/20190302-233314
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next
config: nds32-allyesconfig (attached as .config)
compiler: nds32le-linux-gcc (GCC) 6.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=6.4.0 make.cross ARCH=nds32 

All errors (new ones prefixed by >>):

   drivers/infiniband/core/umem_odp.c: In function 'ib_umem_odp_map_dma_pages':
>> drivers/infiniband/core/umem_odp.c:684:4: error: implicit declaration of function 'release_pages' [-Werror=implicit-function-declaration]
       release_pages(&local_page_list[j], npages - j);
       ^~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/release_pages +684 drivers/infiniband/core/umem_odp.c

   559	
   560	/**
   561	 * ib_umem_odp_map_dma_pages - Pin and DMA map userspace memory in an ODP MR.
   562	 *
   563	 * Pins the range of pages passed in the argument, and maps them to
   564	 * DMA addresses. The DMA addresses of the mapped pages is updated in
   565	 * umem_odp->dma_list.
   566	 *
   567	 * Returns the number of pages mapped in success, negative error code
   568	 * for failure.
   569	 * An -EAGAIN error code is returned when a concurrent mmu notifier prevents
   570	 * the function from completing its task.
   571	 * An -ENOENT error code indicates that userspace process is being terminated
   572	 * and mm was already destroyed.
   573	 * @umem_odp: the umem to map and pin
   574	 * @user_virt: the address from which we need to map.
   575	 * @bcnt: the minimal number of bytes to pin and map. The mapping might be
   576	 *        bigger due to alignment, and may also be smaller in case of an error
   577	 *        pinning or mapping a page. The actual pages mapped is returned in
   578	 *        the return value.
   579	 * @access_mask: bit mask of the requested access permissions for the given
   580	 *               range.
   581	 * @current_seq: the MMU notifiers sequance value for synchronization with
   582	 *               invalidations. the sequance number is read from
   583	 *               umem_odp->notifiers_seq before calling this function
   584	 */
   585	int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
   586				      u64 bcnt, u64 access_mask,
   587				      unsigned long current_seq)
   588	{
   589		struct ib_umem *umem = &umem_odp->umem;
   590		struct task_struct *owning_process  = NULL;
   591		struct mm_struct *owning_mm = umem_odp->umem.owning_mm;
   592		struct page       **local_page_list = NULL;
   593		u64 page_mask, off;
   594		int j, k, ret = 0, start_idx, npages = 0, page_shift;
   595		unsigned int flags = 0;
   596		phys_addr_t p = 0;
   597	
   598		if (access_mask == 0)
   599			return -EINVAL;
   600	
   601		if (user_virt < ib_umem_start(umem) ||
   602		    user_virt + bcnt > ib_umem_end(umem))
   603			return -EFAULT;
   604	
   605		local_page_list = (struct page **)__get_free_page(GFP_KERNEL);
   606		if (!local_page_list)
   607			return -ENOMEM;
   608	
   609		page_shift = umem->page_shift;
   610		page_mask = ~(BIT(page_shift) - 1);
   611		off = user_virt & (~page_mask);
   612		user_virt = user_virt & page_mask;
   613		bcnt += off; /* Charge for the first page offset as well. */
   614	
   615		/*
   616		 * owning_process is allowed to be NULL, this means somehow the mm is
   617		 * existing beyond the lifetime of the originating process.. Presumably
   618		 * mmget_not_zero will fail in this case.
   619		 */
   620		owning_process = get_pid_task(umem_odp->per_mm->tgid, PIDTYPE_PID);
   621		if (WARN_ON(!mmget_not_zero(umem_odp->umem.owning_mm))) {
   622			ret = -EINVAL;
   623			goto out_put_task;
   624		}
   625	
   626		if (access_mask & ODP_WRITE_ALLOWED_BIT)
   627			flags |= FOLL_WRITE;
   628	
   629		start_idx = (user_virt - ib_umem_start(umem)) >> page_shift;
   630		k = start_idx;
   631	
   632		while (bcnt > 0) {
   633			const size_t gup_num_pages = min_t(size_t,
   634					(bcnt + BIT(page_shift) - 1) >> page_shift,
   635					PAGE_SIZE / sizeof(struct page *));
   636	
   637			down_read(&owning_mm->mmap_sem);
   638			/*
   639			 * Note: this might result in redundent page getting. We can
   640			 * avoid this by checking dma_list to be 0 before calling
   641			 * get_user_pages. However, this make the code much more
   642			 * complex (and doesn't gain us much performance in most use
   643			 * cases).
   644			 */
   645			npages = get_user_pages_remote(owning_process, owning_mm,
   646					user_virt, gup_num_pages,
   647					flags, local_page_list, NULL, NULL);
   648			up_read(&owning_mm->mmap_sem);
   649	
   650			if (npages < 0) {
   651				if (npages != -EAGAIN)
   652					pr_warn("fail to get %zu user pages with error %d\n",
   653						gup_num_pages, npages);
   654				else
   655					pr_debug("fail to get %zu user pages with error %d\n",
   656						 gup_num_pages, npages);
   657				break;
   658			}
   659	
   660			bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
   661			mutex_lock(&umem_odp->umem_mutex);
   662			for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
   663				ret = ib_umem_odp_map_dma_single_page(
   664						umem_odp, k, local_page_list[j],
   665						access_mask, current_seq);
   666				if (ret < 0) {
   667					if (ret != -EAGAIN)
   668						pr_warn("ib_umem_odp_map_dma_single_page failed with error %d\n", ret);
   669					else
   670						pr_debug("ib_umem_odp_map_dma_single_page failed with error %d\n", ret);
   671					break;
   672				}
   673	
   674				p = page_to_phys(local_page_list[j]);
   675				k++;
   676			}
   677			mutex_unlock(&umem_odp->umem_mutex);
   678	
   679			if (ret < 0) {
   680				/*
   681				 * Release pages, starting at the the first page
   682				 * that experienced an error.
   683				 */
 > 684				release_pages(&local_page_list[j], npages - j);
   685				break;
   686			}
   687		}
   688	
   689		if (ret >= 0) {
   690			if (npages < 0 && k == start_idx)
   691				ret = npages;
   692			else
   693				ret = k - start_idx;
   694		}
   695	
   696		mmput(owning_mm);
   697	out_put_task:
   698		if (owning_process)
   699			put_task_struct(owning_process);
   700		free_page((unsigned long)local_page_list);
   701		return ret;
   702	}
   703	EXPORT_SYMBOL(ib_umem_odp_map_dma_pages);
   704	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 49915 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-02 20:24   ` [PATCH v2] " john.hubbard
@ 2019-03-02 19:44     ` Ira Weiny
  2019-03-03  9:52       ` Artemy Kovalyov
  0 siblings, 1 reply; 21+ messages in thread
From: Ira Weiny @ 2019-03-02 19:44 UTC (permalink / raw)
  To: john.hubbard
  Cc: linux-mm, Andrew Morton, LKML, John Hubbard, Jason Gunthorpe,
	Doug Ledford, linux-rdma, artemyko

FWIW I don't have ODP hardware either, so I can't test this.

On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
> 
> 1. Bug fix: the error handling release pages starting
> at the first page that experienced an error.
> 
> 2. Refinement: release_pages() is better than put_page()
> in a loop.
> 
> 3. Dead code removal: the check for (user_virt & ~page_mask)
> is checking for a condition that can never happen,
> because earlier:
> 
>     user_virt = user_virt & page_mask;
> 
> ...so, remove that entire phrase.
> 
> 4. Minor: As long as I'm here, shorten up a couple of long lines
> in the same function, without harming the ability to
> grep for the printed error message.
> 
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Doug Ledford <dledford@redhat.com>
> Cc: linux-rdma@vger.kernel.org
> Cc: linux-mm@kvack.org
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
> 
> v2: Fixes a kbuild test robot reported build failure, by directly
>     including pagemap.h
> 
>  drivers/infiniband/core/umem_odp.c | 25 ++++++++++---------------
>  1 file changed, 10 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> index acb882f279cb..83872c1f3f2c 100644
> --- a/drivers/infiniband/core/umem_odp.c
> +++ b/drivers/infiniband/core/umem_odp.c
> @@ -40,6 +40,7 @@
>  #include <linux/vmalloc.h>
>  #include <linux/hugetlb.h>
>  #include <linux/interval_tree_generic.h>
> +#include <linux/pagemap.h>
>  
>  #include <rdma/ib_verbs.h>
>  #include <rdma/ib_umem.h>
> @@ -648,25 +649,17 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>  
>  		if (npages < 0) {
>  			if (npages != -EAGAIN)
> -				pr_warn("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
> +				pr_warn("fail to get %zu user pages with error %d\n",
> +					gup_num_pages, npages);
>  			else
> -				pr_debug("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
> +				pr_debug("fail to get %zu user pages with error %d\n",
> +					 gup_num_pages, npages);
>  			break;
>  		}
>  
>  		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
>  		mutex_lock(&umem_odp->umem_mutex);
>  		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
> -			if (user_virt & ~page_mask) {
> -				p += PAGE_SIZE;
> -				if (page_to_phys(local_page_list[j]) != p) {
> -					ret = -EFAULT;
> -					break;
> -				}
> -				put_page(local_page_list[j]);
> -				continue;
> -			}
> -

I think this is trying to account for compound pages. (i.e. page_mask could
represent more than PAGE_SIZE, which is what user_virt is being incremented by.)
But putting the page in that case seems to be the wrong thing to do?

Yes this was added by Artemy[1] now cc'ed.

>  			ret = ib_umem_odp_map_dma_single_page(
>  					umem_odp, k, local_page_list[j],
>  					access_mask, current_seq);
> @@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>  		mutex_unlock(&umem_odp->umem_mutex);
>  
>  		if (ret < 0) {
> -			/* Release left over pages when handling errors. */
> -			for (++j; j < npages; ++j)
> -				put_page(local_page_list[j]);
> +			/*
> +			 * Release pages, starting at the the first page
> +			 * that experienced an error.
> +			 */
> +			release_pages(&local_page_list[j], npages - j);

My concern here is that release_pages() handles compound pages, perhaps
differently from the above code, so calling it may not work?  But I've not
really spent a lot of time on it...

:-/
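
For context, the substitution being questioned treats the two forms below as
doing the same per-entry reference drops (a sketch of that assumption, not
the mm/swap.c implementation); whether the range should start at j or j + 1
is a separate question taken up later in the thread:

#include <linux/mm.h>		/* put_page() */
#include <linux/pagemap.h>	/* release_pages() */

/* drop the references on pages[start..nr-1], one call per entry */
static void drop_refs_loop(struct page **pages, int start, int nr)
{
	int i;

	for (i = start; i < nr; i++)
		put_page(pages[i]);
}

/* the batched form the patch switches to, over the same range */
static void drop_refs_batched(struct page **pages, int start, int nr)
{
	release_pages(&pages[start], nr - start);
}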

Ira

[1]

commit 403cd12e2cf759ead5cbdcb62bf9872b9618d400
Author: Artemy Kovalyov <artemyko@mellanox.com>
Date:   Wed Apr 5 09:23:55 2017 +0300

    IB/umem: Add contiguous ODP support

    Currently ODP supports only regular MMU pages.
    Add ODP support for regions consisting of physically contiguous chunks
    of arbitrary order (huge pages for instance) to improve performance.

    Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-02  3:27 ` [PATCH] " john.hubbard
  2019-03-02 16:03   ` kbuild test robot
  2019-03-02 16:14   ` kbuild test robot
@ 2019-03-02 20:24   ` john.hubbard
  2019-03-02 19:44     ` Ira Weiny
  2 siblings, 1 reply; 21+ messages in thread
From: john.hubbard @ 2019-03-02 20:24 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, LKML, John Hubbard, Ira Weiny, Jason Gunthorpe,
	Doug Ledford, linux-rdma

From: John Hubbard <jhubbard@nvidia.com>

1. Bug fix: the error handling now releases pages starting
at the first page that experienced an error.

2. Refinement: release_pages() is better than put_page()
in a loop.

3. Dead code removal: the check for (user_virt & ~page_mask)
is checking for a condition that can never happen,
because earlier:

    user_virt = user_virt & page_mask;

...so, remove that entire phrase.

4. Minor: As long as I'm here, shorten up a couple of long lines
in the same function, without harming the ability to
grep for the printed error message.

Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: linux-rdma@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---

v2: Fixes a build failure reported by the kbuild test robot, by directly
    including pagemap.h

 drivers/infiniband/core/umem_odp.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index acb882f279cb..83872c1f3f2c 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -40,6 +40,7 @@
 #include <linux/vmalloc.h>
 #include <linux/hugetlb.h>
 #include <linux/interval_tree_generic.h>
+#include <linux/pagemap.h>
 
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_umem.h>
@@ -648,25 +649,17 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 
 		if (npages < 0) {
 			if (npages != -EAGAIN)
-				pr_warn("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
+				pr_warn("fail to get %zu user pages with error %d\n",
+					gup_num_pages, npages);
 			else
-				pr_debug("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
+				pr_debug("fail to get %zu user pages with error %d\n",
+					 gup_num_pages, npages);
 			break;
 		}
 
 		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
 		mutex_lock(&umem_odp->umem_mutex);
 		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
-			if (user_virt & ~page_mask) {
-				p += PAGE_SIZE;
-				if (page_to_phys(local_page_list[j]) != p) {
-					ret = -EFAULT;
-					break;
-				}
-				put_page(local_page_list[j]);
-				continue;
-			}
-
 			ret = ib_umem_odp_map_dma_single_page(
 					umem_odp, k, local_page_list[j],
 					access_mask, current_seq);
@@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 		mutex_unlock(&umem_odp->umem_mutex);
 
 		if (ret < 0) {
-			/* Release left over pages when handling errors. */
-			for (++j; j < npages; ++j)
-				put_page(local_page_list[j]);
+			/*
+			 * Release pages, starting at the first page
+			 * that experienced an error.
+			 */
+			release_pages(&local_page_list[j], npages - j);
 			break;
 		}
 	}
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-02 19:44     ` Ira Weiny
@ 2019-03-03  9:52       ` Artemy Kovalyov
  2019-03-03 16:55         ` Ira Weiny
  2019-03-03 22:37         ` John Hubbard
  0 siblings, 2 replies; 21+ messages in thread
From: Artemy Kovalyov @ 2019-03-03  9:52 UTC (permalink / raw)
  To: Ira Weiny, john.hubbard
  Cc: linux-mm, Andrew Morton, LKML, John Hubbard, Jason Gunthorpe,
	Doug Ledford, linux-rdma



On 02/03/2019 21:44, Ira Weiny wrote:
> 
> On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
>> From: John Hubbard <jhubbard@nvidia.com>
>>
>> ...
>> 3. Dead code removal: the check for (user_virt & ~page_mask)
>> is checking for a condition that can never happen,
>> because earlier:
>>
>>      user_virt = user_virt & page_mask;
>>
>> ...so, remove that entire phrase.
>>
>>   
>>   		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
>>   		mutex_lock(&umem_odp->umem_mutex);
>>   		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
>> -			if (user_virt & ~page_mask) {
>> -				p += PAGE_SIZE;
>> -				if (page_to_phys(local_page_list[j]) != p) {
>> -					ret = -EFAULT;
>> -					break;
>> -				}
>> -				put_page(local_page_list[j]);
>> -				continue;
>> -			}
>> -
> 
> I think this is trying to account for compound pages. (ie page_mask could
> represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
> But putting the page in that case seems to be the wrong thing to do?
> 
> Yes this was added by Artemy[1] now cc'ed.

Right, this is for huge pages, please keep it.
put_page() is needed to decrement the refcount of the head page.
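
For reference, a minimal illustration of that behaviour (a sketch based on
put_page() resolving compound pages to their head page; not code from this
patch):

#include <linux/mm.h>

/*
 * Dropping the reference that get_user_pages() took on a page that is
 * part of a huge page: put_page() maps a tail page to its head page
 * first, so it is the head page's _refcount that is decremented.
 */
static void drop_huge_page_ref(struct page *page)
{
	put_page(compound_head(page));	/* same effect as put_page(page) */
}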


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-03  9:52       ` Artemy Kovalyov
@ 2019-03-03 16:55         ` Ira Weiny
  2019-03-04 23:11           ` John Hubbard
  2019-03-03 22:37         ` John Hubbard
  1 sibling, 1 reply; 21+ messages in thread
From: Ira Weiny @ 2019-03-03 16:55 UTC (permalink / raw)
  To: Artemy Kovalyov
  Cc: john.hubbard, linux-mm, Andrew Morton, LKML, John Hubbard,
	Jason Gunthorpe, Doug Ledford, linux-rdma

On Sun, Mar 03, 2019 at 11:52:41AM +0200, Artemy Kovalyov wrote:
> 
> 
> On 02/03/2019 21:44, Ira Weiny wrote:
> > 
> > On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
> > > From: John Hubbard <jhubbard@nvidia.com>
> > > 
> > > ...
> > > 3. Dead code removal: the check for (user_virt & ~page_mask)
> > > is checking for a condition that can never happen,
> > > because earlier:
> > > 
> > >      user_virt = user_virt & page_mask;
> > > 
> > > ...so, remove that entire phrase.
> > > 
> > >   		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
> > >   		mutex_lock(&umem_odp->umem_mutex);
> > >   		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
> > > -			if (user_virt & ~page_mask) {
> > > -				p += PAGE_SIZE;
> > > -				if (page_to_phys(local_page_list[j]) != p) {
> > > -					ret = -EFAULT;
> > > -					break;
> > > -				}
> > > -				put_page(local_page_list[j]);
> > > -				continue;
> > > -			}
> > > -
> > 
> > I think this is trying to account for compound pages. (ie page_mask could
> > represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
> > But putting the page in that case seems to be the wrong thing to do?
> > 
> > Yes this was added by Artemy[1] now cc'ed.
> 
> Right, this is for huge pages, please keep it.
> put_page() needed to decrement refcount of the head page.

You mean decrement the refcount of the _non_-head pages?

Ira

> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-03  9:52       ` Artemy Kovalyov
  2019-03-03 16:55         ` Ira Weiny
@ 2019-03-03 22:37         ` John Hubbard
  2019-03-04  6:44           ` Leon Romanovsky
  2019-03-06  1:02           ` Artemy Kovalyov
  1 sibling, 2 replies; 21+ messages in thread
From: John Hubbard @ 2019-03-03 22:37 UTC (permalink / raw)
  To: Artemy Kovalyov, Ira Weiny, john.hubbard
  Cc: linux-mm, Andrew Morton, LKML, Jason Gunthorpe, Doug Ledford, linux-rdma

On 3/3/19 1:52 AM, Artemy Kovalyov wrote:
> 
> 
> On 02/03/2019 21:44, Ira Weiny wrote:
>>
>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
>>> From: John Hubbard <jhubbard@nvidia.com>
>>>
>>> ...
>>> 3. Dead code removal: the check for (user_virt & ~page_mask)
>>> is checking for a condition that can never happen,
>>> because earlier:
>>>
>>>      user_virt = user_virt & page_mask;
>>>
>>> ...so, remove that entire phrase.
>>>
>>>           bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
>>>           mutex_lock(&umem_odp->umem_mutex);
>>>           for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
>>> -            if (user_virt & ~page_mask) {
>>> -                p += PAGE_SIZE;
>>> -                if (page_to_phys(local_page_list[j]) != p) {
>>> -                    ret = -EFAULT;
>>> -                    break;
>>> -                }
>>> -                put_page(local_page_list[j]);
>>> -                continue;
>>> -            }
>>> -
>>
>> I think this is trying to account for compound pages. (ie page_mask could
>> represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
>> But putting the page in that case seems to be the wrong thing to do?
>>
>> Yes this was added by Artemy[1] now cc'ed.
> 
> Right, this is for huge pages, please keep it.
> put_page() needed to decrement refcount of the head page.
> 

OK, thanks for explaining! Artemy, while you're here, any thoughts about the
release_pages, and the change of the starting point, from the other part of the 
patch:

@@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, 
u64 user_virt,
	mutex_unlock(&umem_odp->umem_mutex);

  		if (ret < 0) {
-			/* Release left over pages when handling errors. */
-			for (++j; j < npages; ++j)
-				put_page(local_page_list[j]);
+			/*
+			 * Release pages, starting at the the first page
+			 * that experienced an error.
+			 */
+			release_pages(&local_page_list[j], npages - j);
  			break;
  		}
  	}

?

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-03 22:37         ` John Hubbard
@ 2019-03-04  6:44           ` Leon Romanovsky
  2019-03-06  1:02           ` Artemy Kovalyov
  1 sibling, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2019-03-04  6:44 UTC (permalink / raw)
  To: John Hubbard
  Cc: Artemy Kovalyov, Ira Weiny, john.hubbard, linux-mm,
	Andrew Morton, LKML, Jason Gunthorpe, Doug Ledford, linux-rdma

[-- Attachment #1: Type: text/plain, Size: 2474 bytes --]

On Sun, Mar 03, 2019 at 02:37:33PM -0800, John Hubbard wrote:
> On 3/3/19 1:52 AM, Artemy Kovalyov wrote:
> >
> >
> > On 02/03/2019 21:44, Ira Weiny wrote:
> > >
> > > On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
> > > > From: John Hubbard <jhubbard@nvidia.com>
> > > >
> > > > ...
> > > > 3. Dead code removal: the check for (user_virt & ~page_mask)
> > > > is checking for a condition that can never happen,
> > > > because earlier:
> > > >
> > > >      user_virt = user_virt & page_mask;
> > > >
> > > > ...so, remove that entire phrase.
> > > >
> > > >           bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
> > > >           mutex_lock(&umem_odp->umem_mutex);
> > > >           for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
> > > > -            if (user_virt & ~page_mask) {
> > > > -                p += PAGE_SIZE;
> > > > -                if (page_to_phys(local_page_list[j]) != p) {
> > > > -                    ret = -EFAULT;
> > > > -                    break;
> > > > -                }
> > > > -                put_page(local_page_list[j]);
> > > > -                continue;
> > > > -            }
> > > > -
> > >
> > > I think this is trying to account for compound pages. (ie page_mask could
> > > represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
> > > But putting the page in that case seems to be the wrong thing to do?
> > >
> > > Yes this was added by Artemy[1] now cc'ed.
> >
> > Right, this is for huge pages, please keep it.
> > put_page() needed to decrement refcount of the head page.
> >
>
> OK, thanks for explaining! Artemy, while you're here, any thoughts about the
> release_pages, and the change of the starting point, from the other part
> of the patch:

Your release_pages code is the right fix; it would be great if you could
prepare a proper, standalone patch.

Thanks

>
> @@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp
> *umem_odp, u64 user_virt,
> 	mutex_unlock(&umem_odp->umem_mutex);
>
>  		if (ret < 0) {
> -			/* Release left over pages when handling errors. */
> -			for (++j; j < npages; ++j)
> -				put_page(local_page_list[j]);
> +			/*
> +			 * Release pages, starting at the the first page
> +			 * that experienced an error.
> +			 */
> +			release_pages(&local_page_list[j], npages - j);
>  			break;
>  		}
>  	}
>
> ?
>
> thanks,
> --
> John Hubbard
> NVIDIA
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-04 23:11           ` John Hubbard
@ 2019-03-04 20:13             ` Ira Weiny
  2019-03-05 20:10               ` John Hubbard
  2019-03-04 23:36             ` John Hubbard
  2019-03-05  0:53             ` Jason Gunthorpe
  2 siblings, 1 reply; 21+ messages in thread
From: Ira Weiny @ 2019-03-04 20:13 UTC (permalink / raw)
  To: John Hubbard
  Cc: Artemy Kovalyov, john.hubbard, linux-mm, Andrew Morton, LKML,
	Jason Gunthorpe, Doug Ledford, linux-rdma

On Mon, Mar 04, 2019 at 03:11:05PM -0800, John Hubbard wrote:
> On 3/3/19 8:55 AM, Ira Weiny wrote:
> > On Sun, Mar 03, 2019 at 11:52:41AM +0200, Artemy Kovalyov wrote:
> >>
> >>
> >> On 02/03/2019 21:44, Ira Weiny wrote:
> >>>
> >>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
> >>>> From: John Hubbard <jhubbard@nvidia.com>
> >>>>
> >>>> ...
> >>>> 3. Dead code removal: the check for (user_virt & ~page_mask)
> >>>> is checking for a condition that can never happen,
> >>>> because earlier:
> >>>>
> >>>>      user_virt = user_virt & page_mask;
> >>>>
> >>>> ...so, remove that entire phrase.
> >>>>
> >>>>   		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
> >>>>   		mutex_lock(&umem_odp->umem_mutex);
> >>>>   		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
> >>>> -			if (user_virt & ~page_mask) {
> >>>> -				p += PAGE_SIZE;
> >>>> -				if (page_to_phys(local_page_list[j]) != p) {
> >>>> -					ret = -EFAULT;
> >>>> -					break;
> >>>> -				}
> >>>> -				put_page(local_page_list[j]);
> >>>> -				continue;
> >>>> -			}
> >>>> -
> >>>
> >>> I think this is trying to account for compound pages. (ie page_mask could
> >>> represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
> >>> But putting the page in that case seems to be the wrong thing to do?
> >>>
> >>> Yes this was added by Artemy[1] now cc'ed.
> >>
> >> Right, this is for huge pages, please keep it.
> >> put_page() needed to decrement refcount of the head page.
> > 
> > You mean decrement the refcount of the _non_-head pages?
> > 
> > Ira
> > 
> 
> Actually, I'm sure Artemy means head page, because put_page() always
> operates on the head page. 
> 
> And this reminds me that I have a problem to solve nearby: get_user_pages
> on huge pages increments the page->_refcount *for each tail page* as well.
> That's a minor problem for my put_user_page() 
> patchset, because my approach so far assumed that I could just change us
> over to:
> 
> get_user_page(): increments page->_refcount by a large amount (1024)
> 
> put_user_page(): decrements page->_refcount by a large amount (1024)
> 
> ...and just stop doing the odd (to me) technique of incrementing once for
> each tail page. I cannot see any reason why that's actually required, as
> opposed to just "raise the page->_refcount enough to avoid losing the head
> page too soon".

What about splitting a huge page?

From Documention/vm/transhuge.rst

<quoute>
split_huge_page internally has to distribute the refcounts in the head
page to the tail pages before clearing all PG_head/tail bits from the page
structures. It can be done easily for refcounts taken by page table
entries. But we don't have enough information on how to distribute any
additional pins (i.e. from get_user_pages). split_huge_page() fails any
requests to split pinned huge page: it expects page count to be equal to
sum of mapcount of all sub-pages plus one (split_huge_page caller must
have reference for head page).
</quote>

FWIW, I'm not sure why it needs to "store" the reference in the head page for
this.  I don't see any check to make sure the ref has been "stored" but I'm not
really familiar with the compound page code yet.

Ira

> 
> However, it may be tricky to do this in one pass. Probably at first, I'll have
> to do this horrible thing approach:
> 
> get_user_page(): increments page->_refcount by a large amount (1024)
> 
> put_user_page(): decrements page->_refcount by a large amount (1024) MULTIPLIED
>                  by the number of tail pages. argghhh that's ugly.
> 
> thanks,
> -- 
> John Hubbard
> NVIDIA
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-03 16:55         ` Ira Weiny
@ 2019-03-04 23:11           ` John Hubbard
  2019-03-04 20:13             ` Ira Weiny
                               ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: John Hubbard @ 2019-03-04 23:11 UTC (permalink / raw)
  To: Ira Weiny, Artemy Kovalyov
  Cc: john.hubbard, linux-mm, Andrew Morton, LKML, Jason Gunthorpe,
	Doug Ledford, linux-rdma

On 3/3/19 8:55 AM, Ira Weiny wrote:
> On Sun, Mar 03, 2019 at 11:52:41AM +0200, Artemy Kovalyov wrote:
>>
>>
>> On 02/03/2019 21:44, Ira Weiny wrote:
>>>
>>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
>>>> From: John Hubbard <jhubbard@nvidia.com>
>>>>
>>>> ...
>>>> 3. Dead code removal: the check for (user_virt & ~page_mask)
>>>> is checking for a condition that can never happen,
>>>> because earlier:
>>>>
>>>>      user_virt = user_virt & page_mask;
>>>>
>>>> ...so, remove that entire phrase.
>>>>
>>>>   		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
>>>>   		mutex_lock(&umem_odp->umem_mutex);
>>>>   		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
>>>> -			if (user_virt & ~page_mask) {
>>>> -				p += PAGE_SIZE;
>>>> -				if (page_to_phys(local_page_list[j]) != p) {
>>>> -					ret = -EFAULT;
>>>> -					break;
>>>> -				}
>>>> -				put_page(local_page_list[j]);
>>>> -				continue;
>>>> -			}
>>>> -
>>>
>>> I think this is trying to account for compound pages. (ie page_mask could
>>> represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
>>> But putting the page in that case seems to be the wrong thing to do?
>>>
>>> Yes this was added by Artemy[1] now cc'ed.
>>
>> Right, this is for huge pages, please keep it.
>> put_page() needed to decrement refcount of the head page.
> 
> You mean decrement the refcount of the _non_-head pages?
> 
> Ira
> 

Actually, I'm sure Artemy means head page, because put_page() always
operates on the head page. 

And this reminds me that I have a problem to solve nearby: get_user_pages
on huge pages increments the page->_refcount *for each tail page* as well.
That's a minor problem for my put_user_page() 
patchset, because my approach so far assumed that I could just change us
over to:

get_user_page(): increments page->_refcount by a large amount (1024)

put_user_page(): decrements page->_refcount by a large amount (1024)

...and just stop doing the odd (to me) technique of incrementing once for
each tail page. I cannot see any reason why that's actually required, as
opposed to just "raise the page->_refcount enough to avoid losing the head
page too soon".

However, it may be tricky to do this in one pass. Probably at first, I'll have
to do this horrible thing approach:

get_user_page(): increments page->_refcount by a large amount (1024)

put_user_page(): decrements page->_refcount by a large amount (1024) MULTIPLIED
                 by the number of tail pages. argghhh that's ugly.
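
A rough sketch of that idea follows; the function names and the bias
constant are hypothetical (not an existing kernel API at the time of this
thread), and page_ref_add()/page_ref_sub() are just used to stand in for
the "one large bump" notion:

#include <linux/mm.h>
#include <linux/page_ref.h>

#define USER_PIN_BIAS	1024	/* the illustrative "large amount" */

/* take a pin: one large bump on the head page, nothing per tail page */
static void user_pin_page(struct page *page)
{
	page_ref_add(compound_head(page), USER_PIN_BIAS);
}

/* drop a pin: the matching large decrement */
static void user_unpin_page(struct page *page)
{
	page_ref_sub(compound_head(page), USER_PIN_BIAS);
}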

thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-04 23:11           ` John Hubbard
  2019-03-04 20:13             ` Ira Weiny
@ 2019-03-04 23:36             ` John Hubbard
  2019-03-05  0:53             ` Jason Gunthorpe
  2 siblings, 0 replies; 21+ messages in thread
From: John Hubbard @ 2019-03-04 23:36 UTC (permalink / raw)
  To: Ira Weiny, Artemy Kovalyov
  Cc: john.hubbard, linux-mm, Andrew Morton, LKML, Jason Gunthorpe,
	Doug Ledford, linux-rdma

On 3/4/19 3:11 PM, John Hubbard wrote:
> On 3/3/19 8:55 AM, Ira Weiny wrote:
>> On Sun, Mar 03, 2019 at 11:52:41AM +0200, Artemy Kovalyov wrote:
>>>
>>>
>>> On 02/03/2019 21:44, Ira Weiny wrote:
>>>>
>>>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
>>>>> From: John Hubbard <jhubbard@nvidia.com>
>>>>>
>>>>> ...
>>>>> 3. Dead code removal: the check for (user_virt & ~page_mask)
>>>>> is checking for a condition that can never happen,
>>>>> because earlier:
>>>>>
>>>>>      user_virt = user_virt & page_mask;
>>>>>
>>>>> ...so, remove that entire phrase.
>>>>>
>>>>>   		bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
>>>>>   		mutex_lock(&umem_odp->umem_mutex);
>>>>>   		for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
>>>>> -			if (user_virt & ~page_mask) {
>>>>> -				p += PAGE_SIZE;
>>>>> -				if (page_to_phys(local_page_list[j]) != p) {
>>>>> -					ret = -EFAULT;
>>>>> -					break;
>>>>> -				}
>>>>> -				put_page(local_page_list[j]);
>>>>> -				continue;
>>>>> -			}
>>>>> -
>>>>
>>>> I think this is trying to account for compound pages. (ie page_mask could
>>>> represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
>>>> But putting the page in that case seems to be the wrong thing to do?
>>>>
>>>> Yes this was added by Artemy[1] now cc'ed.
>>>
>>> Right, this is for huge pages, please keep it.
>>> put_page() needed to decrement refcount of the head page.
>>
>> You mean decrement the refcount of the _non_-head pages?
>>
>> Ira
>>
> 
> Actually, I'm sure Artemy means head page, because put_page() always
> operates on the head page. 
> 
> And this reminds me that I have a problem to solve nearby: get_user_pages
> on huge pages increments the page->_refcount *for each tail page* as well.
> That's a minor problem for my put_user_page() 
> patchset, because my approach so far assumed that I could just change us
> over to:
> 
> get_user_page(): increments page->_refcount by a large amount (1024)
> 
> put_user_page(): decrements page->_refcount by a large amount (1024)
> 
> ...and just stop doing the odd (to me) technique of incrementing once for
> each tail page. I cannot see any reason why that's actually required, as
> opposed to just "raise the page->_refcount enough to avoid losing the head
> page too soon".
> 
> However, it may be tricky to do this in one pass. Probably at first, I'll have
> to do this horrible thing approach:
> 
> get_user_page(): increments page->_refcount by a large amount (1024)
> 
> put_user_page(): decrements page->_refcount by a large amount (1024) MULTIPLIED
>                  by the number of tail pages. argghhh that's ugly.
> 

I see that this is still not stated quite right.

...to clarify, I mean to leave the existing behavior alone. So it would be
the call sites (not put_user_page as the above says) that would be doing all
that decrementing. The call sites know how many decrements are appropriate.

Unless someone thinks of a clever way to clean this up in one shot. I'm not
really seeing any way.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-04 23:11           ` John Hubbard
  2019-03-04 20:13             ` Ira Weiny
  2019-03-04 23:36             ` John Hubbard
@ 2019-03-05  0:53             ` Jason Gunthorpe
  2 siblings, 0 replies; 21+ messages in thread
From: Jason Gunthorpe @ 2019-03-05  0:53 UTC (permalink / raw)
  To: John Hubbard
  Cc: Ira Weiny, Artemy Kovalyov, john.hubbard, linux-mm,
	Andrew Morton, LKML, Doug Ledford, linux-rdma

On Mon, Mar 04, 2019 at 03:11:05PM -0800, John Hubbard wrote:

> get_user_page(): increments page->_refcount by a large amount (1024)
> 
> put_user_page(): decrements page->_refcount by a large amount (1024)
> 
> ...and just stop doing the odd (to me) technique of incrementing once for
> each tail page. I cannot see any reason why that's actually required, as
> opposed to just "raise the page->_refcount enough to avoid losing the head
> page too soon".

I'd very much like to see this in the infiniband umem code - the extra
work and cost of touching every page in a huge page is very much
undesired.

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-04 20:13             ` Ira Weiny
@ 2019-03-05 20:10               ` John Hubbard
  0 siblings, 0 replies; 21+ messages in thread
From: John Hubbard @ 2019-03-05 20:10 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Artemy Kovalyov, john.hubbard, linux-mm, Andrew Morton, LKML,
	Jason Gunthorpe, Doug Ledford, linux-rdma

On 3/4/19 12:13 PM, Ira Weiny wrote:
[snip]
>> And this reminds me that I have a problem to solve nearby: get_user_pages
>> on huge pages increments the page->_refcount *for each tail page* as well.
>> That's a minor problem for my put_user_page() 
>> patchset, because my approach so far assumed that I could just change us
>> over to:
>>
>> get_user_page(): increments page->_refcount by a large amount (1024)
>>
>> put_user_page(): decrements page->_refcount by a large amount (1024)
>>
>> ...and just stop doing the odd (to me) technique of incrementing once for
>> each tail page. I cannot see any reason why that's actually required, as
>> opposed to just "raise the page->_refcount enough to avoid losing the head
>> page too soon".
> 
> What about splitting a huge page?
> 
> From Documention/vm/transhuge.rst
> 
> <quoute>
> split_huge_page internally has to distribute the refcounts in the head
> page to the tail pages before clearing all PG_head/tail bits from the page
> structures. It can be done easily for refcounts taken by page table
> entries. But we don't have enough information on how to distribute any
> additional pins (i.e. from get_user_pages). split_huge_page() fails any
> requests to split pinned huge page: it expects page count to be equal to
> sum of mapcount of all sub-pages plus one (split_huge_page caller must
> have reference for head page).
> </quote>
> 

heh, so in the end, split_huge_page just needs enough information to say
"no" for gup pages. So as long as page->_refcount avoids one particular
value, the code keeps working. :)


> FWIW, I'm not sure why it needs to "store" the reference in the head page for
> this.  I don't see any check to make sure the ref has been "stored" but I'm not
> really familiar with the compound page code yet.
> 
> Ira
> 

Thanks for peeking at this, I'll look deeper too.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-03 22:37         ` John Hubbard
  2019-03-04  6:44           ` Leon Romanovsky
@ 2019-03-06  1:02           ` Artemy Kovalyov
  2019-03-06  1:32             ` Jason Gunthorpe
  1 sibling, 1 reply; 21+ messages in thread
From: Artemy Kovalyov @ 2019-03-06  1:02 UTC (permalink / raw)
  To: John Hubbard, Ira Weiny, john.hubbard
  Cc: linux-mm, Andrew Morton, LKML, Jason Gunthorpe, Doug Ledford, linux-rdma



On 04/03/2019 00:37, John Hubbard wrote:
> On 3/3/19 1:52 AM, Artemy Kovalyov wrote:
>>
>>
>> On 02/03/2019 21:44, Ira Weiny wrote:
>>>
>>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
>>>> From: John Hubbard <jhubbard@nvidia.com>
>>>>
>>>> ...
> 
> OK, thanks for explaining! Artemy, while you're here, any thoughts about the
> release_pages, and the change of the starting point, from the other part of the
> patch:
> 
> @@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp,
> u64 user_virt,
> 	mutex_unlock(&umem_odp->umem_mutex);
> 
>    		if (ret < 0) {
> -			/* Release left over pages when handling errors. */
> -			for (++j; j < npages; ++j)
release_pages() is an optimized batch put_page(), so that part is fine.
But! The release should start from the page *after* the one whose
ib_umem_odp_map_dma_single_page() call failed, because the failure path of
that function has already called put_page() on it.
So release_pages(&local_page_list[j+1], npages - j-1) would be correct
(see the sketch at the end of this message).
> -				put_page(local_page_list[j]);
> +			/*
> +			 * Release pages, starting at the the first page
> +			 * that experienced an error.
> +			 */
> +			release_pages(&local_page_list[j], npages - j);
>    			break;
>    		}
>    	}
> 
> ?
> 
> thanks,
> 
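
Putting the correction above together, a minimal sketch of the resulting
error path (illustrative only; here j indexes the entry whose map attempt
failed and whose reference the failure path of
ib_umem_odp_map_dma_single_page() has already dropped):

		if (ret < 0) {
			/*
			 * Entry j was already released inside the failure
			 * path of ib_umem_odp_map_dma_single_page(), so
			 * only release the entries after it.
			 */
			release_pages(&local_page_list[j + 1], npages - j - 1);
			break;
		}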

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-06  1:02           ` Artemy Kovalyov
@ 2019-03-06  1:32             ` Jason Gunthorpe
  2019-03-06  1:34               ` John Hubbard
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2019-03-06  1:32 UTC (permalink / raw)
  To: Artemy Kovalyov
  Cc: John Hubbard, Ira Weiny, john.hubbard, linux-mm, Andrew Morton,
	LKML, Doug Ledford, linux-rdma

On Wed, Mar 06, 2019 at 03:02:36AM +0200, Artemy Kovalyov wrote:
> 
> 
> On 04/03/2019 00:37, John Hubbard wrote:
> > On 3/3/19 1:52 AM, Artemy Kovalyov wrote:
> > > 
> > > 
> > > On 02/03/2019 21:44, Ira Weiny wrote:
> > > > 
> > > > On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
> > > > > From: John Hubbard <jhubbard@nvidia.com>
> > > > > 
> > > > > ...
> > 
> > OK, thanks for explaining! Artemy, while you're here, any thoughts about the
> > release_pages, and the change of the starting point, from the other part of the
> > patch:
> > 
> > @@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp,
> > u64 user_virt,
> > 	mutex_unlock(&umem_odp->umem_mutex);
> > 
> >    		if (ret < 0) {
> > -			/* Release left over pages when handling errors. */
> > -			for (++j; j < npages; ++j)
> release_pages() is an optimized batch put_page() so it's ok.
> but! release starting from page next to one cause failure in
> ib_umem_odp_map_dma_single_page() is correct because failure flow of this
> functions already called put_page().
> So release_pages(&local_page_list[j+1], npages - j-1) would be correct.

Someone send a fixup patch please...

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-06  1:32             ` Jason Gunthorpe
@ 2019-03-06  1:34               ` John Hubbard
  2019-03-06  1:37                 ` John Hubbard
  0 siblings, 1 reply; 21+ messages in thread
From: John Hubbard @ 2019-03-06  1:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Artemy Kovalyov
  Cc: Ira Weiny, john.hubbard, linux-mm, Andrew Morton, LKML,
	Doug Ledford, linux-rdma

On 3/5/19 5:32 PM, Jason Gunthorpe wrote:
> On Wed, Mar 06, 2019 at 03:02:36AM +0200, Artemy Kovalyov wrote:
>>
>>
>> On 04/03/2019 00:37, John Hubbard wrote:
>>> On 3/3/19 1:52 AM, Artemy Kovalyov wrote:
>>>>
>>>>
>>>> On 02/03/2019 21:44, Ira Weiny wrote:
>>>>>
>>>>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
>>>>>> From: John Hubbard <jhubbard@nvidia.com>
>>>>>>
>>>>>> ...
>>>
>>> OK, thanks for explaining! Artemy, while you're here, any thoughts about the
>>> release_pages, and the change of the starting point, from the other part of the
>>> patch:
>>>
>>> @@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp,
>>> u64 user_virt,
>>> 	mutex_unlock(&umem_odp->umem_mutex);
>>>
>>>    		if (ret < 0) {
>>> -			/* Release left over pages when handling errors. */
>>> -			for (++j; j < npages; ++j)
>> release_pages() is an optimized batch put_page() so it's ok.
>> but! release starting from page next to one cause failure in
>> ib_umem_odp_map_dma_single_page() is correct because failure flow of this
>> functions already called put_page().
>> So release_pages(&local_page_list[j+1], npages - j-1) would be correct.
> 
> Someone send a fixup patch please...
> 
> Jason

Yeah, I'm on it. Just need to double-check that this is the case. But Jason,
you're confirming it already, so that helps too.

Patch coming shortly.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-06  1:34               ` John Hubbard
@ 2019-03-06  1:37                 ` John Hubbard
  2019-03-06  1:51                   ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: John Hubbard @ 2019-03-06  1:37 UTC (permalink / raw)
  To: Jason Gunthorpe, Artemy Kovalyov
  Cc: Ira Weiny, john.hubbard, linux-mm, Andrew Morton, LKML,
	Doug Ledford, linux-rdma

On 3/5/19 5:34 PM, John Hubbard wrote:
[snip]
>>> So release_pages(&local_page_list[j+1], npages - j-1) would be correct.
>>
>> Someone send a fixup patch please...
>>
>> Jason
> 
> Yeah, I'm on it. Just need to double-check that this is the case. But Jason,
> you're confirming it already, so that helps too.
> 
> Patch coming shortly.
> 

Jason, btw, do you prefer a patch that fixes the previous one, or a new 
patch that stands alone? (I'm not sure how this tree is maintained, exactly.)

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-06  1:37                 ` John Hubbard
@ 2019-03-06  1:51                   ` Jason Gunthorpe
  2019-03-06  2:04                     ` John Hubbard
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2019-03-06  1:51 UTC (permalink / raw)
  To: John Hubbard
  Cc: Artemy Kovalyov, Ira Weiny, john.hubbard, linux-mm,
	Andrew Morton, LKML, Doug Ledford, linux-rdma

On Tue, Mar 05, 2019 at 05:37:18PM -0800, John Hubbard wrote:
> On 3/5/19 5:34 PM, John Hubbard wrote:
> [snip]
> >>> So release_pages(&local_page_list[j+1], npages - j-1) would be correct.
> >>
> >> Someone send a fixup patch please...
> >>
> >> Jason
> > 
> > Yeah, I'm on it. Just need to double-check that this is the case. But Jason,
> > you're confirming it already, so that helps too.

I didn't look, just assuming Artemy is right since he knows this
code..

> > Patch coming shortly.
> > 
> 
> Jason, btw, do you prefer a patch that fixes the previous one, or a new 
> patch that stands alone? (I'm not sure how this tree is maintained, exactly.)

It has passed the point where I should apply a fixup; rdma in general has
approximately a 1 day period after the 'thanks applied' message where a
rebase is possible.

Otherwise the tree is strictly no-rebase.

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
  2019-03-06  1:51                   ` Jason Gunthorpe
@ 2019-03-06  2:04                     ` John Hubbard
  0 siblings, 0 replies; 21+ messages in thread
From: John Hubbard @ 2019-03-06  2:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Artemy Kovalyov, Ira Weiny, john.hubbard, linux-mm,
	Andrew Morton, LKML, Doug Ledford, linux-rdma

On 3/5/19 5:51 PM, Jason Gunthorpe wrote:
> On Tue, Mar 05, 2019 at 05:37:18PM -0800, John Hubbard wrote:
>> On 3/5/19 5:34 PM, John Hubbard wrote:
>> [snip]
>>>>> So release_pages(&local_page_list[j+1], npages - j-1) would be correct.
>>>>
>>>> Someone send a fixup patch please...
>>>>
>>>> Jason
>>>
>>> Yeah, I'm on it. Just need to double-check that this is the case. But Jason,
>>> you're confirming it already, so that helps too.
> 
> I didn't look, just assuming Artemy is right since he knows this
> code..
> 

OK. And I've confirmed it, too.

>>> Patch coming shortly.
>>>
>>
>> Jason, btw, do you prefer a patch that fixes the previous one, or a new 
>> patch that stands alone? (I'm not sure how this tree is maintained, exactly.)
> 
> It has past the point when I should apply a fixup, rdma is general has
> about an approximately 1 day period after the 'thanks applied' message
> where rebase is possible
> 
> Otherwise the tree is strictly no rebase
> 
> Jason
> 

Got it, patch is posted now.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-03-06  2:04 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-02  3:27 [PATCH 0/1] RDMA/umem: minor bug fix and cleanup in error handling paths john.hubbard
2019-03-02  3:27 ` [PATCH] " john.hubbard
2019-03-02 16:03   ` kbuild test robot
2019-03-02 16:14   ` kbuild test robot
2019-03-02 20:24   ` [PATCH v2] " john.hubbard
2019-03-02 19:44     ` Ira Weiny
2019-03-03  9:52       ` Artemy Kovalyov
2019-03-03 16:55         ` Ira Weiny
2019-03-04 23:11           ` John Hubbard
2019-03-04 20:13             ` Ira Weiny
2019-03-05 20:10               ` John Hubbard
2019-03-04 23:36             ` John Hubbard
2019-03-05  0:53             ` Jason Gunthorpe
2019-03-03 22:37         ` John Hubbard
2019-03-04  6:44           ` Leon Romanovsky
2019-03-06  1:02           ` Artemy Kovalyov
2019-03-06  1:32             ` Jason Gunthorpe
2019-03-06  1:34               ` John Hubbard
2019-03-06  1:37                 ` John Hubbard
2019-03-06  1:51                   ` Jason Gunthorpe
2019-03-06  2:04                     ` John Hubbard
