Linux-RDMA Archive on lore.kernel.org
* [PATCH v2 0/2] IB/umem: use get_user_pages_fast() to pin DMA pages
@ 2019-12-04 21:36 John Hubbard
  2019-12-04 21:36 ` [PATCH v2 1/2] mm/gup: allow FOLL_FORCE for get_user_pages_fast() John Hubbard
  2019-12-04 21:36 ` [PATCH v2 2/2] IB/umem: use get_user_pages_fast() to pin DMA pages John Hubbard
  0 siblings, 2 replies; 6+ messages in thread
From: John Hubbard @ 2019-12-04 21:36 UTC
  To: Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Christoph Hellwig
  Cc: Ira Weiny, linux-rdma, linux-mm, LKML, John Hubbard

Hi,

I'm posting this to request a review of patch 1. Once everything seems
OK, these two patches will become part of a larger set that introduces
FOLL_PIN and renames some get_user_pages() call sites to
pin_user_pages(). I'm doing this a little before -rc1, because it's a
small and easy thing to get out of the way early.

These are ultimately destined to go in via linux-mm (as opposed to
linux-rdma), so that the renames can land within a single kernel
release cycle.

The first version of this patchset contained only the IB/umem changes
(patch 2). I also lacked a runtime InfiniBand test setup at the time,
so I did not catch the failure that Leon Romanovsky then reported [1].
Patch 1 fixes that failure.

Since v1, I have (finally!) set up a basic two-node InfiniBand system,
and as a result, I've reproduced the failure that Leon saw, via a
trivial run of "ib_write_bw", and confirmed that it's fixed in this
new patchset. Sorry it took me so long to do that; I am going to vaguely
blame "OFED" for the delay. :)

Entertaining IB side note: Jason: sure enough, as you mentioned, the
OFED installation did in fact hopelessly mangle my system. Once I went
back to a clean distro installation without all those OFED goodies,
everything Just Worked on the first try. Much simpler to understand,
too. ha.

This applies to today's linux.git: commit aedc0650f913 ("Merge tag
'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm").

[1] https://lore.kernel.org/r/20191124100724.GH136476@unreal

John Hubbard (2):
  mm/gup: allow FOLL_FORCE for get_user_pages_fast()
  IB/umem: use get_user_pages_fast() to pin DMA pages

 drivers/infiniband/core/umem.c | 17 ++++++-----------
 mm/gup.c                       |  3 ++-
 2 files changed, 8 insertions(+), 12 deletions(-)

-- 
2.24.0



* [PATCH v2 1/2] mm/gup: allow FOLL_FORCE for get_user_pages_fast()
  2019-12-04 21:36 [PATCH v2 0/2] IB/umem: use get_user_pages_fast() to pin DMA pages John Hubbard
@ 2019-12-04 21:36 ` John Hubbard
  2019-12-09 18:25   ` Leon Romanovsky
  2019-12-04 21:36 ` [PATCH v2 2/2] IB/umem: use get_user_pages_fast() to pin DMA pages John Hubbard
  1 sibling, 1 reply; 6+ messages in thread
From: John Hubbard @ 2019-12-04 21:36 UTC
  To: Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Christoph Hellwig
  Cc: Ira Weiny, linux-rdma, linux-mm, LKML, John Hubbard, Christoph Hellwig

Commit 817be129e6f2 ("mm: validate get_user_pages_fast flags") allowed
only FOLL_WRITE and FOLL_LONGTERM to be passed to get_user_pages_fast().
This, combined with the fact that get_user_pages_fast() falls back to
"slow gup", which *does* accept FOLL_FORCE, leads to an odd situation:
if you need FOLL_FORCE, you cannot call get_user_pages_fast().

There does not appear to be any reason for filtering out FOLL_FORCE.
There is nothing in the _fast() implementation that requires that we
avoid writing to the pages. So it appears to have been an oversight.

Fix by allowing FOLL_FORCE to be set for get_user_pages_fast().

Fixes: 817be129e6f2 ("mm: validate get_user_pages_fast flags")
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
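Note (not part of the commit): as a rough userspace sketch, the mask
check being changed can be modeled as below. The flag values and the
check_fast_gup_flags() and flags_demo() helpers are assumptions made up
for this illustration; the real definitions live in include/linux/mm.h
and mm/gup.c.

```c
#include <assert.h>
#include <errno.h>

/*
 * Illustrative flag values for this sketch only; the kernel's real
 * FOLL_* definitions live in include/linux/mm.h.
 */
#define FOLL_WRITE    0x01u
#define FOLL_FORCE    0x10u
#define FOLL_LONGTERM 0x10000u

/* Flags accepted before and after this patch. */
#define ALLOWED_BEFORE (FOLL_WRITE | FOLL_LONGTERM)
#define ALLOWED_AFTER  (FOLL_WRITE | FOLL_LONGTERM | FOLL_FORCE)

/*
 * Hypothetical stand-in for the check at the top of
 * get_user_pages_fast(): reject any flag outside the allowed mask.
 */
static int check_fast_gup_flags(unsigned int gup_flags, unsigned int allowed)
{
	if (gup_flags & ~allowed)
		return -EINVAL;
	return 0;
}

/* A caller that needs FOLL_FORCE: rejected before, accepted after. */
static void flags_demo(void)
{
	unsigned int flags = FOLL_WRITE | FOLL_FORCE | FOLL_LONGTERM;

	assert(check_fast_gup_flags(flags, ALLOWED_BEFORE) == -EINVAL);
	assert(check_fast_gup_flags(flags, ALLOWED_AFTER) == 0);
}
```

With the old mask, any caller that sets FOLL_FORCE gets -EINVAL; with
the new mask, the same flags pass the check.
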
 mm/gup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index 7646bf993b25..5244b8090440 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2415,7 +2415,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
 	unsigned long addr, len, end;
 	int nr = 0, ret = 0;
 
-	if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM)))
+	if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
+				       FOLL_FORCE)))
 		return -EINVAL;
 
 	start = untagged_addr(start) & PAGE_MASK;
-- 
2.24.0



* [PATCH v2 2/2] IB/umem: use get_user_pages_fast() to pin DMA pages
  2019-12-04 21:36 [PATCH v2 0/2] IB/umem: use get_user_pages_fast() to pin DMA pages John Hubbard
  2019-12-04 21:36 ` [PATCH v2 1/2] mm/gup: allow FOLL_FORCE for get_user_pages_fast() John Hubbard
@ 2019-12-04 21:36 ` John Hubbard
  2019-12-09 18:25   ` Leon Romanovsky
  1 sibling, 1 reply; 6+ messages in thread
From: John Hubbard @ 2019-12-04 21:36 UTC
  To: Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Christoph Hellwig
  Cc: Ira Weiny, linux-rdma, linux-mm, LKML, John Hubbard,
	Christoph Hellwig, Jan Kara, Jason Gunthorpe

And get rid of the mmap_sem calls, as part of that. Note
that get_user_pages_fast() will, if necessary, fall back to
__gup_longterm_unlocked(), which takes the mmap_sem as needed.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
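Note (not part of the commit): the shape of the pinning loop after this
change can be sketched in userspace as follows. pin_range(),
fake_pin_all() and SKETCH_PAGE_SIZE are hypothetical names for this
illustration, and the pin callback merely stands in for
get_user_pages_fast(). The loop itself does no locking, on the
assumption that the callee takes mmap_sem on its own if it falls back
to the slow path.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for this sketch */

struct page;			/* opaque here, as it is to most kernel callers */

/*
 * Hypothetical stand-in for get_user_pages_fast(): returns the number
 * of pages pinned, or a negative errno-style value on failure.
 */
typedef long (*pin_fn)(unsigned long start, unsigned long nr_pages,
		       struct page **pages);

/*
 * Pin npages starting at cur_base, at most one page_list worth of
 * pointers per call. Returns the total pinned, or the first error.
 */
static long pin_range(unsigned long cur_base, unsigned long npages,
		      struct page **page_list, pin_fn pin,
		      unsigned long *calls)
{
	const unsigned long batch = SKETCH_PAGE_SIZE / sizeof(struct page *);
	unsigned long pinned = 0;

	while (npages) {
		unsigned long want = npages < batch ? npages : batch;
		long ret = pin(cur_base, want, page_list);

		if (ret < 0)
			return ret;	/* caller would release what it holds */

		/* Sketch assumption: a successful call makes progress. */
		assert(ret > 0 && (unsigned long)ret <= want);

		cur_base += (unsigned long)ret * SKETCH_PAGE_SIZE;
		npages   -= (unsigned long)ret;
		pinned   += (unsigned long)ret;
		(*calls)++;
	}
	return (long)pinned;
}

/* Fake pin routine that always "pins" everything it is asked for. */
static long fake_pin_all(unsigned long start, unsigned long nr_pages,
			 struct page **pages)
{
	(void)start;
	(void)pages;
	return (long)nr_pages;
}
```

Calling pin_range() with fake_pin_all and a page_list holding
SKETCH_PAGE_SIZE / sizeof(struct page *) entries walks the range in
batches of at most that many pages, with no lock taken or dropped
around each batch.
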
 drivers/infiniband/core/umem.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 7a3b99597ead..214e87aa609d 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -266,16 +266,13 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
 	sg = umem->sg_head.sgl;
 
 	while (npages) {
-		down_read(&mm->mmap_sem);
-		ret = get_user_pages(cur_base,
-				     min_t(unsigned long, npages,
-					   PAGE_SIZE / sizeof (struct page *)),
-				     gup_flags | FOLL_LONGTERM,
-				     page_list, NULL);
-		if (ret < 0) {
-			up_read(&mm->mmap_sem);
+		ret = get_user_pages_fast(cur_base,
+					  min_t(unsigned long, npages,
+						PAGE_SIZE /
+						sizeof(struct page *)),
+					  gup_flags | FOLL_LONGTERM, page_list);
+		if (ret < 0)
 			goto umem_release;
-		}
 
 		cur_base += ret * PAGE_SIZE;
 		npages   -= ret;
@@ -283,8 +280,6 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
 		sg = ib_umem_add_sg_table(sg, page_list, ret,
 			dma_get_max_seg_size(context->device->dma_device),
 			&umem->sg_nents);
-
-		up_read(&mm->mmap_sem);
 	}
 
 	sg_mark_end(sg);
-- 
2.24.0



* Re: [PATCH v2 1/2] mm/gup: allow FOLL_FORCE for get_user_pages_fast()
  2019-12-04 21:36 ` [PATCH v2 1/2] mm/gup: allow FOLL_FORCE for get_user_pages_fast() John Hubbard
@ 2019-12-09 18:25   ` Leon Romanovsky
  2019-12-09 19:54     ` John Hubbard
  0 siblings, 1 reply; 6+ messages in thread
From: Leon Romanovsky @ 2019-12-09 18:25 UTC
  To: John Hubbard
  Cc: Andrew Morton, Jason Gunthorpe, Christoph Hellwig, Ira Weiny,
	linux-rdma, linux-mm, LKML, Christoph Hellwig

On Wed, Dec 04, 2019 at 01:36:02PM -0800, John Hubbard wrote:
> Commit 817be129e6f2 ("mm: validate get_user_pages_fast flags") allowed
> only FOLL_WRITE and FOLL_LONGTERM to be passed to get_user_pages_fast().
> This, combined with the fact that get_user_pages_fast() falls back to
> "slow gup", which *does* accept FOLL_FORCE, leads to an odd situation:
> if you need FOLL_FORCE, you cannot call get_user_pages_fast().
>
> There does not appear to be any reason for filtering out FOLL_FORCE.
> There is nothing in the _fast() implementation that requires that we
> avoid writing to the pages. So it appears to have been an oversight.
>
> Fix by allowing FOLL_FORCE to be set for get_user_pages_fast().
>
> Fixes: 817be129e6f2 ("mm: validate get_user_pages_fast flags")
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  mm/gup.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>

Thanks,
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>


* Re: [PATCH v2 2/2] IB/umem: use get_user_pages_fast() to pin DMA pages
  2019-12-04 21:36 ` [PATCH v2 2/2] IB/umem: use get_user_pages_fast() to pin DMA pages John Hubbard
@ 2019-12-09 18:25   ` Leon Romanovsky
  0 siblings, 0 replies; 6+ messages in thread
From: Leon Romanovsky @ 2019-12-09 18:25 UTC
  To: John Hubbard
  Cc: Andrew Morton, Jason Gunthorpe, Christoph Hellwig, Ira Weiny,
	linux-rdma, linux-mm, LKML, Christoph Hellwig, Jan Kara,
	Jason Gunthorpe

On Wed, Dec 04, 2019 at 01:36:03PM -0800, John Hubbard wrote:
> And get rid of the mmap_sem calls, as part of that. Note
> that get_user_pages_fast() will, if necessary, fall back to
> __gup_longterm_unlocked(), which takes the mmap_sem as needed.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/infiniband/core/umem.c | 17 ++++++-----------
>  1 file changed, 6 insertions(+), 11 deletions(-)
>

Thanks,
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>


* Re: [PATCH v2 1/2] mm/gup: allow FOLL_FORCE for get_user_pages_fast()
  2019-12-09 18:25   ` Leon Romanovsky
@ 2019-12-09 19:54     ` John Hubbard
  0 siblings, 0 replies; 6+ messages in thread
From: John Hubbard @ 2019-12-09 19:54 UTC
  To: Leon Romanovsky
  Cc: Andrew Morton, Jason Gunthorpe, Christoph Hellwig, Ira Weiny,
	linux-rdma, linux-mm, LKML, Christoph Hellwig

On 12/9/19 10:25 AM, Leon Romanovsky wrote:
> On Wed, Dec 04, 2019 at 01:36:02PM -0800, John Hubbard wrote:
>> Commit 817be129e6f2 ("mm: validate get_user_pages_fast flags") allowed
>> only FOLL_WRITE and FOLL_LONGTERM to be passed to get_user_pages_fast().
>> This, combined with the fact that get_user_pages_fast() falls back to
>> "slow gup", which *does* accept FOLL_FORCE, leads to an odd situation:
>> if you need FOLL_FORCE, you cannot call get_user_pages_fast().
>>
>> There does not appear to be any reason for filtering out FOLL_FORCE.
>> There is nothing in the _fast() implementation that requires that we
>> avoid writing to the pages. So it appears to have been an oversight.
>>
>> Fix by allowing FOLL_FORCE to be set for get_user_pages_fast().
>>
>> Fixes: 817be129e6f2 ("mm: validate get_user_pages_fast flags")
>> Cc: Christoph Hellwig <hch@lst.de>
>> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
>> ---
>>  mm/gup.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
> 
> Thanks,
> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> 

Hi Leon, thanks for the reviews, great timing! I'll add the tags to
the commits, which I'm just about to post as part of the larger
"pin user pages" patchset.


thanks,
-- 
John Hubbard
NVIDIA


