linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH/RFC G-U-P experts] IB/umem: Modernize our get_user_pages() parameters
@ 2012-01-26  5:59 Roland Dreier
  2012-01-26 20:01 ` Hugh Dickins
  0 siblings, 1 reply; 17+ messages in thread
From: Roland Dreier @ 2012-01-26  5:59 UTC (permalink / raw)
  To: linux-rdma; +Cc: Andrea Arcangeli, Hugh Dickins, linux-mm, linux-kernel

From: Roland Dreier <roland@purestorage.com>

Right now, we always pass write==1 to get_user_pages(), even when we
only intend to read the memory.  We pass force==1 if we're going for
read-only and force==0 if we want writable.  The reasoning behind this
seems to be contained in this out-of-tree changelog from 2005:

    Always ask get_user_pages() for writable pages, but pass force=1
    if the consumer has only asked for read-only pages.  This fixes a
    problem registering memory that has just been allocated but not
    touched yet, while allowing registration of read-only memory to
    continue to work.

However, I don't think the mm works like this today, and indeed GUP
will fault in pages for an untouched read-only mapping just fine with
write and force set to 0.  In fact, always passing 1 for write causes
problems with modern kernels, because we end up hitting the "early
C-O-W break" case in __do_fault(), even for read-only mappings where
this makes no sense.

Signed-off-by: Roland Dreier <roland@purestorage.com>
---
This patch comes from me trying to do userspace RDMA on a memory
region exported from a character driver and mapped with

    mmap(... PROT_READ, MAP_PRIVATE ...)

The character driver has a trivial mmap method that just sets vm_ops
and and equally trivial fault method that essentially just does

    vmf->page = vmalloc_to_page(buf + (vmf->pgoff << PAGE_SHIFT));

ie the most elementary way to export a vmalloc'ed buffer to userspace.

However, when I tried doing

    ibv_reg_mr(... IBV_ACCESS_REMOTE_READ ...)

in userspace on that mmap region, I found that COW was happening and
so neither userspace nor the registered memory ended up pointing at
the kernel buffer anymore, exactly because of the COW in __do_fault()
I mention in the changelog above.

The patch below fixes my test case, and doesn't seem to break any of
the ibverbs examples and other simple tests of userspace verbs that I
tried.  But that's far from an exhaustive test suite.

I'd definitely appreciate comments from MM experts here, since I'm not
positive of my understand of G-U-P and friends, and I don't want to
apps because this is wrong in some special case I didn't try.

Also testing from anyone with an RDMA app that does anything at all
fancy with memory allocation or registration would be helpful.

Thanks!

PS Let me know if I didn't go on long enough about this one-line patch
and I can write some more.

 drivers/infiniband/core/umem.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 71f0c0f..fb5abd3 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -152,7 +152,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 		ret = get_user_pages(current, current->mm, cur_base,
 				     min_t(unsigned long, npages,
 					   PAGE_SIZE / sizeof (struct page *)),
-				     1, !umem->writable, page_list, vma_list);
+				     umem->writable, 0, page_list, vma_list);
 
 		if (ret < 0)
 			goto out;
-- 
1.7.8.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2012-02-09 22:57 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-26  5:59 [PATCH/RFC G-U-P experts] IB/umem: Modernize our get_user_pages() parameters Roland Dreier
2012-01-26 20:01 ` Hugh Dickins
2012-01-26 22:45   ` Roland Dreier
2012-01-27 17:28     ` Roland Dreier
2012-01-28  2:31       ` Hugh Dickins
2012-01-28 19:25         ` Jason Gunthorpe
2012-01-30 19:19           ` Roland Dreier
2012-01-28  2:19     ` Hugh Dickins
2012-01-30 19:16       ` Roland Dreier
2012-01-30 20:20         ` Andrea Arcangeli
2012-02-06 17:46           ` Roland Dreier
2012-01-30 20:34         ` Hugh Dickins
2012-02-06 17:39           ` Roland Dreier
2012-02-07 20:39             ` Hugh Dickins
2012-02-08 23:10               ` Hugh Dickins
2012-02-09 17:50                 ` Roland Dreier
2012-02-09 22:57                   ` Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).