From: Roland Dreier <roland@kernel.org>
To: linux-rdma@vger.kernel.org
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Hugh Dickins <hughd@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH/RFC G-U-P experts] IB/umem: Modernize our get_user_pages() parameters
Date: Wed, 25 Jan 2012 21:59:34 -0800
Message-ID: <1327557574-6125-1-git-send-email-roland@kernel.org>
From: Roland Dreier <roland@purestorage.com>
Right now, we always pass write==1 to get_user_pages(), even when we
only intend to read the memory. We pass force==1 if we're going for
read-only and force==0 if we want writable. The reasoning behind this
seems to be contained in this out-of-tree changelog from 2005:

    Always ask get_user_pages() for writable pages, but pass force=1
    if the consumer has only asked for read-only pages.  This fixes a
    problem registering memory that has just been allocated but not
    touched yet, while allowing registration of read-only memory to
    continue to work.

However, I don't think the mm works like this today, and indeed GUP
will fault in pages for an untouched read-only mapping just fine with
write and force set to 0. In fact, always passing 1 for write causes
problems with modern kernels, because we end up hitting the "early
C-O-W break" case in __do_fault(), even for read-only mappings where
this makes no sense.
Signed-off-by: Roland Dreier <roland@purestorage.com>
---
This patch comes from me trying to do userspace RDMA on a memory
region exported from a character driver and mapped with

    mmap(... PROT_READ, MAP_PRIVATE ...)

The character driver has a trivial mmap method that just sets vm_ops
and an equally trivial fault method that essentially just does

    vmf->page = vmalloc_to_page(buf + (vmf->pgoff << PAGE_SHIFT));

ie the most elementary way to export a vmalloc'ed buffer to userspace.
However, when I tried doing

    ibv_reg_mr(... IBV_ACCESS_REMOTE_READ ...)

in userspace on that mmap region, I found that COW was happening and
so neither userspace nor the registered memory ended up pointing at
the kernel buffer anymore, exactly because of the COW in __do_fault()
I mention in the changelog above.
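
For readers less familiar with the effect described above, here is a
minimal userspace sketch of how a COW break detaches a MAP_PRIVATE
mapping from its backing object. It is only an analogy and makes several
assumptions: it uses a temporary file instead of a vmalloc'ed driver
buffer, and it triggers the COW with an ordinary write through a
PROT_WRITE mapping rather than through get_user_pages() with write=1.
The function name private_write_detaches() and the /tmp path are
hypothetical, chosen for illustration.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Illustration only (not part of the patch): returns 1 if, after a
 * write fault on a MAP_PRIVATE mapping, the mapping holds its own copy
 * and no longer follows the backing file; 0 if it does not; -1 on a
 * setup error.
 */
int private_write_detaches(void)
{
	char path[] = "/tmp/cow-demo-XXXXXX";	/* hypothetical scratch file */
	char back[6];
	char *priv;
	long psz = sysconf(_SC_PAGESIZE);
	int detached = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);		/* file lives on only through fd */

	/* Stand-in for the kernel buffer: one page starting with "kernel". */
	if (ftruncate(fd, psz) != 0 || pwrite(fd, "kernel", 6, 0) != 6)
		goto out_close;

	priv = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (priv == MAP_FAILED)
		goto out_close;

	/* Write fault: the mapping gets a private copy of the page (COW). */
	memcpy(priv, "copied", 6);

	/* Update the backing object afterwards and read it back. */
	if (pwrite(fd, "update", 6, 0) != 6 || pread(fd, back, 6, 0) != 6)
		goto out_unmap;

	/* The private copy kept its own data; the file moved on without it. */
	detached = memcmp(priv, "copied", 6) == 0 &&
		   memcmp(back, "update", 6) == 0;

out_unmap:
	munmap(priv, psz);
out_close:
	close(fd);
	return detached;
}
```

Calling private_write_detaches() from a small main() and checking for a
return value of 1 shows the same kind of divergence: once the page has
been copied, the mapping and the original buffer are two different
pages.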
The patch below fixes my test case, and doesn't seem to break any of
the ibverbs examples and other simple tests of userspace verbs that I
tried. But that's far from an exhaustive test suite.
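
As a quick reference, the argument change made by the one-line patch can
be written out as a mapping from umem->writable to the (write, force)
pair handed to get_user_pages(). This is only a sketch of the diff's
logic: struct gup_args, gup_args_old() and gup_args_new() are
hypothetical names used for illustration and do not exist in the kernel.

```c
#include <stdbool.h>

/* Hypothetical container for the two get_user_pages() flags at issue. */
struct gup_args {
	int write;
	int force;
};

/* Before the patch: always write=1, force=1 only for read-only umems. */
struct gup_args gup_args_old(bool writable)
{
	struct gup_args a = { .write = 1, .force = !writable };
	return a;
}

/* After the patch: write follows umem->writable, force is never set. */
struct gup_args gup_args_new(bool writable)
{
	struct gup_args a = { .write = writable, .force = 0 };
	return a;
}
```

For a writable registration both versions give write=1, force=0, so the
only behavioral change is for read-only registrations, which go from
write=1, force=1 to write=0, force=0.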
I'd definitely appreciate comments from MM experts here, since I'm not
positive of my understanding of G-U-P and friends, and I don't want to
break apps because this is wrong in some special case I didn't try.
Also testing from anyone with an RDMA app that does anything at all
fancy with memory allocation or registration would be helpful.
Thanks!
PS Let me know if I didn't go on long enough about this one-line patch
and I can write some more.
drivers/infiniband/core/umem.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 71f0c0f..fb5abd3 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -152,7 +152,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 		ret = get_user_pages(current, current->mm, cur_base,
 				     min_t(unsigned long, npages,
 					   PAGE_SIZE / sizeof (struct page *)),
-				     1, !umem->writable, page_list, vma_list);
+				     umem->writable, 0, page_list, vma_list);
 		if (ret < 0)
 			goto out;
--
1.7.8.3