From: Roland Dreier <roland@kernel.org>
To: Hugh Dickins <hughd@google.com>
Cc: linux-rdma@vger.kernel.org,
Andrea Arcangeli <aarcange@redhat.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH/RFC G-U-P experts] IB/umem: Modernize our get_user_pages() parameters
Date: Mon, 6 Feb 2012 09:39:42 -0800 [thread overview]
Message-ID: <CAL1RGDVSBb1DVsfvuz=ijRZX06crsqQfKoXWJ+6FO4xi3aYyTg@mail.gmail.com> (raw)
In-Reply-To: <alpine.LSU.2.00.1201301217530.4505@eggly.anvils>
Sorry for the slow reply, I got caught in other business...
On Mon, Jan 30, 2012 at 12:34 PM, Hugh Dickins <hughd@google.com> wrote:
> The hardest part about implementing that is deciding what snappy
> name to give the FOLL_ flag.
Yes... FOLL_SOFT_COW ? FOLL_READONLY_COW ?
(plus a good comment explaining it I guess)
>> I don't think we want to do the "force" semantics or deal with the
>> VM_MAYWRITE possiblity -- the access we give the hardware on
>> behalf of userspace should just match the access that userspace
>> actually has. It seems that if we don't try to get pages for writing
>> when VM_WRITE isn't set, we don't need force anymore.
>
> I suspect you never needed or wanted the weird force behaviour on
> shared maywrite, but that you did need the force COW behaviour on
> private currently-unwritable maywrite. You (or your forebears)
> defined that interface to use the force flag, I'm guessing it was
> for a reason; now you want to change it not to use the force flag,
> and it sounds good, but I'm afraid you'll discover down the line
> what the force flag was for.
Actually I think I understand why the original code passed !write
as the force parameter.
If the user is registering memory with read-only access, there are
two common cases. Possibly the underlying memory really has
a read-only mapping, but probably more often it is just an ordinary
buffer allocated in userspace with malloc() or the like.
In the second case, it's quite likely we have a read/write mapping
of anonymous pages. We'll expose it read-only for RDMA but the
userspace process will write data into the memory via ordinary CPU
access. However, if we do ibv_reg_mr() before initializing the memory
it's quite possible that the mapping actually points to the zero page,
waiting for a CPU write to trigger a COW.
So in the second case, doing GUP without the write flag will leave
the COW untriggered, and we'll end up mapping the zero page to
the hardware, and RDMA won't read the data that userspace actually
writes. So (without GUP extension as we're discussing in this thread)
we're forced to pass write==1 to GUP, even if we expect hardware
to only do reads.
But if we pass write==1, then GUP on the first case (mapping that
is genuinely read-only) will fail, unless we pass force==1 too. But
this should only succeed if we're going to only access the memory
read-only, so we should set force to !writable-access-by-rdma.
Which I think explains why the code is the way it is. But clearly
we could do better if we had a better way of telling GUP our real
intentions -- ie the FOLL_READONLY_COW flag.
> Can you, for example, enforce the permissions set up by the user?
> I mean, if they do the ibv_reg_mr() on a private readonly area,
> so __get_user_pages with the FOLL_APPROPRIATELY flag will fault
> in ZERO_PAGEs, can you enforce that RDMA will never spray data
> into those pages?
Yes, the access flags passed into ibv_reg_mr() are enforced by
the RDMA hardware, so if no write access is request, no write
access is possible.
And presumably if we do GUP with write==1, force==0 that will
fail on a read-only mapping?
- R.
next prev parent reply other threads:[~2012-02-06 17:40 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-01-26 5:59 [PATCH/RFC G-U-P experts] IB/umem: Modernize our get_user_pages() parameters Roland Dreier
2012-01-26 20:01 ` Hugh Dickins
2012-01-26 22:45 ` Roland Dreier
2012-01-27 17:28 ` Roland Dreier
2012-01-28 2:31 ` Hugh Dickins
2012-01-28 19:25 ` Jason Gunthorpe
2012-01-30 19:19 ` Roland Dreier
2012-01-28 2:19 ` Hugh Dickins
2012-01-30 19:16 ` Roland Dreier
2012-01-30 20:20 ` Andrea Arcangeli
2012-02-06 17:46 ` Roland Dreier
2012-01-30 20:34 ` Hugh Dickins
2012-02-06 17:39 ` Roland Dreier [this message]
2012-02-07 20:39 ` Hugh Dickins
2012-02-08 23:10 ` Hugh Dickins
2012-02-09 17:50 ` Roland Dreier
2012-02-09 22:57 ` Hugh Dickins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAL1RGDVSBb1DVsfvuz=ijRZX06crsqQfKoXWJ+6FO4xi3aYyTg@mail.gmail.com' \
--to=roland@kernel.org \
--cc=aarcange@redhat.com \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-rdma@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).