From: Hugh Dickins <hughd@google.com>
To: Roland Dreier <roland@kernel.org>
Cc: linux-rdma@vger.kernel.org,
Andrea Arcangeli <aarcange@redhat.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH/RFC G-U-P experts] IB/umem: Modernize our get_user_pages() parameters
Date: Thu, 9 Feb 2012 14:57:02 -0800 (PST)
Message-ID: <alpine.LSU.2.00.1202091425280.1263@eggly.anvils>
In-Reply-To: <CAL1RGDWZ2LYO7ejPs9FvDzqze43cbfUEEdQVB=Ug2n3JpEe=AQ@mail.gmail.com>
On Thu, 9 Feb 2012, Roland Dreier wrote:
> On Wed, Feb 8, 2012 at 3:10 PM, Hugh Dickins <hughd@google.com> wrote:
> > A doubt assaulted me overnight: sorry, I'm back to not understanding.
> >
> > What are these access flags passed into ibv_reg_mr() that are enforced?
> > What relation do they bear to what you will pass to __get_user_pages()?
>
> The access flags are:
>
> enum ibv_access_flags {
> IBV_ACCESS_LOCAL_WRITE = 1,
> IBV_ACCESS_REMOTE_WRITE = (1<<1),
> IBV_ACCESS_REMOTE_READ = (1<<2),
> IBV_ACCESS_REMOTE_ATOMIC = (1<<3),
> IBV_ACCESS_MW_BIND = (1<<4)
> };
>
> pretty much the only one of interest is IBV_ACCESS_REMOTE_READ --
> all the others imply the possibility of RDMA HW writing to the page.
>
> So basically if any flags other than IBV_ACCESS_REMOTE_READ are
> set, we pass FOLL_WRITE to __get_user_pages(), otherwise we pass
> the new FOLL_FOLLOW. [does "Marcia, Marcia, Marcia" mean anything
> to a Brit? ;)]
[ Nothing whatsoever - I needed to avoid saying "Zilch" there, didn't I?
- I had to look her up. Not sure quite how she comes in here: if you're
implying that someone is perfect, I rather doubt you're thinking of me!
I was thrilled a year ago at last to discover who Virginia is,
celebrated in mm/memory.c and mm/page-writeback.c. ]
>
> ie the change from the status quo would be:
>
> [read-only] write=1, force=1 --> FOLL_FOLLOW
> [writeable] write=1, force=0 --> FOLL_WRITE (equivalent)
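A minimal userspace sketch of the mapping just described, only to make the two rows concrete. The enum is copied from the quoted ibv_access_flags; FOLL_WRITE and FOLL_FORCE are real kernel gup flags but the numeric values here are placeholders, FOLL_FOLLOW is only the flag proposed in this thread, and gup_flags_old()/gup_flags_new() are hypothetical names:

```c
#include <stdbool.h>

/* Copied from the ibv_access_flags enum quoted above. */
enum ibv_access_flags {
	IBV_ACCESS_LOCAL_WRITE   = 1,
	IBV_ACCESS_REMOTE_WRITE  = (1 << 1),
	IBV_ACCESS_REMOTE_READ   = (1 << 2),
	IBV_ACCESS_REMOTE_ATOMIC = (1 << 3),
	IBV_ACCESS_MW_BIND       = (1 << 4)
};

/*
 * Placeholder values: FOLL_WRITE and FOLL_FORCE exist in the kernel, but
 * these numbers are illustrative only; FOLL_FOLLOW is merely proposed.
 */
#define FOLL_WRITE  0x0001u
#define FOLL_FORCE  0x0010u
#define FOLL_FOLLOW 0x8000u

/* Any flag other than IBV_ACCESS_REMOTE_READ may let the HW write. */
static bool mr_may_be_written(int access)
{
	return (access & ~IBV_ACCESS_REMOTE_READ) != 0;
}

/* Status quo: always write=1; force=1 only for read-only MRs. */
unsigned int gup_flags_old(int access)
{
	return mr_may_be_written(access) ? FOLL_WRITE
					 : (FOLL_WRITE | FOLL_FORCE);
}

/* Proposal: FOLL_WRITE for writable MRs, FOLL_FOLLOW otherwise. */
unsigned int gup_flags_new(int access)
{
	return mr_may_be_written(access) ? FOLL_WRITE : FOLL_FOLLOW;
}
```

Only the read-only row changes; writable MRs get FOLL_WRITE either way.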
>
> > You are asking for a FOLL_FOLLOW ("follow permissions of the vma") flag,
> > which automatically works for read-write access to a VM_READ|VM_WRITE vma,
> > but read-only access to a VM_READ-only vma, without you having to know
> > which permission applies to which range of memory in the area specified.
>
> > But you don't need that new flag to set up read-only access, and if you
> > use that new flag to set up read-write access to an area which happens to
> > contain VM_READ-only ranges, you have set it up to write into ZERO_PAGEs.
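To make that hazard concrete, a toy model (not kernel code) of FOLL_FOLLOW applied to an untouched private anonymous page. It assumes a read fault on such a page maps the ZERO_PAGE; the VM_READ/VM_WRITE values are illustrative, and follow_pin()/hw_would_write_zero_page() are hypothetical names:

```c
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's vm_flags bits. */
#define VM_READ  0x1UL
#define VM_WRITE 0x2UL

/* What GUP would end up pinning for an untouched private anon page. */
enum pinned_page { PINNED_ZERO_PAGE, PINNED_PRIVATE_COPY };

/* FOLL_FOLLOW: fault for write only where the vma permits writing. */
enum pinned_page follow_pin(unsigned long vm_flags)
{
	return (vm_flags & VM_WRITE) ? PINNED_PRIVATE_COPY : PINNED_ZERO_PAGE;
}

/* A writable MR over a VM_READ-only range would let the HW hit ZERO_PAGE. */
bool hw_would_write_zero_page(unsigned long vm_flags, bool mr_writable)
{
	return mr_writable && follow_pin(vm_flags) == PINNED_ZERO_PAGE;
}
```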
>
> First of all, I kind of like FOLL_FOLLOW as the name :)
Yeah, it's not too bad; though below I'm now wondering if it is appropriate.
>
> Now you're confusing me:
I'm very glad to hear it, I feel less alone.
> I think we do need FOLL_FOLLOW to
> set up read-only access -- we want to trigger the COWs that userspace
> might trigger by touching the memory up front. This is to handle
> a case like
>
> [userspace]
> int *buf = malloc(16 * 4096);
> // buf now points to 16 anonymous zero_pages
> mr = ibv_reg_mr(pd, buf, 16 * 4096, IBV_ACCESS_REMOTE_READ);
> // RDMA HW will only ever read buf, but...
> buf[0] = 2012;
> // COW triggered, first page of buf changed, RDMA HW has wrong mapping!
>
> For something the RDMA HW might write to, then I agree we don't want
> FOLL_FOLLOW -- we just would use FOLL_WRITE as we currently do.
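A toy userspace model of that sequence, assuming a single shared zero page and copy-on-write on first write. struct toy_mm, toy_write(), toy_register() and the precow switch are hypothetical; nothing here touches real page tables:

```c
#include <stdbool.h>
#include <stdlib.h>

#define NPAGES    16
#define PAGE_SIZE 4096

/* Toy model only: one shared zero page, COW broken on first write. */
static char zero_page[PAGE_SIZE];

struct toy_mm {
	char *pte[NPAGES];	/* page the CPU currently maps at each index */
};

void toy_mm_init(struct toy_mm *mm)
{
	for (int i = 0; i < NPAGES; i++)
		mm->pte[i] = zero_page;	/* untouched malloc'ed memory */
}

/* A CPU write to a page still mapping the zero page gets a fresh copy. */
void toy_write(struct toy_mm *mm, int i)
{
	if (mm->pte[i] == zero_page) {
		char *copy = calloc(1, PAGE_SIZE);	/* zeroed, like the zero page */
		if (!copy)
			abort();
		mm->pte[i] = copy;
	}
}

/*
 * "Register" the buffer: record the pages the HW would DMA from.
 * With precow, break COW first, as a write=1 force=1 GUP would.
 */
void toy_register(struct toy_mm *mm, char *pinned[NPAGES], bool precow)
{
	for (int i = 0; i < NPAGES; i++) {
		if (precow)
			toy_write(mm, i);
		pinned[i] = mm->pte[i];
	}
}
```

Without precow, the later buf[0] = 2012 remaps the CPU's page 0 while the HW keeps the zero page; with precow, the write lands on the page already pinned.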
Ah, okay, something earlier in the thread had thrown me off that track:
I thought we were expecting ibv_reg_mr to give the remote side the same
permissions as the user had. Or something; maybe I'm just making excuses
for being dense.
But then I wonder if FOLL_FOLLOW is actually the behaviour you need.
Imagine a PROT_READ MAP_PRIVATE area (just as in your original mail):
what if the user does mprotect PROT_READ|PROT_WRITE on that afterwards,
and then proceeds to touch it? The old write=1 force=1 GUP would have
pre-COWed that, so no problem; but FOLL_FOLLOW will not.
Maybe you can answer "don't do that"; but you do then appear to be
trading one kind of "don't do that" for another. Maybe it depends on
what libraries might get up to: aren't there (debug? garbage collection?)
memalloc libraries which give out memory protected until you touch it?
Maybe you need FOLL_PRECOW, which does write=1 force=1 on the private
areas, but just faults in the shared areas (avoiding the bizarre forced
COW on shared areas).
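A rough sketch of the two policies for a read-only MR, decided per vma. FOLL_PRECOW is modelled here as a policy choice rather than a real flag, the flag and vm_flags values are placeholders, and the helper names are hypothetical. The second helper captures the mprotect case above, assuming the private pages have not already been COWed before registration:

```c
#include <stdbool.h>

/* Placeholder values, for illustration only. */
#define VM_READ    0x1UL
#define VM_WRITE   0x2UL
#define VM_SHARED  0x8UL
#define FOLL_WRITE 0x01u
#define FOLL_FORCE 0x10u

enum policy { POLICY_FOLLOW, POLICY_PRECOW };

/* gup flags for a read-only MR over one vma, under each policy. */
unsigned int read_only_mr_gup_flags(enum policy p, unsigned long vm_flags)
{
	if (p == POLICY_FOLLOW)
		return (vm_flags & VM_WRITE) ? FOLL_WRITE : 0;
	/* POLICY_PRECOW */
	if (vm_flags & VM_SHARED)
		return 0;			/* just fault in: no forced COW */
	return FOLL_WRITE | FOLL_FORCE;		/* pre-COW private areas */
}

/*
 * Registration happens while the vma has vm_flags_at_reg; later the user
 * mprotects it writable and writes. The pin stays valid only if the page
 * is shared or registration already broke COW by faulting for write.
 */
bool pin_survives_later_write(enum policy p, unsigned long vm_flags_at_reg)
{
	if (vm_flags_at_reg & VM_SHARED)
		return true;	/* shared pages are written in place, not COWed */
	return read_only_mr_gup_flags(p, vm_flags_at_reg) & FOLL_WRITE;
}
```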
>
> When I get around to coding this up, I think I'm going to spend a lot
> of time on the comments and on the commit log :)
I am sorry to be driving you to such effort, honestly.
Hugh