linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Roland Dreier <roland@kernel.org>
Cc: linux-rdma@vger.kernel.org,
	Andrea Arcangeli <aarcange@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH/RFC G-U-P experts] IB/umem: Modernize our get_user_pages() parameters
Date: Mon, 30 Jan 2012 12:34:30 -0800 (PST)	[thread overview]
Message-ID: <alpine.LSU.2.00.1201301217530.4505@eggly.anvils> (raw)
In-Reply-To: <CAL1RGDXqguZ2QKV=yjLXtk2n_Ag4Nf3CW+kF2BFQFR4ySTNaRA@mail.gmail.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4318 bytes --]

On Mon, 30 Jan 2012, Roland Dreier wrote:
> On Fri, Jan 27, 2012 at 6:19 PM, Hugh Dickins <hughd@google.com> wrote:
> >> > > This patch comes from me trying to do userspace RDMA on a memory
> >> > > region exported from a character driver and mapped with
> >> > >
> >> > >     mmap(... PROT_READ, MAP_PRIVATE ...)
> >
> > Why MAP_PRIVATE?  There you are explicitly asking for COW: okay,
> > you wouldn't normally expect any COW while it's just PROT_READ, but
> > once you bring GUP into the picture, with use of write and force,
> > then you are just begging for COW with that MAP_PRIVATE.  Please
> > change it to MAP_SHARED - any reason why not?
> 
> I have no idea of the history there... probably could be changed with
> no problems.
> 
> However, get_user_pages has this comment:
> 
>  * @force:	whether to force write access even if user mapping is
>  *		readonly. This will result in the page being COWed even
>  *		in MAP_SHARED mappings. You do not want this.
> 
> but I don't see where in the code FOLL_FORCE does COW
> for MAP_SHARED mappings.

It ends up being implemented by the conditionals do_wp_page().

> But on the OTOH I don't see
> why we set force in the first place.  Why wouldn't we respect
> the user's mapping permissions.
> 
> > I feel you're trying to handle two very different cases (rdma into
> > user-supplied anonymous memory, and exporting driver memory to the
> > user) with the same set of args to get_user_pages().  In fact, I
> > don't even see why you need get_user_pages() at all when exporting
> > driver memory to the user.  Ah, perhaps you don't, but you do want
> > your standard access method (which already involves GUP) not to
> > mess up when applied to such a mapping - is that it?
> 
> Exactly.  Right now we have the libibverbs userspace API, which
> basically lets userspace create an abstract "memory region" (MR)
> that is then given to the RDMA hardware to do IO on.  Userspace does
> 
>     mr = ibv_reg_mr(..., buf, size, access_flags);
> 
> where access flags say whether we're going to let the hardware
> read and/or write the memory.
> 
> Ideally userspace should not have to know where the memory
> underlying its "buf" came from or what type of mapping it is.
> 
> Certainly there are still more unresolved issues around the case
> where userspace wants to map, say, part of a GPUs PCI memory
> (which won't have any underlying page structs at all), but I'm at
> least hoping we can come up with a way to handle both anonymous
> private maps (which will be COWed from the zero page when
> the memory is touched for writing) and shared mappings of kernel
> memory exported by a driver's mmap method.
> 
> 
> So I guess I'm left thinking that it seems at least plausible that
> what we want is a new FOLL_ flag for __get_user_pages() that triggers
> COW exactly on the pages that userspace might trigger COW on,
> and avoids COW otherwise -- ie do FOLL_WRITE exactly for the
> pages that have VM_WRITE in their mapping.

The hardest part about implementing that is deciding what snappy
name to give the FOLL_ flag.

And it's a comprehensible semantic which sounds good and plausible,
but I'm afraid we might be deceiving ourselves.

> 
> I don't think we want to do the "force" semantics or deal with the
> VM_MAYWRITE possiblity -- the access we give the hardware on
> behalf of userspace should just match the access that userspace
> actually has.  It seems that if we don't try to get pages for writing
> when VM_WRITE isn't set, we don't need force anymore.

I suspect you never needed or wanted the weird force behaviour on
shared maywrite, but that you did need the force COW behaviour on
private currently-unwritable maywrite.  You (or your forebears)
defined that interface to use the force flag, I'm guessing it was
for a reason; now you want to change it not to use the force flag,
and it sounds good, but I'm afraid you'll discover down the line
what the force flag was for.

Can you, for example, enforce the permissions set up by the user?
I mean, if they do the ibv_reg_mr() on a private readonly area,
so __get_user_pages with the FOLL_APPROPRIATELY flag will fault
in ZERO_PAGEs, can you enforce that RDMA will never spray data
into those pages?

Hugh

  parent reply	other threads:[~2012-01-30 20:34 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-26  5:59 [PATCH/RFC G-U-P experts] IB/umem: Modernize our get_user_pages() parameters Roland Dreier
2012-01-26 20:01 ` Hugh Dickins
2012-01-26 22:45   ` Roland Dreier
2012-01-27 17:28     ` Roland Dreier
2012-01-28  2:31       ` Hugh Dickins
2012-01-28 19:25         ` Jason Gunthorpe
2012-01-30 19:19           ` Roland Dreier
2012-01-28  2:19     ` Hugh Dickins
2012-01-30 19:16       ` Roland Dreier
2012-01-30 20:20         ` Andrea Arcangeli
2012-02-06 17:46           ` Roland Dreier
2012-01-30 20:34         ` Hugh Dickins [this message]
2012-02-06 17:39           ` Roland Dreier
2012-02-07 20:39             ` Hugh Dickins
2012-02-08 23:10               ` Hugh Dickins
2012-02-09 17:50                 ` Roland Dreier
2012-02-09 22:57                   ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.00.1201301217530.4505@eggly.anvils \
    --to=hughd@google.com \
    --cc=aarcange@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=roland@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).