archive mirror
 help / color / mirror / Atom feed
From: Roland Dreier <>
To: Hugh Dickins <>
	Andrea Arcangeli <>,,
Subject: Re: [PATCH/RFC G-U-P experts] IB/umem: Modernize our get_user_pages() parameters
Date: Thu, 26 Jan 2012 14:45:58 -0800	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <alpine.LSU.2.00.1201261133230.1369@eggly.anvils>

Hi Hugh,

Thanks for the thoughtful answer...

> I think this is largely about the ZERO_PAGE.  If you do a read fault
> on an untouched anonymous area, it maps in the ZERO_PAGE, and will
> only give you your own private zeroed page when there's a write fault
> to touch it.
> I think your ib_umem_get() is making sure to give the process its own
> private zeroed page: if the area is PROT_READ, MAP_PRIVATE, userspace
> will not be wanting to write into it, but presumably it expects to see
> data placed in that page by the underlying driver, and it would be very
> bad news if the driver wrote its data into the ZERO_PAGE.

I think we are actually OK.  If umem->writable == 0, that is actually
a promise by the driver/HW that they are not going to write to this
memory.  Mapping ZERO_PAGE to the hardware is fine in this case,
since the hardware will just read zeroes exactly as it should.

One question is whether we're OK if userspace maps some
anonymous memory with PROT_WRITE but doesn't touch it,
and then tries to map it to the hardware read-only.  In that case
we hit get_user_pages() with write == 0.  If I understand the code
correctly, we end up mapping ZERO_PAGE in do_anonymous_page().

But then if userspace writes to this anonymous memory, a COW
will be triggered and the hardware will be left holding a different
page than the one that is mapped into userspace (ie the device
won't read what userspace writes).  Kind of the inverse of the
problem I hit.

I don't have a good understanding of what force == 1 means -- I
guess the question is what happens if userspace tells us to
write to a read-only mapping that userspace could have mapped

> And although the ZERO_PAGE gives the most vivid example, I think it goes
> beyond that to the wider issue of pages COWed after fork() - GUPping in
> this way fixes the easy cases (and I've no desire again to go down that
> rathole of fixing the most general cases).

For IB / RDMA we kind of explicitly give up on COW after fork().
But I guess I don't know what issue you're thinking of.  Is it
similar to what I described above?  In other words, we have
a readable mapping that we'll COW on a write fault, but the driver
is only following the mapping for reading and so a COW will
mess us up.

Sigh, what a mess ... it seems what we really want to do is know
if userspace might trigger a COW because or not, and only do a
preemptive COW in that case.  (We're not really concerned with
userspace fork()ing and setting up a COW in the future, since that's
what we have MADV_DONTFORK for)

The status quo works for userspace anonymous mappings but
it doesn't work for my case of mapping a kernel buffer read-only
into userspace.  And fixing my case breaks the anonymous case.
Do you see a way out of this dilemma?  Do we need to add yet
another flag to get_user_pages()?


  reply	other threads:[~2012-01-26 22:46 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-26  5:59 [PATCH/RFC G-U-P experts] IB/umem: Modernize our get_user_pages() parameters Roland Dreier
2012-01-26 20:01 ` Hugh Dickins
2012-01-26 22:45   ` Roland Dreier [this message]
2012-01-27 17:28     ` Roland Dreier
2012-01-28  2:31       ` Hugh Dickins
2012-01-28 19:25         ` Jason Gunthorpe
2012-01-30 19:19           ` Roland Dreier
2012-01-28  2:19     ` Hugh Dickins
2012-01-30 19:16       ` Roland Dreier
2012-01-30 20:20         ` Andrea Arcangeli
2012-02-06 17:46           ` Roland Dreier
2012-01-30 20:34         ` Hugh Dickins
2012-02-06 17:39           ` Roland Dreier
2012-02-07 20:39             ` Hugh Dickins
2012-02-08 23:10               ` Hugh Dickins
2012-02-09 17:50                 ` Roland Dreier
2012-02-09 22:57                   ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).