From: Mike Rapoport <rppt@kernel.org>
To: Andy Lutomirski <luto@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	James Bottomley <jejb@linux.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Linux API <linux-api@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>, X86 ML <x86@kernel.org>,
	Mike Rapoport <rppt@linux.ibm.com>
Subject: Re: [PATCH RFC] mm: add MAP_EXCLUSIVE to create exclusive user mappings
Date: Thu, 31 Oct 2019 08:21:13 +0100
Message-ID: <20191031072112.GA6990@rapoport-lnx>
In-Reply-To: <CALCETrXajrY+0SmzkL7t++ndYwRoYLLE9VPKwSGSyW8HZx-TeA@mail.gmail.com>

On Wed, Oct 30, 2019 at 02:28:21PM -0700, Andy Lutomirski wrote:
> On Wed, Oct 30, 2019 at 1:40 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Tue, Oct 29, 2019 at 10:00:55AM -0700, Andy Lutomirski wrote:
> > > On Tue, Oct 29, 2019 at 2:33 AM Mike Rapoport <rppt@kernel.org> wrote:
> > > >
> > > > On Mon, Oct 28, 2019 at 02:44:23PM -0600, Andy Lutomirski wrote:
> > > > >
> > > > > > On Oct 27, 2019, at 4:17 AM, Mike Rapoport <rppt@kernel.org> wrote:
> > > > > >
> > > > > > From: Mike Rapoport <rppt@linux.ibm.com>
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > The patch below aims to allow applications to create mappings that have
> > > > > > > pages visible only to the owning process. Such mappings could be used to
> > > > > > > store secrets so that these secrets are visible neither to other
> > > > > > > processes nor to the kernel.
> > > > > >
> > > > > > > I've only tested the basic functionality; the changes still need to be
> > > > > > > verified against THP/migration/compaction. Yet, I'd appreciate early feedback.
> > > > >
> > > > > I’ve contemplated the concept a fair amount, and I think you should
> > > > > consider a change to the API. In particular, rather than having it be a
> > > > > MAP_ flag, make it a chardev.  You can, at least at first, allow only
> > > > > MAP_SHARED, and admins can decide who gets to use it.  It might also play
> > > > > better with the VM overall, and you won’t need a VM_ flag for it — you
> > > > > can just wire up .fault to do the right thing.
> > > >
> > > > I think mmap()/mprotect()/madvise() are the natural APIs for such
> > > > interface.
> > >
> > > Then you have a whole bunch of questions to answer.  For example:
> > >
> > > What happens if you mprotect() or similar when the mapping is already
> > > in use in a way that's incompatible with MAP_EXCLUSIVE?
> >
> > Then we refuse to mprotect()? Like in any other case when vm_flags are not
> > compatible with required madvise()/mprotect() operation.
> >
> 
> I'm not talking about flags.  I'm talking about the case where one
> thread (or RDMA or whatever) has get_user_pages()'d a mapping and
> another thread mprotect()s it MAP_EXCLUSIVE.
> 
> > > Is it actually reasonable to malloc() some memory and then make it exclusive?
> > >
> > > Are you permitted to map a file MAP_EXCLUSIVE?  What does it mean?
> >
> > I'd limit MAP_EXCLUSIVE only to anonymous memory.
> >
> > > What does MAP_PRIVATE | MAP_EXCLUSIVE do?
> >
> > My preference is to have only mmap(), and then the semantics are clearer:
> >
> > MAP_PRIVATE | MAP_EXCLUSIVE creates a pre-populated region, marks it locked
> > and drops the pages in this region from the direct map.
> > The pages are returned back on munmap().
> > Then there is no way to change an existing area to be exclusive or vice
> > versa.
> 
> And what happens if you fork()?  Limiting it to MAP_SHARED |
> MAP_EXCLUSIVE would avoid this particular nasty question.
> 
> >
> > > How does one pass exclusive memory via SCM_RIGHTS?  (If it's a
> > > memfd-like or chardev interface, it's trivial.  mmap(), not so much.)
> >
> > Why would passing such memory via SCM_RIGHTS be useful?
> 
> Suppose I want to put a secret into exclusive memory and then send
> that secret to some other process.  The obvious approach would be to
> SCM_RIGHTS an fd over, but you can't do that with MAP_EXCLUSIVE as
> you've defined it.  In general, there are lots of use cases for memfd
> and other fd-backed memory.
> 
> >
> > > And finally, there's my personal giant pet peeve: a major use of this
> > > will be for virtualization.  I suspect that a lot of people would like
> > > the majority of KVM guest memory to be unmapped from the host
> > > pagetables.  But people might also like for guest memory to be
> > > unmapped in *QEMU's* pagetables, and mmap() is a basically worthless
> > > interface for this.  Getting fd-backed memory into a guest will take
> > > some possibly major work in the kernel, but getting vma-backed memory
> > > into a guest without mapping it in the host user address space seems
> > > much, much worse.
> >
> > Well, in my view, the MAP_EXCLUSIVE is intended to keep small secrets
> > rather than use it for the entire guest memory. I even considered adding a
> > limit for the mapping size, but then I decided that since RLIMIT_MEMLOCK is
> > anyway enforced there is no need for a new one.
> >
> > I agree that getting fd-backed memory into a guest would be less pain than
> > VMA, but KVM can already use memory outside the control of the kernel via
> > /dev/map [1].
> 
> That series doesn't address the problem I'm talking about at all.  I'm
> saying that there is a legitimate use case where QEMU should *not*
> have a mapping of the memory.  So QEMU would create some exclusive
> memory using /dev/exclusive_memory and would tell KVM to map it into
> the guest without mapping it into QEMU's address space at all.
> 
> (In fact, the way that SEV currently works is *functionally* like
> this, except that there's a bogus incoherent mapping in the QEMU
> process that is a giant can of worms.)
> 
> 
> IMO a major benefit of a chardev approach is that you don't need a new
> VM_ flag and you don't need to worry about wiring it up everywhere in
> the core mm code.

Ok, at last I'm starting to see your and Christoph's point.

Just to reiterate, we can get fd-backed memory from a /dev/exclusive_memory
chardev (or some other name we'll pick after long bikeshedding), and then
the .mmap method of this character device can do interesting things with
the backing physical memory. Since the memory is not VMA-mapped, we do not
have to find all the places in the core that might require a check of a VM_
flag to ensure there are no clashes with the exclusive memory.
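
As a rough userspace illustration of that interface (a sketch only: the
device name is the placeholder from this thread, no such driver exists, and
exclusive_map() is a hypothetical helper), an application might obtain the
memory along these lines:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical device node; the name is only the placeholder from this thread. */
#define EXCLUSIVE_DEV "/dev/exclusive_memory"

/*
 * Map `len` bytes of fd-backed memory from the (hypothetical) chardev.
 * Returns the mapping, or NULL if the device is missing, access is
 * denied, or mmap() fails. On success *fd_out holds the open fd;
 * on failure *fd_out is left untouched.
 */
static void *exclusive_map(size_t len, int *fd_out)
{
	assert(fd_out != NULL);

	int fd = open(EXCLUSIVE_DEV, O_RDWR | O_CLOEXEC);
	if (fd < 0)
		return NULL;	/* no device, or the admin did not grant access */

	/* MAP_SHARED only, as suggested earlier in the thread. */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return NULL;
	}

	*fd_out = fd;
	return p;
}
```

Because the caller ends up holding an fd, handing the memory to another
process via SCM_RIGHTS would need nothing beyond the usual fd passing.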

Still, whatever we do with the mapping properties of this memory, we need
a solution for splitting the huge pages used for the direct map, but this
is an orthogonal problem in a way.
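
To put rough numbers on the splitting cost, assuming x86-64 with 4 KiB base
pages and a direct map built from 2 MiB and 1 GiB mappings (the macro and
function names below are made up for this example), taking a single 4 KiB
page out of a large mapping means replacing one entry with many smaller ones:

```c
#include <stddef.h>

/* Assumed x86-64 page sizes; other architectures differ. */
#define BASE_PAGE_SIZE	(4UL << 10)	/* 4 KiB */
#define PMD_PAGE_SIZE	(2UL << 20)	/* 2 MiB */
#define PUD_PAGE_SIZE	(1UL << 30)	/* 1 GiB */

/*
 * Number of `small`-sized entries needed to replace one `large`-sized
 * mapping after a split. Both sizes are powers of two here, so the
 * division is exact.
 */
static unsigned long entries_after_split(unsigned long large,
					 unsigned long small)
{
	return large / small;
}
```

With these sizes, entries_after_split(PMD_PAGE_SIZE, BASE_PAGE_SIZE) and
entries_after_split(PUD_PAGE_SIZE, PMD_PAGE_SIZE) both give 512, so pages
scattered across the direct map can fragment it quickly.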

-- 
Sincerely yours,
Mike.

