linux-kernel.vger.kernel.org archive mirror
From: Mike Rapoport <rppt@kernel.org>
To: Andy Lutomirski <luto@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	James Bottomley <jejb@linux.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Linux API <linux-api@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>, X86 ML <x86@kernel.org>,
	Mike Rapoport <rppt@linux.ibm.com>
Subject: Re: [PATCH RFC] mm: add MAP_EXCLUSIVE to create exclusive user mappings
Date: Wed, 30 Oct 2019 09:40:06 +0100
Message-ID: <20191030084005.GC20624@rapoport-lnx>
In-Reply-To: <CALCETrUuuc4DS0cdMBtS550Wkp0x9ND3M3SgtaMgyRROnDR5Kg@mail.gmail.com>

On Tue, Oct 29, 2019 at 10:00:55AM -0700, Andy Lutomirski wrote:
> On Tue, Oct 29, 2019 at 2:33 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Mon, Oct 28, 2019 at 02:44:23PM -0600, Andy Lutomirski wrote:
> > >
> > > > On Oct 27, 2019, at 4:17 AM, Mike Rapoport <rppt@kernel.org> wrote:
> > > >
> > > > From: Mike Rapoport <rppt@linux.ibm.com>
> > > >
> > > > Hi,
> > > >
> > > > The patch below aims to allow applications to create mappings that have
> > > > pages visible only to the owning process. Such mappings could be used to
> > > > store secrets so that these secrets are visible neither to other
> > > > processes nor to the kernel.
> > > >
> > > > I've only tested the basic functionality; the changes still need to be
> > > > verified against THP/migration/compaction. Still, I'd appreciate early feedback.
> > >
> > > I’ve contemplated the concept a fair amount, and I think you should
> > > consider a change to the API. In particular, rather than having it be a
> > > MAP_ flag, make it a chardev.  You can, at least at first, allow only
> > > MAP_SHARED, and admins can decide who gets to use it.  It might also play
> > > better with the VM overall, and you won’t need a VM_ flag for it — you
> > > can just wire up .fault to do the right thing.
> >
> > I think mmap()/mprotect()/madvise() are the natural APIs for such an
> > interface.
> 
> Then you have a whole bunch of questions to answer.  For example:
> 
> What happens if you mprotect() or similar when the mapping is already
> in use in a way that's incompatible with MAP_EXCLUSIVE?

Then we refuse the mprotect(), just as in any other case where the VMA's
vm_flags are incompatible with the requested madvise()/mprotect() operation.

> Is it actually reasonable to malloc() some memory and then make it exclusive?
> 
> Are you permitted to map a file MAP_EXCLUSIVE?  What does it mean?

I'd limit MAP_EXCLUSIVE only to anonymous memory.
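The anonymous-only restriction could be modelled as a simple flag check. This is an illustrative userspace model, not kernel code: MAP_EXCLUSIVE_SKETCH is a hypothetical placeholder bit standing in for the RFC's MAP_EXCLUSIVE, and exclusive_mmap_check() is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>

/* HYPOTHETICAL placeholder for the RFC's MAP_EXCLUSIVE. It is never
 * passed to mmap(); it only drives the model below. */
#define MAP_EXCLUSIVE_SKETCH (1 << 30)

/* Model of the proposed rule: exclusive mappings are anonymous-only.
 * Returns 0 if the flag combination would be accepted, -EINVAL if not. */
static int exclusive_mmap_check(int flags)
{
    if (!(flags & MAP_EXCLUSIVE_SKETCH))
        return 0;               /* ordinary mapping: rule does not apply */
    if (!(flags & MAP_ANONYMOUS))
        return -EINVAL;         /* file-backed exclusive mapping: refused */
    return 0;
}
```

A file-backed request such as MAP_SHARED | MAP_EXCLUSIVE_SKETCH is rejected, while MAP_PRIVATE | MAP_ANONYMOUS | MAP_EXCLUSIVE_SKETCH passes.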

> What does MAP_PRIVATE | MAP_EXCLUSIVE do?

My preference is to support only mmap(); then the semantics are clearer:

MAP_PRIVATE | MAP_EXCLUSIVE creates a pre-populated region, marks it locked
and drops the pages in this region from the direct map.
The pages are returned to the direct map on munmap().
With this approach there is no way to make an existing area exclusive,
or vice versa.
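The userspace-visible half of these semantics can be approximated with flags that exist today. A minimal sketch, assuming a Linux system: MAP_POPULATE stands in for "pre-populated" and mlock() for "marks it locked". The RFC's central step, dropping the pages from the kernel direct map, cannot be done from userspace and is not performed here; secret_alloc() and secret_free() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Approximation only: no direct-map removal happens here. */
static void *secret_alloc(size_t len)
{
    /* MAP_POPULATE pre-faults every page of the region. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    /* mlock() keeps the pages resident; it is subject to RLIMIT_MEMLOCK
     * unless the caller has CAP_IPC_LOCK. */
    if (mlock(p, len) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}

/* Scrub, then unmap. munmap() also drops the memory lock; under the
 * proposal it is where the pages would return to the direct map. */
static void secret_free(void *p, size_t len)
{
    memset(p, 0, len);
    munmap(p, len);
}
```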

> How does one pass exclusive memory via SCM_RIGHTS?  (If it's a
> memfd-like or chardev interface, it's trivial.  mmap(), not so much.)

Why would passing such memory via SCM_RIGHTS be useful?
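To make Andy's point concrete: with an fd-based interface, the memory travels as a descriptor over a UNIX-domain socket using SCM_RIGHTS, and the receiver maps the same pages. A sketch under the assumption that memfd_create() stands in for a hypothetical exclusive-memory fd (the RFC defines no such fd); send_fd() and recv_fd() are illustrative helpers. An mmap()-only MAP_EXCLUSIVE region has no descriptor to send.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd over a connected AF_UNIX socket; 0 on success. */
static int send_fd(int sock, int fd)
{
    char byte = 0;                      /* must carry at least one data byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;           /* forces correct alignment */
    } u;
    memset(&u, 0, sizeof(u));
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; returns the new descriptor or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The sender creates and sizes the memfd with ftruncate(), maps it MAP_SHARED, and calls send_fd(); the receiver's mmap() of the returned descriptor sees the same contents.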
 
> And finally, there's my personal giant pet peeve: a major use of this
> will be for virtualization.  I suspect that a lot of people would like
> the majority of KVM guest memory to be unmapped from the host
> pagetables.  But people might also like for guest memory to be
> unmapped in *QEMU's* pagetables, and mmap() is a basically worthless
> interface for this.  Getting fd-backed memory into a guest will take
> some possibly major work in the kernel, but getting vma-backed memory
> into a guest without mapping it in the host user address space seems
> much, much worse.

In my view, MAP_EXCLUSIVE is intended for keeping small secrets rather than
for backing the entire guest memory. I even considered adding a limit on the
mapping size, but since RLIMIT_MEMLOCK is enforced anyway, I decided there
is no need for a new one.
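Because RLIMIT_MEMLOCK already bounds how much memory a process can lock, a caller can consult it before requesting a large exclusive mapping. A minimal sketch; fits_memlock_limit() is a hypothetical helper, and it deliberately ignores memory the process has already locked and the CAP_IPC_LOCK exemption, both of which the kernel takes into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/* Would a new locked region of `len` bytes fit under the soft
 * RLIMIT_MEMLOCK? Simplified: compares against the limit only. */
static bool fits_memlock_limit(size_t len)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return false;
    if (rl.rlim_cur == RLIM_INFINITY)
        return true;
    return (rlim_t)len <= rl.rlim_cur;
}
```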

I agree that getting fd-backed memory into a guest would be less painful than
VMA-backed memory, but KVM can already use memory outside the control of the
kernel via /dev/map [1].

So unless I'm missing something here, there is no need to use MAP_EXCLUSIVE
for the guest memory.

[1] https://lwn.net/Articles/778240/

> > Switching to a chardev doesn't solve the major problem of direct
> > map fragmentation and defeats the ability to use exclusive memory mappings
> > with the existing allocators, while mprotect() and madvise() do not.
> >
> 
> Will people really want to do malloc() and then remap it exclusive?
> This sounds dubiously useful at best.

Again, my preference is to support mmap() only, but I see value in this use
case as well. Application developers sometimes allocate memory and then change
its properties rather than mmap() a new region. For such usage, mprotect() may
be useful.
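One practical constraint on the "allocate, then change properties" pattern is that mprotect() works on whole pages and requires a page-aligned address, so plain malloc() memory usually cannot be converted without affecting neighbouring allocations. A sketch, assuming the buffer comes from posix_memalign() with a page-multiple size; harden() uses mlock() and PROT_READ only as stand-ins, since no exclusivity flag for mprotect()/madvise() exists upstream. All helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Page-aligned heap buffer rounded up to whole pages, so page-granular
 * calls touch only this buffer. */
static void *alloc_page_aligned(size_t len, size_t *rounded)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t n = (len + page - 1) / page * page;
    void *p = NULL;
    if (posix_memalign(&p, page, n) != 0)
        return NULL;
    *rounded = n;
    return p;
}

/* Stand-in for a hypothetical "make exclusive" step: pin the pages and
 * drop write access. */
static int harden(void *p, size_t n)
{
    if (mlock(p, n) != 0)
        return -1;
    return mprotect(p, n, PROT_READ);
}

/* Restore write access before scrubbing and returning to malloc. */
static void release(void *p, size_t n)
{
    mprotect(p, n, PROT_READ | PROT_WRITE);
    memset(p, 0, n);
    munlock(p, n);
    free(p);
}
```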


-- 
Sincerely yours,
Mike.

Thread overview: 60+ messages
2019-10-27 10:17 [PATCH RFC] mm: add MAP_EXCLUSIVE to create exclusive user mappings Mike Rapoport
2019-10-27 10:17 ` Mike Rapoport
2019-10-28 12:31   ` Kirill A. Shutemov
2019-10-28 13:00     ` Mike Rapoport
2019-10-28 13:16       ` Kirill A. Shutemov
2019-10-28 13:55         ` Peter Zijlstra
2019-10-28 19:59           ` Edgecombe, Rick P
2019-10-28 21:00             ` Peter Zijlstra
2019-10-29 17:27               ` Edgecombe, Rick P
2019-10-30 10:04                 ` Peter Zijlstra
2019-10-30 15:35                   ` Alexei Starovoitov
2019-10-30 18:39                     ` Peter Zijlstra
2019-10-30 18:52                       ` Alexei Starovoitov
2019-10-30 17:48                   ` Edgecombe, Rick P
2019-10-30 17:58                     ` Dave Hansen
2019-10-30 18:01                       ` Dave Hansen
2019-10-29  5:43         ` Dan Williams
2019-10-29  6:43           ` Kirill A. Shutemov
2019-10-29  8:56             ` Peter Zijlstra
2019-10-29 11:00               ` Kirill A. Shutemov
2019-10-29 12:39                 ` AMD TLB errata, (Was: [PATCH RFC] mm: add MAP_EXCLUSIVE to create exclusive user mappings) Peter Zijlstra
2019-11-15 14:12                   ` Tom Lendacky
2019-11-15 14:31                     ` Peter Zijlstra
2019-10-29 19:43             ` [PATCH RFC] mm: add MAP_EXCLUSIVE to create exclusive user mappings Dan Williams
2019-10-29 20:07               ` Dave Hansen
2019-10-29  7:08         ` Christopher Lameter
2019-10-29  8:55           ` Mike Rapoport
2019-10-29 10:12             ` Christopher Lameter
2019-10-30  7:11               ` Mike Rapoport
2019-10-30 12:09                 ` Christopher Lameter
2019-10-28 14:55   ` David Hildenbrand
2019-10-28 17:12   ` Dave Hansen
2019-10-28 17:32     ` Sean Christopherson
2019-10-28 18:08     ` Matthew Wilcox
2019-10-29  9:28       ` Mike Rapoport
2019-10-29  9:19     ` Mike Rapoport
2019-10-28 18:02   ` Andy Lutomirski
2019-10-29 11:02   ` David Hildenbrand
2019-10-30  8:15     ` Mike Rapoport
2019-10-30  8:19       ` David Hildenbrand
2019-10-31 19:16         ` Mike Rapoport
2019-10-31 21:52           ` Dan Williams
2019-10-27 10:30 ` Florian Weimer
2019-10-27 11:00   ` Mike Rapoport
2019-10-28 20:23     ` Florian Weimer
2019-10-29  9:01       ` Mike Rapoport
2019-10-28 20:44 ` Andy Lutomirski
2019-10-29  9:32   ` Mike Rapoport
2019-10-29 17:00     ` Andy Lutomirski
2019-10-30  8:40       ` Mike Rapoport [this message]
2019-10-30 21:28         ` Andy Lutomirski
2019-10-31  7:21           ` Mike Rapoport
2019-12-05 15:34           ` Mike Rapoport
2019-12-08 14:10             ` [PATCH] mm: extend memfd with ability to create secret memory kbuild test robot
2019-10-29 11:25 ` [PATCH RFC] mm: add MAP_EXCLUSIVE to create exclusive user mappings Reshetova, Elena
2019-10-29 15:13   ` Tycho Andersen
2019-10-29 17:03   ` Andy Lutomirski
2019-10-29 17:37     ` Alan Cox
2019-10-29 17:43     ` James Bottomley
2019-10-29 18:10       ` Andy Lutomirski
