From: David Hildenbrand <david@redhat.com>
To: Mike Rapoport <rppt@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
Borislav Petkov <bp@alien8.de>,
Catalin Marinas <catalin.marinas@arm.com>,
Christopher Lameter <cl@linux.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Elena Reshetova <elena.reshetova@intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Hagen Paul Pfeifer <hagen@jauu.net>,
Ingo Molnar <mingo@redhat.com>,
James Bottomley <jejb@linux.ibm.com>,
Kees Cook <keescook@chromium.org>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Matthew Wilcox <willy@infradead.org>,
Matthew Garrett <mjg59@srcf.ucam.org>,
Mark Rutland <mark.rutland@arm.com>,
Michal Hocko <mhocko@suse.com>,
Mike Rapoport <rppt@linux.ibm.com>,
Michael Kerrisk <mtk.manpages@gmail.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Palmer Dabbelt <palmerdabbelt@google.com>,
Paul Walmsley <paul.walmsley@sifive.com>,
Peter Zijlstra <peterz@infradead.org>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
Rick Edgecombe <rick.p.edgecombe@intel.com>,
Roman Gushchin <guro@fb.com>, Shakeel Butt <shakeelb@google.com>,
Shuah Khan <shuah@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Tycho Andersen <tycho@tycho.ws>, Will Deacon <will@kernel.org>,
Yury Norov <yury.norov@gmail.com>,
linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org,
x86@kernel.org
Subject: Re: [PATCH v19 5/8] mm: introduce memfd_secret system call to create "secret" memory areas
Date: Fri, 14 May 2021 10:50:55 +0200
Message-ID: <ea1ddcfa-f52d-9a7d-cb7b-8502b38a90da@redhat.com>
In-Reply-To: <20210513184734.29317-6-rppt@kernel.org>
On 13.05.21 20:47, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> Introduce "memfd_secret" system call with the ability to create
> memory areas visible only in the context of the owning process and
> not mapped not only to other processes but in the kernel page tables
> as well.
>
> The secretmem feature is off by default and the user must explicitly
> enable it at the boot time.
>
> Once secretmem is enabled, the user will be able to create a file
> descriptor using the memfd_secret() system call. The memory areas
> created by mmap() calls from this file descriptor will be unmapped
> from the kernel direct map and they will be only mapped in the page
> table of the processes that have access to the file descriptor.
>
> The file descriptor based memory has several advantages over the
> "traditional" mm interfaces, such as mlock(), mprotect(), madvise().
> File descriptor approach allows explict and controlled sharing of the
> memory
s/explict/explicit/
> areas, it allows to seal the operations. Besides, file descriptor
> based memory paves the way for VMMs to remove the secret memory range
> from the userpace hipervisor process, for instance QEMU. Andy
> Lutomirski says:
s/userpace hipervisor/userspace hypervisor/
>
> "Getting fd-backed memory into a guest will take some possibly major
> work in the kernel, but getting vma-backed memory into a guest
> without mapping it in the host user address space seems much, much
> worse."
>
> memfd_secret() is made a dedicated system call rather than an
> extention to
s/extention/extension/
> memfd_create() because it's purpose is to allow the user to create
> more secure memory mappings rather than to simply allow file based
> access to the memory. Nowadays a new system call cost is negligible
> while it is way simpler for userspace to deal with a clear-cut system
> calls than with a multiplexer or an overloaded syscall. Moreover, the
> initial implementation of memfd_secret() is completely distinct from
> memfd_create() so there is no much sense in overloading
> memfd_create() to begin with. If there will be a need for code
> sharing between these implementation it can be easily achieved
> without a need to adjust user visible APIs.
>
> The secret memory remains accessible in the process context using
> uaccess primitives, but it is not exposed to the kernel otherwise;
> secret memory areas are removed from the direct map and functions in
> the follow_page()/get_user_page() family will refuse to return a page
> that belongs to the secret memory area.
>
> Once there will be a use case that will require exposing secretmem to
> the kernel it will be an opt-in request in the system call flags so
> that user would have to decide what data can be exposed to the
> kernel.
Maybe spell out an example: like page migration.
>
> Removing of the pages from the direct map may cause its fragmentation
> on architectures that use large pages to map the physical memory
> which affects the system performance. However, the original Kconfig
> text for CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct
> map "... can improve the kernel's performance a tiny bit ..." (commit
> 00d1c5e05736 ("x86: add gbpages switches")) and the recent report [1]
> showed that "... although 1G mappings are a good default choice,
> there is no compelling evidence that it must be the only choice".
> Hence, it is sufficient to have secretmem disabled by default with
> the ability of a system administrator to enable it at boot time.
Maybe add a link to the Intel performance evaluation.
>
> Pages in the secretmem regions are unevictable and unmovable to
> avoid accidental exposure of the sensitive data via swap or during
> page migration.
>
> Since the secretmem mappings are locked in memory they cannot exceed
> RLIMIT_MEMLOCK. Since these mappings are already locked independently
> from mlock(), an attempt to mlock()/munlock() secretmem range would
> fail and mlockall()/munlockall() will ignore secretmem mappings.
Maybe add something like "similar to pages pinned by VFIO".
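For userspace, the RLIMIT_MEMLOCK constraint described above can be checked before sizing a secretmem mapping. A minimal sketch, not part of the patch: fits_memlock_limit() is a hypothetical helper name, and it only compares against the soft limit. It does not account for memory the process has already locked, nor for privileged processes that may be exempt from the limit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: report whether a mapping of `size` bytes could
 * fit under the soft RLIMIT_MEMLOCK on its own. Memory that is already
 * locked by the process is not taken into account here.
 */
static bool fits_memlock_limit(size_t size)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return false;	/* be conservative if the limit is unknown */

	return rl.rlim_cur == RLIM_INFINITY || size <= rl.rlim_cur;
}
```

A caller would test fits_memlock_limit(MAP_SIZE) before ftruncate()/mmap() on the secretmem fd and fall back to regular memory or report an error otherwise.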
>
> However, unlike mlock()ed memory, secretmem currently behaves more
> like long-term GUP: secretmem mappings are unmovable mappings
> directly consumed by user space. With default limits, there is no
> excessive use of secretmem and it poses no real problem in
> combination with ZONE_MOVABLE/CMA, but in the future this should be
> addressed to allow balanced use of large amounts of secretmem along
> with ZONE_MOVABLE/CMA.
>
> A page that was a part of the secret memory area is cleared when it
> is freed to ensure the data is not exposed to the next user of that
> page.
You could skip that with init_on_free (and eventually also with
init_on_alloc) set to avoid double clearing.
>
> The following example demonstrates creation of a secret mapping
> (error handling is omitted):
>
> 	fd = memfd_secret(0);
> 	ftruncate(fd, MAP_SIZE);
> 	ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
> 		   MAP_SHARED, fd, 0);
>
> [1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/
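For reference, the quoted example with the error handling filled in might look roughly like the sketch below. It is not taken from the patch. secret_map() is a hypothetical helper name, libc is assumed to have no memfd_secret() wrapper so the raw syscall is used, and the fallback number 447 is an assumption based on what mainline kernels (v5.14 and later) assigned; the number in this series may differ.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 447 is the number mainline kernels (v5.14+) assigned. */
#ifndef __NR_memfd_secret
#define __NR_memfd_secret 447
#endif

/*
 * Hypothetical helper: create a secretmem fd, size it and map it.
 * Returns the mapping and stores the fd in *fd_out, or returns NULL
 * with errno set (typically ENOSYS when the syscall is missing or
 * secretmem was not enabled at boot).
 */
static void *secret_map(size_t size, int *fd_out)
{
	void *ptr;
	int fd, saved;

	assert(fd_out != NULL);

	fd = (int)syscall(__NR_memfd_secret, 0);
	if (fd < 0)
		return NULL;

	if (ftruncate(fd, (off_t)size) < 0)
		goto err_close;

	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (ptr == MAP_FAILED)
		goto err_close;

	*fd_out = fd;
	return ptr;

err_close:
	saved = errno;	/* close() may clobber errno */
	close(fd);
	errno = saved;
	return NULL;
}
```

The caller is expected to munmap() the region and close() the fd when done; a NULL return with errno set to ENOSYS is the hint to fall back to ordinary memory.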
[my mail client messed up the remainder of the mail for whatever reason,
will comment in a separate mail if there is anything to comment :) ]
--
Thanks,
David / dhildenb