linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Mike Rapoport <rppt@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Borislav Petkov <bp@alien8.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christopher Lameter <cl@linux.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Elena Reshetova <elena.reshetova@intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Hagen Paul Pfeifer <hagen@jauu.net>,
	Ingo Molnar <mingo@redhat.com>,
	James Bottomley <jejb@linux.ibm.com>,
	Kees Cook <keescook@chromium.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Matthew Wilcox <willy@infradead.org>,
	Matthew Garrett <mjg59@srcf.ucam.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Michal Hocko <mhocko@suse.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Palmer Dabbelt <palmerdabbelt@google.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Roman Gushchin <guro@fb.com>, Shakeel Butt <shakeelb@google.com>,
	Shuah Khan <shuah@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tycho Andersen <tycho@tycho.ws>, Will Deacon <will@kernel.org>,
	Yury Norov <yury.norov@gmail.com>,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org,
	x86@kernel.org
Subject: Re: [PATCH v19 5/8] mm: introduce memfd_secret system call to create "secret" memory areas
Date: Fri, 14 May 2021 10:50:55 +0200	[thread overview]
Message-ID: <ea1ddcfa-f52d-9a7d-cb7b-8502b38a90da@redhat.com> (raw)
In-Reply-To: <20210513184734.29317-6-rppt@kernel.org>

On 13.05.21 20:47, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> Introduce "memfd_secret" system call with the ability to create
> memory areas visible only in the context of the owning process and
> not mapped not only to other processes but in the kernel page tables
> as well.
> 
> The secretmem feature is off by default and the user must explicitly
> enable it at the boot time.
> 
> Once secretmem is enabled, the user will be able to create a file 
> descriptor using the memfd_secret() system call. The memory areas
> created by mmap() calls from this file descriptor will be unmapped
> from the kernel direct map and they will be only mapped in the page
> table of the processes that have access to the file descriptor.
> 
> The file descriptor based memory has several advantages over the 
> "traditional" mm interfaces, such as mlock(), mprotect(), madvise().
> File descriptor approach allows explict and controlled sharing of the
> memory

s/explict/explicit/

> areas, it allows to seal the operations. Besides, file descriptor
> based memory paves the way for VMMs to remove the secret memory range
> from the userpace hipervisor process, for instance QEMU. Andy
> Lutomirski says:

s/userpace hipervisor/userspace hypervisor/

> 
> "Getting fd-backed memory into a guest will take some possibly major
> work in the kernel, but getting vma-backed memory into a guest
> without mapping it in the host user address space seems much, much
> worse."
> 
> memfd_secret() is made a dedicated system call rather than an
> extention to

s/extention/extension/

> memfd_create() because it's purpose is to allow the user to create
> more secure memory mappings rather than to simply allow file based
> access to the memory. Nowadays a new system call cost is negligible
> while it is way simpler for userspace to deal with a clear-cut system
> calls than with a multiplexer or an overloaded syscall. Moreover, the
> initial implementation of memfd_secret() is completely distinct from
> memfd_create() so there is no much sense in overloading
> memfd_create() to begin with. If there will be a need for code
> sharing between these implementation it can be easily achieved
> without a need to adjust user visible APIs.
> 
> The secret memory remains accessible in the process context using
> uaccess primitives, but it is not exposed to the kernel otherwise;
> secret memory areas are removed from the direct map and functions in
> the follow_page()/get_user_page() family will refuse to return a page
> that belongs to the secret memory area.
> 
> Once there will be a use case that will require exposing secretmem to
> the kernel it will be an opt-in request in the system call flags so
> that user would have to decide what data can be exposed to the
> kernel.

Maybe spell out an example: like page migration.

> 
> Removing of the pages from the direct map may cause its fragmentation
> on architectures that use large pages to map the physical memory
> which affects the system performance. However, the original Kconfig
> text for CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct
> map "... can improve the kernel's performance a tiny bit ..." (commit
> 00d1c5e05736 ("x86: add gbpages switches")) and the recent report [1]
> showed that "... although 1G mappings are a good default choice,
> there is no compelling evidence that it must be the only choice".
> Hence, it is sufficient to have secretmem disabled by default with
> the ability of a system administrator to enable it at boot time.

Maybe add a link to the Intel performance evaluation.

> 
> Pages in the secretmem regions are unevictable and unmovable to
> avoid accidental exposure of the sensitive data via swap or during
> page migration.
> 
> Since the secretmem mappings are locked in memory they cannot exceed 
> RLIMIT_MEMLOCK. Since these mappings are already locked independently
> from mlock(), an attempt to mlock()/munlock() secretmem range would
> fail and mlockall()/munlockall() will ignore secretmem mappings.

Maybe add something like "similar to pages pinned by VFIO".

> 
> However, unlike mlock()ed memory, secretmem currently behaves more
> like long-term GUP: secretmem mappings are unmovable mappings
> directly consumed by user space. With default limits, there is no
> excessive use of secretmem and it poses no real problem in
> combination with ZONE_MOVABLE/CMA, but in the future this should be
> addressed to allow balanced use of large amounts of secretmem along
> with ZONE_MOVABLE/CMA.
> 
> A page that was a part of the secret memory area is cleared when it
> is freed to ensure the data is not exposed to the next user of that
> page.

You could skip that with init_on_free (and eventually also with 
init_on_alloc) set to avoid double clearing.

> 
> The following example demonstrates creation of a secret mapping
> (error handling is omitted):
> 
> fd = memfd_secret(0); ftruncate(fd, MAP_SIZE); ptr = mmap(NULL,
> MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> 
> [1]
> https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/

[my mail client messed up the remainder of the mail for whatever reason, 
will comment in a separate mail if there is anything to comment :) ]

-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-05-14  8:52 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-13 18:47 [PATCH v19 0/8] mm: introduce memfd_secret system call to create "secret" memory areas Mike Rapoport
2021-05-13 18:47 ` [PATCH v19 1/8] mmap: make mlock_future_check() global Mike Rapoport
2021-05-14  8:27   ` David Hildenbrand
2021-05-13 18:47 ` [PATCH v19 2/8] riscv/Kconfig: make direct map manipulation options depend on MMU Mike Rapoport
2021-05-14  8:28   ` David Hildenbrand
2021-05-13 18:47 ` [PATCH v19 3/8] set_memory: allow set_direct_map_*_noflush() for multiple pages Mike Rapoport
2021-05-14  8:43   ` David Hildenbrand
2021-05-16  7:13     ` Mike Rapoport
2021-05-13 18:47 ` [PATCH v19 4/8] set_memory: allow querying whether set_direct_map_*() is actually enabled Mike Rapoport
2021-05-13 18:47 ` [PATCH v19 5/8] mm: introduce memfd_secret system call to create "secret" memory areas Mike Rapoport
2021-05-14  8:50   ` David Hildenbrand [this message]
2021-05-17  7:23     ` Mike Rapoport
2021-05-14  9:25   ` David Hildenbrand
2021-05-16  7:29     ` Mike Rapoport
2021-05-18  9:59       ` Michal Hocko
2021-05-18 10:06         ` David Hildenbrand
2021-05-18 10:31           ` Michal Hocko
2021-05-18 10:35             ` David Hildenbrand
2021-05-18 11:08               ` Michal Hocko
2021-05-19  7:13                 ` Mike Rapoport
2021-05-13 18:47 ` [PATCH v19 6/8] PM: hibernate: disable when there are active secretmem users Mike Rapoport
2021-05-14  9:27   ` David Hildenbrand
2021-05-18 10:24   ` Mark Rutland
2021-05-18 10:27     ` David Hildenbrand
2021-05-19  1:32     ` James Bottomley
2021-05-19  1:49       ` Dan Williams
2021-05-19  3:50         ` James Bottomley
2021-05-13 18:47 ` [PATCH v19 7/8] arch, mm: wire up memfd_secret system call where relevant Mike Rapoport
2021-05-14  9:27   ` David Hildenbrand
2021-05-13 18:47 ` [PATCH v19 8/8] secretmem: test: add basic selftest for memfd_secret(2) Mike Rapoport
2021-05-14  9:40   ` David Hildenbrand
2021-05-13 19:08 ` [PATCH v19 0/8] mm: introduce memfd_secret system call to create "secret" memory areas James Bottomley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ea1ddcfa-f52d-9a7d-cb7b-8502b38a90da@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=cl@linux.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=elena.reshetova@intel.com \
    --cc=guro@fb.com \
    --cc=hagen@jauu.net \
    --cc=hpa@zytor.com \
    --cc=jejb@linux.ibm.com \
    --cc=keescook@chromium.org \
    --cc=kirill@shutemov.name \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=luto@kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mhocko@suse.com \
    --cc=mingo@redhat.com \
    --cc=mjg59@srcf.ucam.org \
    --cc=mtk.manpages@gmail.com \
    --cc=palmer@dabbelt.com \
    --cc=palmerdabbelt@google.com \
    --cc=paul.walmsley@sifive.com \
    --cc=peterz@infradead.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rjw@rjwysocki.net \
    --cc=rppt@kernel.org \
    --cc=rppt@linux.ibm.com \
    --cc=shakeelb@google.com \
    --cc=shuah@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tycho@tycho.ws \
    --cc=viro@zeniv.linux.org.uk \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).