From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, arnd@arndb.de, bp@alien8.de,
	catalin.marinas@arm.com, cl@linux.com, dan.j.williams@intel.com,
	dave.hansen@linux.intel.com, david@redhat.com,
	elena.reshetova@intel.com, guro@fb.com, hagen@jauu.net,
	hpa@zytor.com, James.Bottomley@HansenPartnership.com,
	jejb@linux.ibm.com, kirill@shutemov.name, linux-mm@kvack.org,
	lkp@intel.com, luto@kernel.org, mark.rutland@arm.com,
	mingo@redhat.com, mm-commits@vger.kernel.org,
	mtk.manpages@gmail.com, palmer@dabbelt.com,
	palmerdabbelt@google.com, paul.walmsley@sifive.com,
	peterz@infradead.org, rick.p.edgecombe@intel.com,
	rppt@linux.ibm.com, shakeelb@google.com, shuah@kernel.org,
	tglx@linutronix.de, torvalds@linux-foundation.org,
	tycho@tycho.ws, viro@zeniv.linux.org.uk, will@kernel.org,
	willy@infradead.org
Subject: [patch 08/54] mmap: make mlock_future_check() global
Date: Wed, 07 Jul 2021 18:07:50 -0700
Message-ID: <20210708010750.QylhRRP19%akpm@linux-foundation.org>
In-Reply-To: <20210707175950.eceddb86c6c555555d4730e2@linux-foundation.org>

From: Mike Rapoport <rppt@linux.ibm.com>
Subject: mmap: make mlock_future_check() global

Patch series "mm: introduce memfd_secret system call to create "secret" memory areas", v20.

This is an implementation of "secret" mappings backed by a file
descriptor.

The file descriptor backing secret memory mappings is created using a
dedicated memfd_secret system call.  The desired protection mode for the
memory is configured using the flags parameter of the system call.  The mmap()
of the file descriptor created with memfd_secret() will create a "secret"
memory mapping.  The pages in that mapping will be marked as not present
in the direct map and will be present only in the page table of the owning
mm.
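
The flow described above (create the descriptor, size it, mmap() it) can
be sketched from userspace roughly as follows.  This is a minimal sketch,
not code from the series: secret_alloc() and secret_free() are
hypothetical helper names, flags is passed as 0, and the fallback syscall
number 447 is an assumption that holds for x86-64 and asm-generic
architectures when the libc headers predate memfd_secret.  On kernels
where secretmem is absent or disabled the helper is expected to fail
with errno set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: only used when the installed headers lack the definition. */
#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447
#endif

/*
 * Hypothetical helper: map len bytes of secret memory.
 * Returns NULL with errno set on failure, e.g. ENOSYS when the kernel
 * does not provide (or has not enabled) memfd_secret.
 */
static void *secret_alloc(size_t len)
{
	void *p;
	int saved;
	int fd = (int)syscall(SYS_memfd_secret, 0U);

	if (fd < 0)
		return NULL;

	/* The new file starts out empty, so size it before mapping. */
	if (ftruncate(fd, (off_t)len) < 0) {
		saved = errno;
		close(fd);
		errno = saved;
		return NULL;
	}

	/* Secret memory has to be mapped shared; the fd is the backing file. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	saved = errno;
	close(fd);		/* the mapping keeps the file alive */
	errno = saved;

	return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: drop a mapping obtained from secret_alloc(). */
static void secret_free(void *p, size_t len)
{
	int rc = munmap(p, len);

	assert(rc == 0);	/* failure here means a caller bug */
	(void)rc;
}
```

A caller would treat a NULL return as "secret memory unavailable" and
either fall back to ordinary memory or refuse to handle the key material,
depending on policy.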

Although normally Linux userspace mappings are protected from other users,
such secret mappings are useful for environments where a hostile tenant is
trying to trick the kernel into giving them access to other tenants'
mappings.

It's designed to provide the following protections:

* Enhanced protection (in conjunction with all the other in-kernel
  attack prevention systems) against ROP attacks.  Secretmem makes
  "simple" ROP insufficient to perform exfiltration, which increases the
  required complexity of the attack.  Along with other protections like
  the kernel stack size limit and address space layout randomization which
  make finding gadgets really hard, the absence of any in-kernel primitive
  for accessing secret memory means the one-gadget ROP attack can't work.
  Since the only way to access secret memory is to reconstruct the missing
  mapping entry, the attacker has to recover the physical page and insert
  a PTE pointing to it in the kernel and then retrieve the contents.  That
  takes at least three gadgets which is a level of difficulty beyond most
  standard attacks.

* Prevent cross-process secret userspace memory exposures.  Once the
  secret memory is allocated, the user can't accidentally pass it into the
  kernel to be transmitted somewhere.  The secretmem pages cannot be
  accessed via the direct map and they are disallowed in GUP.

* Harden against exploited kernel flaws.  In order to access secretmem,
  a kernel-side attack would need to either walk the page tables and
  create new ones, or spawn a new privileged userspace process to perform
  secrets exfiltration using ptrace.

In the future the secret mappings may be used as a means to protect guest
memory in a virtual machine host.

For demonstration of secret memory usage we've created a userspace library

https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloader.git

that does two things.  First, it acts as a preloader for openssl that
redirects all the OPENSSL_malloc calls to secret memory, so any secret
keys get automatically protected.  Second, it exposes the API to users
who need it directly.  We anticipate that a lot of the
use cases would be like the openssl one: many toolkits that deal with
secret keys already have special handling for the memory to try to give
them greater protection, so this would simply be pluggable into the
toolkits without any need for user application modification.

Hiding secret memory mappings behind an anonymous file allows usage of the
page cache for tracking pages allocated for the "secret" mappings as well
as using address_space_operations for, e.g., page migration callbacks.

The anonymous file may also be used implicitly, like hugetlb files, to
implement mmap(MAP_SECRET) and use the secret memory areas with "native"
mm ABIs in the future.

Removing pages from the direct map may fragment it on architectures
that use large pages to map physical memory, which affects system
performance.  However, the original Kconfig text for
CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "...  can
improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736
("x86: add gbpages switches")) and the recent report [1] showed that "... 
although 1G mappings are a good default choice, there is no compelling
evidence that it must be the only choice".  Hence, it is sufficient to
have secretmem disabled by default, with the ability for a system
administrator to enable it at boot time.

In addition, there is also a long term goal to improve management of the
direct map.

[1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/


This patch (of 7):

Make mlock_future_check() non-static and declare it in mm/internal.h so
that the upcoming secret memory implementation can use it.

Link: https://lkml.kernel.org/r/20210518072034.31572-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20210518072034.31572-2-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/internal.h |    3 +++
 mm/mmap.c     |    5 ++---
 2 files changed, 5 insertions(+), 3 deletions(-)

--- a/mm/internal.h~mmap-make-mlock_future_check-global
+++ a/mm/internal.h
@@ -360,6 +360,9 @@ static inline void munlock_vma_pages_all
 extern void mlock_vma_page(struct page *page);
 extern unsigned int munlock_vma_page(struct page *page);
 
+extern int mlock_future_check(struct mm_struct *mm, unsigned long flags,
+			      unsigned long len);
+
 /*
  * Clear the page's PageMlocked().  This can be useful in a situation where
  * we want to unconditionally remove a page from the pagecache -- e.g.,
--- a/mm/mmap.c~mmap-make-mlock_future_check-global
+++ a/mm/mmap.c
@@ -1352,9 +1352,8 @@ static inline unsigned long round_hint_t
 	return hint;
 }
 
-static inline int mlock_future_check(struct mm_struct *mm,
-				     unsigned long flags,
-				     unsigned long len)
+int mlock_future_check(struct mm_struct *mm, unsigned long flags,
+		       unsigned long len)
 {
 	unsigned long locked, lock_limit;
 
_
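
For readers without the tree at hand: the hunk above shows only the
signature and the locals of mlock_future_check(), not its body.  The
following userspace model is a sketch of what the check does as far as I
can tell from mm/mmap.c of that era: when VM_LOCKED is requested, it adds
the new length (in pages) to the pages already locked and compares the
sum against RLIMIT_MEMLOCK, returning -EAGAIN unless the caller has
CAP_IPC_LOCK.  struct mm_model, the MODEL_* constants and the function
name are all hypothetical stand-ins for the kernel's mm_struct,
PAGE_SHIFT, VM_LOCKED, rlimit() and capable().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12		/* assumption: 4 KiB pages */
#define MODEL_VM_LOCKED  0x00002000UL	/* stand-in for VM_LOCKED */

/* Hypothetical stand-in for the fields the check reads. */
struct mm_model {
	unsigned long locked_vm;	/* pages already locked */
	unsigned long memlock_rlimit;	/* RLIMIT_MEMLOCK, in bytes */
	bool cap_ipc_lock;		/* models capable(CAP_IPC_LOCK) */
};

/*
 * Model of the check: 0 if a mapping of len bytes with the given flags
 * may be created, -EAGAIN if it would exceed the locked-memory limit.
 */
static int mlock_future_check_model(const struct mm_model *mm,
				    unsigned long flags,
				    unsigned long len)
{
	unsigned long locked, lock_limit;

	assert(mm != NULL);

	/* Only mappings that will be locked count against the limit. */
	if (!(flags & MODEL_VM_LOCKED))
		return 0;

	locked = (len >> MODEL_PAGE_SHIFT) + mm->locked_vm;
	lock_limit = mm->memlock_rlimit >> MODEL_PAGE_SHIFT;

	if (locked > lock_limit && !mm->cap_ipc_lock)
		return -EAGAIN;

	return 0;
}
```

With 10 pages already locked and a 16-page limit, a 6-page locked mapping
passes (16 is not greater than 16) while a 7-page one gets -EAGAIN unless
the capability is set.  Exporting the real function from mm/mmap.c lets
other mm code apply the same accounting without duplicating it.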

