LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Mike Rapoport <rppt@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Borislav Petkov <bp@alien8.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christopher Lameter <cl@linux.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	David Hildenbrand <david@redhat.com>,
	Elena Reshetova <elena.reshetova@intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	James Bottomley <jejb@linux.ibm.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Matthew Wilcox <willy@infradead.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Shuah Khan <shuah@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tycho Andersen <tycho@tycho.ws>, Will Deacon <will@kernel.org>,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org,
	x86@kernel.org
Subject: Re: [PATCH v8 0/9] mm: introduce memfd_secret system call to create "secret" memory areas
Date: Thu, 12 Nov 2020 16:56:30 +0200
Message-ID: <20201112145630.GN4758@kernel.org> (raw)
In-Reply-To: <20201110151444.20662-1-rppt@kernel.org>

Hi Andrew,

It'll be great if this can be pulled back to mmotm for wider exposure to
testing and fuzzing.

On Tue, Nov 10, 2020 at 05:14:35PM +0200, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> Hi,
> 
> This is an implementation of "secret" mappings backed by a file descriptor.
> 
> The file descriptor backing secret memory mappings is created using a
> dedicated memfd_secret system call The desired protection mode for the
> memory is configured using flags parameter of the system call. The mmap()
> of the file descriptor created with memfd_secret() will create a "secret"
> memory mapping. The pages in that mapping will be marked as not present in
> the direct map and will have desired protection bits set in the user page
> table. For instance, current implementation allows uncached mappings.
> 
> Although normally Linux userspace mappings are protected from other users,
> such secret mappings are useful for environments where a hostile tenant is
> trying to trick the kernel into giving them access to other tenants
> mappings.
> 
> Additionally, in the future the secret mappings may be used as a mean to
> protect guest memory in a virtual machine host.
> 
> For demonstration of secret memory usage we've created a userspace library
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloader.git
> 
> that does two things: the first is act as a preloader for openssl to
> redirect all the OPENSSL_malloc calls to secret memory meaning any secret
> keys get automatically protected this way and the other thing it does is
> expose the API to the user who needs it. We anticipate that a lot of the
> use cases would be like the openssl one: many toolkits that deal with
> secret keys already have special handling for the memory to try to give
> them greater protection, so this would simply be pluggable into the
> toolkits without any need for user application modification.
> 
> Hiding secret memory mappings behind an anonymous file allows (ab)use of
> the page cache for tracking pages allocated for the "secret" mappings as
> well as using address_space_operations for e.g. page migration callbacks.
> 
> The anonymous file may be also used implicitly, like hugetlb files, to
> implement mmap(MAP_SECRET) and use the secret memory areas with "native" mm
> ABIs in the future.
> 
> To limit fragmentation of the direct map to splitting only PUD-size pages,
> I've added an amortizing cache of PMD-size pages to each file descriptor
> that is used as an allocation pool for the secret memory areas.
> 
> As the memory allocated by secretmem becomes unmovable, we use CMA to back
> large page caches so that page allocator won't be surprised by failing attempt
> to migrate these pages.
> 
> v8:
> * Use CMA for all secretmem allocations as David suggested
> * Update memcg accounting after transtion to CMA
> * Prevent hibernation when there are active secretmem users
> * Add zeroing of the memory before releasing it back to cma/page allocator
> * Rebase on v5.10-rc2-mmotm-2020-11-07-21-40
> 
> v7: https://lore.kernel.org/lkml/20201026083752.13267-1-rppt@kernel.org
> * Use set_direct_map() instead of __kernel_map_pages() to ensure error
>   handling in case the direct map update fails
> * Add accounting of large pages used to reduce the direct map fragmentation
> * Teach get_user_pages() and frieds to refuse get/pin secretmem pages
> 
> v6: https://lore.kernel.org/lkml/20200924132904.1391-1-rppt@kernel.org
> * Silence the warning about missing syscall, thanks to Qian Cai
> * Replace spaces with tabs in Kconfig additions, per Randy
> * Add a selftest.
> 
> v5: https://lore.kernel.org/lkml/20200916073539.3552-1-rppt@kernel.org
> * rebase on v5.9-rc5
> * drop boot time memory reservation patch
> 
> v4: https://lore.kernel.org/lkml/20200818141554.13945-1-rppt@kernel.org
> * rebase on v5.9-rc1
> * Do not redefine PMD_PAGE_ORDER in fs/dax.c, thanks Kirill
> * Make secret mappings exclusive by default and only require flags to
>   memfd_secret() system call for uncached mappings, thanks again Kirill :)
> 
> v3: https://lore.kernel.org/lkml/20200804095035.18778-1-rppt@kernel.org
> * Squash kernel-parameters.txt update into the commit that added the
>   command line option.
> * Make uncached mode explicitly selectable by architectures. For now enable
>   it only on x86.
> 
> v2: https://lore.kernel.org/lkml/20200727162935.31714-1-rppt@kernel.org
> * Follow Michael's suggestion and name the new system call 'memfd_secret'
> * Add kernel-parameters documentation about the boot option
> * Fix i386-tinyconfig regression reported by the kbuild bot.
>   CONFIG_SECRETMEM now depends on !EMBEDDED to disable it on small systems
>   from one side and still make it available unconditionally on
>   architectures that support SET_DIRECT_MAP.
> 
> v1: https://lore.kernel.org/lkml/20200720092435.17469-1-rppt@kernel.org
> 
> Mike Rapoport (9):
>   mm: add definition of PMD_PAGE_ORDER
>   mmap: make mlock_future_check() global
>   set_memory: allow set_direct_map_*_noflush() for multiple pages
>   mm: introduce memfd_secret system call to create "secret" memory areas
>   secretmem: use PMD-size pages to amortize direct map fragmentation
>   secretmem: add memcg accounting
>   PM: hibernate: disable when there are active secretmem users
>   arch, mm: wire up memfd_secret system call were relevant
>   secretmem: test: add basic selftest for memfd_secret(2)
> 
>  arch/Kconfig                              |   7 +
>  arch/arm64/include/asm/cacheflush.h       |   4 +-
>  arch/arm64/include/asm/unistd.h           |   2 +-
>  arch/arm64/include/asm/unistd32.h         |   2 +
>  arch/arm64/include/uapi/asm/unistd.h      |   1 +
>  arch/arm64/mm/pageattr.c                  |  10 +-
>  arch/riscv/include/asm/set_memory.h       |   4 +-
>  arch/riscv/include/asm/unistd.h           |   1 +
>  arch/riscv/mm/pageattr.c                  |   8 +-
>  arch/x86/Kconfig                          |   1 +
>  arch/x86/entry/syscalls/syscall_32.tbl    |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl    |   1 +
>  arch/x86/include/asm/set_memory.h         |   4 +-
>  arch/x86/mm/pat/set_memory.c              |   8 +-
>  fs/dax.c                                  |  11 +-
>  include/linux/pgtable.h                   |   3 +
>  include/linux/secretmem.h                 |  30 ++
>  include/linux/set_memory.h                |   4 +-
>  include/linux/syscalls.h                  |   1 +
>  include/uapi/asm-generic/unistd.h         |   6 +-
>  include/uapi/linux/magic.h                |   1 +
>  include/uapi/linux/secretmem.h            |   8 +
>  kernel/power/hibernate.c                  |   5 +-
>  kernel/power/snapshot.c                   |   4 +-
>  kernel/sys_ni.c                           |   2 +
>  mm/Kconfig                                |   5 +
>  mm/Makefile                               |   1 +
>  mm/filemap.c                              |   2 +-
>  mm/gup.c                                  |  10 +
>  mm/internal.h                             |   3 +
>  mm/mmap.c                                 |   5 +-
>  mm/secretmem.c                            | 451 ++++++++++++++++++++++
>  mm/vmalloc.c                              |   5 +-
>  scripts/checksyscalls.sh                  |   4 +
>  tools/testing/selftests/vm/.gitignore     |   1 +
>  tools/testing/selftests/vm/Makefile       |   3 +-
>  tools/testing/selftests/vm/memfd_secret.c | 298 ++++++++++++++
>  tools/testing/selftests/vm/run_vmtests    |  17 +
>  38 files changed, 895 insertions(+), 39 deletions(-)
>  create mode 100644 include/linux/secretmem.h
>  create mode 100644 include/uapi/linux/secretmem.h
>  create mode 100644 mm/secretmem.c
>  create mode 100644 tools/testing/selftests/vm/memfd_secret.c
> 
> 
> base-commit: 9f8ce377d420db12b19d6a4f636fecbd88a725a5
> -- 
> 2.28.0
> 

-- 
Sincerely yours,
Mike.

      parent reply index

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-10 15:14 Mike Rapoport
2020-11-10 15:14 ` [PATCH v8 1/9] mm: add definition of PMD_PAGE_ORDER Mike Rapoport
2020-11-10 15:14 ` [PATCH v8 2/9] mmap: make mlock_future_check() global Mike Rapoport
2020-11-10 17:17   ` David Hildenbrand
2020-11-10 18:06     ` Mike Rapoport
2020-11-12 16:22       ` David Hildenbrand
2020-11-12 19:08         ` Mike Rapoport
2020-11-12 20:15           ` David Hildenbrand
2020-11-15  8:26             ` Mike Rapoport
2020-11-17 15:09               ` David Hildenbrand
2020-11-17 15:58                 ` Mike Rapoport
2020-11-10 15:14 ` [PATCH v8 3/9] set_memory: allow set_direct_map_*_noflush() for multiple pages Mike Rapoport
2020-11-13 12:26   ` Catalin Marinas
2020-11-10 15:14 ` [PATCH v8 4/9] mm: introduce memfd_secret system call to create "secret" memory areas Mike Rapoport
2020-11-13 13:58   ` Matthew Wilcox
2020-11-15  8:53     ` Mike Rapoport
2020-11-13 14:06   ` Matthew Wilcox
2020-11-15  8:45     ` Mike Rapoport
2020-11-10 15:14 ` [PATCH v8 5/9] secretmem: use PMD-size pages to amortize direct map fragmentation Mike Rapoport
2020-11-10 15:14 ` [PATCH v8 6/9] secretmem: add memcg accounting Mike Rapoport
2020-11-13  1:35   ` Andrew Morton
2020-11-13 23:42   ` Roman Gushchin
2020-11-15  9:17     ` Mike Rapoport
2020-11-10 15:14 ` [PATCH v8 7/9] PM: hibernate: disable when there are active secretmem users Mike Rapoport
2020-11-10 15:14 ` [PATCH v8 8/9] arch, mm: wire up memfd_secret system call were relevant Mike Rapoport
2020-11-13 12:25   ` Catalin Marinas
2020-11-15  8:56     ` Mike Rapoport
2020-11-10 15:14 ` [PATCH v8 9/9] secretmem: test: add basic selftest for memfd_secret(2) Mike Rapoport
2020-11-12 14:56 ` Mike Rapoport [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201112145630.GN4758@kernel.org \
    --to=rppt@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=cl@linux.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=elena.reshetova@intel.com \
    --cc=hpa@zytor.com \
    --cc=jejb@linux.ibm.com \
    --cc=kirill@shutemov.name \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=luto@kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=mtk.manpages@gmail.com \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=peterz@infradead.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rppt@linux.ibm.com \
    --cc=shuah@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tycho@tycho.ws \
    --cc=viro@zeniv.linux.org.uk \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git
	git clone --mirror https://lore.kernel.org/lkml/10 lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git