linux-mm.kvack.org archive mirror
From: Michal Hocko <mhocko@suse.com>
To: James Bottomley <jejb@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>,
	Mike Rapoport <rppt@kernel.org>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Borislav Petkov <bp@alien8.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christopher Lameter <cl@linux.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Elena Reshetova <elena.reshetova@intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Matthew Wilcox <willy@infradead.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Roman Gushchin <guro@fb.com>, Shakeel Butt <shakeelb@google.com>,
	Shuah Khan <shuah@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tycho Andersen <tycho@tycho.ws>, Will Deacon <will@kernel.org>,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org,
	x86@kernel.org, Hagen Paul Pfeifer <hagen@jauu.net>,
	Palmer Dabbelt <palmerdabbelt@google.com>
Subject: Re: [PATCH v17 07/10] mm: introduce memfd_secret system call to create "secret" memory areas
Date: Mon, 15 Feb 2021 20:20:45 +0100
Message-ID: <YCrJjYmr7A2nO6lA@dhcp22.suse.cz>
In-Reply-To: <be1d821d3f0aec24ad13ca7126b4359822212eb0.camel@linux.ibm.com>

On Mon 15-02-21 10:14:43, James Bottomley wrote:
> On Mon, 2021-02-15 at 10:13 +0100, Michal Hocko wrote:
> > On Sun 14-02-21 11:21:02, James Bottomley wrote:
> > > On Sun, 2021-02-14 at 10:58 +0100, David Hildenbrand wrote:
> > > [...]
> > > > > And here we come to the question "what are the differences that
> > > > > justify a new system call?" and the answer to this is very
> > > > > subjective. And as such we can continue bikeshedding forever.
> > > > 
> > > > I think this fits into the existing memfd_create() syscall just
> > > > > fine, and I heard no compelling argument why it shouldn't. That's
> > > > all I can say.
> > > 
> > > OK, so let's review history.  In the first two incarnations of the
> > > patch, it was an extension of memfd_create().  The specific
> > > objection by Kirill Shutemov was that it doesn't share any code in
> > > common with memfd and so should be a separate system call:
> > > 
> > > https://lore.kernel.org/linux-api/20200713105812.dnwtdhsuyj3xbh4f@box/
> > 
> > Thanks for the pointer. But this argument hasn't been challenged at
> > > all. It hasn't been brought up that the overlap would be considerably
> > higher by the hugetlb/sealing support. And so far nobody has claimed
> > those combinations as unviable.
> 
> Kirill is actually interested in the sealing path for his KVM code so
> we took a look.  There might be a two line overlap in memfd_create for
> the seal case, but there's no real overlap in memfd_add_seals which is
> the bulk of the code.  So the best way would seem to lift the inode ...
> -> seals helpers to be non-static so they can be reused and roll our
> own add_seals.

These are implementation details which are not really relevant to the
API, IMHO.

> I can't see a use case at all for hugetlb support, so it seems to be a
> bit of an angels on pin head discussion.  However, if one were to come
> along handling it in the same way seems reasonable.

Those angels have made their way to mmap, System V shm, memfd_create and
other MM interfaces which never envisioned them when introduced. Using
hugetlb pages to back guest memory is quite a common use case, so why do
you think those guests wouldn't like to see their memory be "secret"?

As I've said in my last response (YCZEGuLK94szKZDf@dhcp22.suse.cz), I am
not going to argue all these again. I have made my point and you are
free to take it or leave it.

> > > The other objection raised offlist is that if we do use
> > > memfd_create, then we have to add all the secret memory flags as an
> > > additional ioctl, whereas they can be specified on open if we do a
> > > separate system call.  The container people violently objected to
> > > the ioctl because it can't be properly analysed by seccomp and much
> > > preferred the syscall version.
> > > 
> > > Since we're dumping the uncached variant, the ioctl problem
> > > disappears but so does the possibility of ever adding it back if we
> > > take on the container peoples' objection.  This argues for a
> > > separate syscall because we can add additional features and extend
> > > the API with flags without causing anti-ioctl riots.
> > 
> > I am sorry but I do not understand this argument.
> 
> You don't understand why container guarding technology doesn't like
> ioctls?

No, I did not see where the ioctl argument came from.

[...]

> >  What kind of flags are we talking about and why would that be a
> > problem with memfd_create interface? Could you be more specific
> > please?
> 
> You mean what were the ioctl flags in the patch series linked above? 
> They were SECRETMEM_EXCLUSIVE and SECRETMEM_UNCACHED in patch 3/5. 

OK I see. How many potential modes are we talking about? A few or
potentially many?

> They were eventually dropped after v10, because of problems with
> architectural semantics, with the idea that it could be added back
> again if a compelling need arose:
> 
> https://lore.kernel.org/linux-api/20201123095432.5860-1-rppt@kernel.org/
> 
> In theory the extra flags could be multiplexed into the memfd_create
> flags like hugetlbfs is but with 32 flags and a lot already taken it
> gets messy for expansion.  When we run out of flags the first question
> people will ask is "why didn't you do separate system calls?".

OK, I do not necessarily see a lack of flag space as a problem. I could be
wrong here, but I do not see how that would be solved by a separate
syscall when it sounds rather foreseeable that many modes supported by
memfd_create will eventually find their way to secret memory as well.
If for no other reason, secret memory is nothing really special. It is
just memory which is not mapped in the kernel via the 1:1 (direct)
mapping. That's it. And that can be applied to any memory provided to
userspace.
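
To make the flag-space point concrete, here is a minimal sketch of how
memfd_create() packs the hugetlb page size into its flags word. The
constant values are assumed to match include/uapi/linux/memfd.h as of
this thread (later kernels may define more flags), and the helper names
are hypothetical, made up for illustration only.

```python
# Assumed values from include/uapi/linux/memfd.h at the time of this
# thread; check your own headers before relying on them.
MFD_CLOEXEC = 0x0001
MFD_ALLOW_SEALING = 0x0002
MFD_HUGETLB = 0x0004

# The hugetlb page size is stored as log2(size) in the top six bits.
MFD_HUGE_SHIFT = 26
MFD_HUGE_MASK = 0x3F


def encode_huge_size(page_size: int) -> int:
    """Hypothetical helper: encode a power-of-two page size into flag bits."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page size must be a power of two")
    log2 = page_size.bit_length() - 1
    if log2 > MFD_HUGE_MASK:
        raise ValueError("page size does not fit in the six-bit field")
    return log2 << MFD_HUGE_SHIFT


def decode_huge_size(flags: int) -> int:
    """Hypothetical helper: page size in bytes, or 0 for the default size."""
    log2 = (flags >> MFD_HUGE_SHIFT) & MFD_HUGE_MASK
    return (1 << log2) if log2 else 0


def free_boolean_bits(flags_in_use: int) -> int:
    """Count the bits below MFD_HUGE_SHIFT that are not yet assigned."""
    low_mask = (1 << MFD_HUGE_SHIFT) - 1
    return bin(low_mask & ~flags_in_use).count("1")


flags = MFD_CLOEXEC | MFD_HUGETLB | encode_huge_size(2 * 1024 * 1024)
print(hex(flags))                # 0x54000005
print(decode_huge_size(flags))   # 2097152
print(free_boolean_bits(MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB))  # 23
```

Under these assumptions, three boolean flags are assigned and 23 of the
26 bits below the size field remain unassigned.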

But I am repeating myself here, so I had better stop.
-- 
Michal Hocko
SUSE Labs

