From: Michal Hocko <mhocko@suse.com>
To: James Bottomley <jejb@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>,
	Mike Rapoport <rppt@kernel.org>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Borislav Petkov <bp@alien8.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christopher Lameter <cl@linux.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Elena Reshetova <elena.reshetova@intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Matthew Wilcox <willy@infradead.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Roman Gushchin <guro@fb.com>, Shakeel Butt <shakeelb@google.com>,
	Shuah Khan <shuah@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tycho Andersen <tycho@tycho.ws>, Will Deacon <will@kernel.org>,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org,
	x86@kernel.org, Hagen Paul Pfeifer <hagen@jauu.net>,
	Palmer Dabbelt <palmerdabbelt@google.com>
Subject: Re: [PATCH v17 07/10] mm: introduce memfd_secret system call to create "secret" memory areas
Date: Mon, 15 Feb 2021 20:20:45 +0100
Message-ID: <YCrJjYmr7A2nO6lA@dhcp22.suse.cz>
In-Reply-To: <be1d821d3f0aec24ad13ca7126b4359822212eb0.camel@linux.ibm.com>

On Mon 15-02-21 10:14:43, James Bottomley wrote:
> On Mon, 2021-02-15 at 10:13 +0100, Michal Hocko wrote:
> > On Sun 14-02-21 11:21:02, James Bottomley wrote:
> > > On Sun, 2021-02-14 at 10:58 +0100, David Hildenbrand wrote:
> > > [...]
> > > > > And here we come to the question "what are the differences that
> > > > > justify a new system call?" and the answer to this is very
> > > > > subjective. And as such we can continue bikeshedding forever.
> > > > 
> > > > I think this fits into the existing memfd_create() syscall just
> > > > > fine, and I heard no compelling argument why it shouldn't. That's
> > > > all I can say.
> > > 
> > > OK, so let's review history.  In the first two incarnations of the
> > > patch, it was an extension of memfd_create().  The specific
> > > objection by Kirill Shutemov was that it doesn't share any code in
> > > common with memfd and so should be a separate system call:
> > > 
> > > https://lore.kernel.org/linux-api/20200713105812.dnwtdhsuyj3xbh4f@box/
> > 
> > Thanks for the pointer. But this argument hasn't been challenged at
> > > all. It hasn't been brought up that the overlap would be considerably
> > > higher with the hugetlb/sealing support. And so far nobody has claimed
> > those combinations as unviable.
> 
> Kirill is actually interested in the sealing path for his KVM code so
> we took a look.  There might be a two line overlap in memfd_create for
> the seal case, but there's no real overlap in memfd_add_seals which is
> the bulk of the code.  So the best way would seem to lift the inode ...
> -> seals helpers to be non-static so they can be reused and roll our
> own add_seals.

These are implementation details which are not really relevant to the
API IMHO. 

> I can't see a use case at all for hugetlb support, so it seems to be a
> bit of an angels-on-a-pinhead discussion.  However, if one were to come
> along, handling it in the same way seems reasonable.

Those angels have made their way to mmap, System V shm, memfd_create and
other MM interfaces which never envisioned them when introduced. Using
hugetlb pages to back guest memory is quite a common use case, so why do
you think those guests wouldn't like to see their memory be "secret"?

As I've said in my last response (YCZEGuLK94szKZDf@dhcp22.suse.cz), I am
not going to argue all these again. I have made my point and you are
free to take it or leave it.

> > > The other objection raised offlist is that if we do use
> > > memfd_create, then we have to add all the secret memory flags as an
> > > additional ioctl, whereas they can be specified on open if we do a
> > > separate system call.  The container people violently objected to
> > > the ioctl because it can't be properly analysed by seccomp and much
> > > preferred the syscall version.
> > > 
> > > Since we're dumping the uncached variant, the ioctl problem
> > > disappears but so does the possibility of ever adding it back if we
> > > take on the container people's objection.  This argues for a
> > > separate syscall because we can add additional features and extend
> > > the API with flags without causing anti-ioctl riots.
> > 
> > I am sorry but I do not understand this argument.
> 
> You don't understand why container guarding technology doesn't like
> ioctls?

No, I did not see where the ioctl argument came from.
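
For readers following the seccomp part of the argument: a seccomp filter is
handed the syscall number and the raw argument registers, and it cannot
resolve what kind of file a descriptor refers to. The thread does not spell
this out, so the following is only a rough sketch of that limitation, written
as a toy model in Python rather than as a real BPF filter. The syscall
numbers, the flag bit values and the ioctl command number are placeholders
invented for the illustration.

```python
from dataclasses import dataclass
from typing import Tuple

# Illustrative values only. The syscall numbers, flag bits and ioctl
# command number are placeholders, not taken from any kernel header.
NR_IOCTL = 1
NR_MEMFD_SECRET = 2
SECRETMEM_EXCLUSIVE = 0x1      # hypothetical bit value
SECRETMEM_UNCACHED = 0x2       # hypothetical bit value
SECRET_SET_UNCACHED = 0x5301   # hypothetical ioctl command number


@dataclass(frozen=True)
class SeccompData:
    """Simplified model of a filter's input: the syscall number and the
    raw argument registers, nothing else."""
    nr: int
    args: Tuple[int, ...]


def filter_dedicated_syscall(data: SeccompData) -> bool:
    """Allow memfd_secret(flags) unless the uncached mode is requested.

    The flags arrive in the first argument register, so the policy can
    be expressed completely."""
    if data.nr != NR_MEMFD_SECRET:
        return True
    flags = data.args[0]
    return (flags & SECRETMEM_UNCACHED) == 0


def filter_ioctl(data: SeccompData, blocked_cmd: int) -> bool:
    """Best effort for ioctl(fd, cmd, arg): match on the command number.

    The filter sees `fd` only as an integer and cannot tell which kind
    of file it refers to, so the same `cmd` value issued on an unrelated
    descriptor gets the same verdict."""
    if data.nr != NR_IOCTL:
        return True
    _fd, cmd, _arg = data.args[:3]
    return cmd != blocked_cmd
```

In this model the dedicated syscall can be allowed or denied per flag, while
the ioctl rule blocks the command number on every descriptor, whether or not
it is a secret memory fd.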

[...]

> >  What kind of flags are we talking about and why would that be a
> > problem with memfd_create interface? Could you be more specific
> > please?
> 
> You mean what were the ioctl flags in the patch series linked above? 
> They were SECRETMEM_EXCLUSIVE and SECRETMEM_UNCACHED in patch 3/5. 

OK I see. How many potential modes are we talking about? A few or
potentially many?

> They were eventually dropped after v10, because of problems with
> architectural semantics, with the idea that it could be added back
> again if a compelling need arose:
> 
> https://lore.kernel.org/linux-api/20201123095432.5860-1-rppt@kernel.org/
> 
> In theory the extra flags could be multiplexed into the memfd_create
> flags like hugetlbfs is, but with 32 flags and a lot already taken it
> gets messy for expansion.  When we run out of flags the first question
> people will ask is "why didn't you do separate system calls?".

OK, I do not necessarily see a lack of flag space as a problem. I can be
wrong here but I do not see how that would be solved by a separate
syscall when it sounds rather foreseeable that many modes supported by
memfd_create will eventually find their way to secret memory as well.
If for no other reason, secret memory is nothing really special. It is
just memory which is not mapped into the kernel via the 1:1 mapping.
That's it. And that can be applied to any memory provided to userspace.
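
As background for the flag-space question, here is a small sketch of how the
memfd_create flags word was laid out around the time of this thread,
including the way the hugetlb page size is multiplexed into the top bits.
The constants are copied from include/uapi/linux/memfd.h as best I recall
them for early 2021 and should be checked against the headers of the kernel
in question; the helper names are made up for the illustration.

```python
# Values mirrored from include/uapi/linux/memfd.h as of early 2021;
# verify against the headers of the kernel you care about.
MFD_CLOEXEC = 0x0001
MFD_ALLOW_SEALING = 0x0002
MFD_HUGETLB = 0x0004
MFD_HUGE_SHIFT = 26      # log2(page size) is stored in the top six bits
MFD_HUGE_MASK = 0x3F

MFD_HUGE_2MB = 21 << MFD_HUGE_SHIFT
MFD_HUGE_1GB = 30 << MFD_HUGE_SHIFT


def huge_page_size(flags: int) -> int:
    """Return the huge page size in bytes encoded in `flags`,
    or 0 if no explicit size was requested."""
    order = (flags >> MFD_HUGE_SHIFT) & MFD_HUGE_MASK
    return (1 << order) if order else 0


def free_flag_bits(width: int = 32) -> int:
    """Count the bits of a `width`-bit flags word not claimed above."""
    used = (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB
            | (MFD_HUGE_MASK << MFD_HUGE_SHIFT))
    return width - bin(used).count("1")


print(huge_page_size(MFD_HUGETLB | MFD_HUGE_2MB))  # size encoded for 2MB pages
print(free_flag_bits())                            # bits left under these assumptions
```

With only these definitions, three single-bit flags and a six-bit size field
are taken. Whether the remaining bits count as plenty or as "messy for
expansion" is the judgment call being argued over here.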

But I am repeating myself again here so I better stop.
-- 
Michal Hocko
SUSE Labs
  reply	other threads:[~2021-02-15 19:20 UTC|newest]

Thread overview: 293+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-08  8:49 [PATCH v17 00/10] mm: introduce memfd_secret system call to create "secret" memory areas Mike Rapoport
2021-02-08  8:49 ` Mike Rapoport
2021-02-08  8:49 ` Mike Rapoport
2021-02-08  8:49 ` Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 01/10] mm: add definition of PMD_PAGE_ORDER Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 02/10] mmap: make mlock_future_check() global Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 03/10] riscv/Kconfig: make direct map manipulation options depend on MMU Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 04/10] set_memory: allow set_direct_map_*_noflush() for multiple pages Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 05/10] set_memory: allow querying whether set_direct_map_*() is actually enabled Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 06/10] arm64: kfence: fix header inclusion Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 07/10] mm: introduce memfd_secret system call to create "secret" memory areas Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08 10:49   ` Michal Hocko
2021-02-08 10:49     ` Michal Hocko
2021-02-08 10:49     ` Michal Hocko
2021-02-08 10:49     ` Michal Hocko
2021-02-08 21:26     ` Mike Rapoport
2021-02-08 21:26       ` Mike Rapoport
2021-02-08 21:26       ` Mike Rapoport
2021-02-08 21:26       ` Mike Rapoport
2021-02-09  8:47       ` Michal Hocko
2021-02-09  8:47         ` Michal Hocko
2021-02-09  8:47         ` Michal Hocko
2021-02-09  8:47         ` Michal Hocko
2021-02-09  9:09         ` Mike Rapoport
2021-02-09  9:09           ` Mike Rapoport
2021-02-09  9:09           ` Mike Rapoport
2021-02-09  9:09           ` Mike Rapoport
2021-02-09 13:17           ` Michal Hocko
2021-02-09 13:17             ` Michal Hocko
2021-02-09 13:17             ` Michal Hocko
2021-02-09 13:17             ` Michal Hocko
2021-02-11  7:13             ` Mike Rapoport
2021-02-11  7:13               ` Mike Rapoport
2021-02-11  7:13               ` Mike Rapoport
2021-02-11  7:13               ` Mike Rapoport
2021-02-11  8:39               ` Michal Hocko
2021-02-11  8:39                 ` Michal Hocko
2021-02-11  8:39                 ` Michal Hocko
2021-02-11  8:39                 ` Michal Hocko
2021-02-11  9:01                 ` David Hildenbrand
2021-02-11  9:01                   ` David Hildenbrand
2021-02-11  9:01                   ` David Hildenbrand
2021-02-11  9:01                   ` David Hildenbrand
2021-02-11  9:38                   ` Michal Hocko
2021-02-11  9:38                     ` Michal Hocko
2021-02-11  9:38                     ` Michal Hocko
2021-02-11  9:38                     ` Michal Hocko
2021-02-11  9:48                     ` David Hildenbrand
2021-02-11  9:48                       ` David Hildenbrand
2021-02-11  9:48                       ` David Hildenbrand
2021-02-11  9:48                       ` David Hildenbrand
2021-02-11 10:02                     ` David Hildenbrand
2021-02-11 10:02                       ` David Hildenbrand
2021-02-11 10:02                       ` David Hildenbrand
2021-02-11 10:02                       ` David Hildenbrand
2021-02-11 11:29                       ` Mike Rapoport
2021-02-11 11:29                         ` Mike Rapoport
2021-02-11 11:29                         ` Mike Rapoport
2021-02-11 11:29                         ` Mike Rapoport
2021-02-11 11:27                   ` Mike Rapoport
2021-02-11 11:27                     ` Mike Rapoport
2021-02-11 11:27                     ` Mike Rapoport
2021-02-11 11:27                     ` Mike Rapoport
2021-02-11 12:07                     ` David Hildenbrand
2021-02-11 12:07                       ` David Hildenbrand
2021-02-11 12:07                       ` David Hildenbrand
2021-02-11 12:07                       ` David Hildenbrand
2021-02-11 23:09                       ` Mike Rapoport
2021-02-12  9:18                         ` David Hildenbrand
2021-02-14  9:19                           ` Mike Rapoport
2021-02-14  9:58                             ` David Hildenbrand
2021-02-14 19:21                               ` James Bottomley
2021-02-15  9:13                                 ` Michal Hocko
2021-02-15 18:14                                   ` James Bottomley
2021-02-15 19:20                                     ` Michal Hocko [this message]
2021-02-16 16:25                                       ` James Bottomley
2021-02-16 16:34                                         ` David Hildenbrand
2021-02-16 16:44                                           ` James Bottomley
2021-02-16 17:16                                             ` David Hildenbrand
2021-02-17 16:19                                               ` James Bottomley
2021-02-22  9:38                                                 ` David Hildenbrand
2021-02-22 10:50                                                   ` David Hildenbrand
2021-02-16 16:51                                         ` Michal Hocko
2021-02-11 11:20                 ` Mike Rapoport
2021-02-11 12:30                   ` Michal Hocko
2021-02-11 22:59                     ` Mike Rapoport
2021-02-12  9:02                       ` Michal Hocko
2021-02-08  8:49 ` [PATCH v17 08/10] PM: hibernate: disable when there are active secretmem users Mike Rapoport
2021-02-08 10:18   ` Michal Hocko
2021-02-08 10:32     ` David Hildenbrand
2021-02-08 10:51       ` Michal Hocko
2021-02-08 10:53         ` David Hildenbrand
2021-02-08 10:57           ` Michal Hocko
2021-02-08 11:13             ` David Hildenbrand
2021-02-08 11:14               ` David Hildenbrand
2021-02-08 11:26                 ` David Hildenbrand
2021-02-08 12:17                   ` Michal Hocko
2021-02-08 13:34                     ` Michal Hocko
2021-02-08 13:40                     ` David Hildenbrand
2021-02-08 21:28     ` Mike Rapoport
2021-02-22  7:34   ` Matthew Garrett
2021-02-22 10:23     ` Mike Rapoport
2021-02-22 18:27       ` Matthew Garrett
2021-02-22 19:17       ` Dan Williams
2021-02-22 19:21         ` James Bottomley
2021-02-08  8:49 ` [PATCH v17 09/10] arch, mm: wire up memfd_secret system call where relevant Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 10/10] secretmem: test: add basic selftest for memfd_secret(2) Mike Rapoport
2021-02-08  9:27 ` [PATCH v17 00/10] mm: introduce memfd_secret system call to create "secret" memory areas David Hildenbrand
2021-02-08 21:13   ` Mike Rapoport
2021-02-08 21:38     ` David Hildenbrand
2021-02-09  8:59       ` Michal Hocko
2021-02-09  9:15         ` David Hildenbrand
2021-02-09  9:53           ` Michal Hocko
2021-02-09 10:23             ` David Hildenbrand
2021-02-09 10:30               ` David Hildenbrand
2021-02-09 13:25               ` Michal Hocko
2021-02-09 16:17                 ` David Hildenbrand
2021-02-09 20:08                   ` Michal Hocko
