From: Michal Hocko <mhocko@suse.com>
To: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Borislav Petkov <bp@alien8.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christopher Lameter <cl@linux.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	David Hildenbrand <david@redhat.com>,
	Elena Reshetova <elena.reshetova@intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	James Bottomley <jejb@linux.ibm.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Matthew Wilcox <willy@infradead.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Roman Gushchin <guro@fb.com>, Shakeel Butt <shakeelb@google.com>,
	Shuah Khan <shuah@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tycho Andersen <tycho@tycho.ws>, Will Deacon <will@kernel.org>,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org,
	x86@kernel.org, Hagen Paul Pfeifer <hagen@jauu.net>,
	Palmer Dabbelt <palmerdabbelt@google.com>
Subject: Re: [PATCH v16 06/11] mm: introduce memfd_secret system call to create "secret" memory areas
Date: Tue, 26 Jan 2021 10:49:03 +0100
Message-ID: <20210126094903.GI827@dhcp22.suse.cz>
In-Reply-To: <20210126092011.GP6332@kernel.org>

On Tue 26-01-21 11:20:11, Mike Rapoport wrote:
> On Tue, Jan 26, 2021 at 10:00:13AM +0100, Michal Hocko wrote:
> > On Tue 26-01-21 10:33:11, Mike Rapoport wrote:
> > > On Tue, Jan 26, 2021 at 08:16:14AM +0100, Michal Hocko wrote:
> > > > On Mon 25-01-21 23:36:18, Mike Rapoport wrote:
> > > > > On Mon, Jan 25, 2021 at 06:01:22PM +0100, Michal Hocko wrote:
> > > > > > On Thu 21-01-21 14:27:18, Mike Rapoport wrote:
> > > > > > > From: Mike Rapoport <rppt@linux.ibm.com>
> > > > > > > 
> > > > > > > > Introduce the "memfd_secret" system call, which creates memory
> > > > > > > > areas that are visible only in the context of the owning process and
> > > > > > > > are mapped neither into other processes nor into the kernel page tables.
> > > > > > > 
> > > > > > > The user will create a file descriptor using the memfd_secret() system
> > > > > > > call. The memory areas created by mmap() calls from this file descriptor
> > > > > > > > will be unmapped from the kernel direct map and will be mapped only in
> > > > > > > > the page tables of the owning mm.
> > > > > > > 
> > > > > > > The secret memory remains accessible in the process context using uaccess
> > > > > > > primitives, but it is not accessible using direct/linear map addresses.
> > > > > > > 
> > > > > > > Functions in the follow_page()/get_user_page() family will refuse to return
> > > > > > > a page that belongs to the secret memory area.
> > > > > > > 
> > > > > > > A page that was a part of the secret memory area is cleared when it is
> > > > > > > freed.
> > > > > > > 
> > > > > > > The following example demonstrates creation of a secret mapping (error
> > > > > > > handling is omitted):
> > > > > > > 
> > > > > > > 	fd = memfd_secret(0);
> > > > > > > 	ftruncate(fd, MAP_SIZE);
> > > > > > > 	ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> > > > > > 
> > > > > > I do not see any access control or permission model for this feature.
> > > > > > > Is this feature generally safe for anybody to use?
> > > > > 
> > > > > > The mappings obey the memlock limit. Besides, this feature has to be
> > > > > > enabled explicitly at boot with a kernel parameter that sets the maximum
> > > > > > amount of memory secretmem can consume.
> > > > 
> > > > > Why is such a model sufficient and future proof? I mean, even when it has
> > > > > to be enabled by an admin, it is still an all-or-nothing approach. The mlock
> > > > > limit is not really useful because it is per mm rather than per user.
> > > > 
> > > > Is there any reason why this is allowed for non-privileged processes?
> > > > Maybe this has been discussed in the past but is there any reason why
> > > > > this cannot be done by a special device, which would allow at least
> > > > > some permission policy to be provided?
> > >  
> > > > Why should this not be allowed for non-privileged processes? This behaves
> > > > similarly to mlocked memory, so I don't see a reason why secretmem should
> > > > have a different permission model.
> > 
> > > Because apart from the reclaim aspect, it fragments the direct mapping,
> > > IIUC. That might have an impact on all others, right?
> 
> > It does fragment the direct map, but first, it only splits 1G pages into 2M
> > pages, and as has been discussed several times already, it is not clear which
> > page size in the direct map is best; this is very much workload
> > dependent.

I do appreciate this has been discussed, but this changelog does not
spell out any of that reasoning, and I am pretty sure nobody will
remember the details a few years from now. Some numbers would be
appropriate as well.

> These are the results of the benchmarks I've run with the default direct
> mapping covered with 1G pages, with 1G pages disabled using "nogbpages" on
> the kernel command line, and with the entire direct map forced to use 4K
> pages using a simple patch to arch/x86/mm/init.c.
> 
> https://docs.google.com/spreadsheets/d/1tdD-cu8e93vnfGsTFxZ5YdaEfs2E1GELlvWNOGkJV2U/edit?usp=sharing

A good start for the data I am asking for above.
-- 
Michal Hocko
SUSE Labs
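
The three-line example quoted from the patch description omits error
handling. A minimal sketch of the same sequence with the checks filled in
might look like the following. It is not taken from the patch: it assumes a
kernel that actually provides memfd_secret() with secretmem enabled, it
assumes the C library has no wrapper so the call goes through syscall(2),
and the fallback syscall number 447, the MAP_SIZE value, and the helper
names sys_memfd_secret() and secretmem_roundtrip() are all illustrative
assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447	/* assumption: fallback number, check your headers */
#endif

#define MAP_SIZE 4096		/* hypothetical size of the secret area */

/* Hypothetical helper: invoke the system call directly via syscall(2). */
static int sys_memfd_secret(unsigned int flags)
{
	return (int)syscall(SYS_memfd_secret, flags);
}

/*
 * Create a secret mapping, write a short string into it and read it back.
 * Returns 0 on success or a negative errno value on failure.
 */
static int secretmem_roundtrip(void)
{
	static const char msg[] = "secret";
	char *ptr;
	int fd, ret = 0;

	assert(sizeof(msg) <= MAP_SIZE);

	fd = sys_memfd_secret(0);
	if (fd < 0)
		return -errno;	/* e.g. -ENOSYS if the kernel lacks the call */

	if (ftruncate(fd, MAP_SIZE) < 0) {
		ret = -errno;
		goto out_close;
	}

	ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (ptr == MAP_FAILED) {
		ret = -errno;	/* may fail if the memlock limit is exceeded */
		goto out_close;
	}

	/* The owning process can use the area like ordinary memory. */
	memcpy(ptr, msg, sizeof(msg));
	if (memcmp(ptr, msg, sizeof(msg)) != 0)
		ret = -EIO;

	munmap(ptr, MAP_SIZE);
out_close:
	close(fd);
	return ret;
}
```

Calling secretmem_roundtrip() from main() and printing strerror(-ret) on a
negative return is enough to see whether a given kernel and boot
configuration allow secret mappings for the calling process.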





Thread overview: 318+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-21 12:27 [PATCH v16 00/11] mm: introduce memfd_secret system call to create "secret" memory areas Mike Rapoport
2021-01-21 12:27 ` Mike Rapoport
2021-01-21 12:27 ` Mike Rapoport
2021-01-21 12:27 ` Mike Rapoport
2021-01-21 12:27 ` [PATCH v16 01/11] mm: add definition of PMD_PAGE_ORDER Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27 ` [PATCH v16 02/11] mmap: make mlock_future_check() global Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27 ` [PATCH v16 03/11] riscv/Kconfig: make direct map manipulation options depend on MMU Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27 ` [PATCH v16 04/11] set_memory: allow set_direct_map_*_noflush() for multiple pages Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27 ` [PATCH v16 05/11] set_memory: allow querying whether set_direct_map_*() is actually enabled Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27 ` [PATCH v16 06/11] mm: introduce memfd_secret system call to create "secret" memory areas Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-25 17:01   ` Michal Hocko
2021-01-25 17:01     ` Michal Hocko
2021-01-25 17:01     ` Michal Hocko
2021-01-25 17:01     ` Michal Hocko
2021-01-25 21:36     ` Mike Rapoport
2021-01-25 21:36       ` Mike Rapoport
2021-01-25 21:36       ` Mike Rapoport
2021-01-25 21:36       ` Mike Rapoport
2021-01-26  7:16       ` Michal Hocko
2021-01-26  7:16         ` Michal Hocko
2021-01-26  7:16         ` Michal Hocko
2021-01-26  7:16         ` Michal Hocko
2021-01-26  8:33         ` Mike Rapoport
2021-01-26  8:33           ` Mike Rapoport
2021-01-26  8:33           ` Mike Rapoport
2021-01-26  8:33           ` Mike Rapoport
2021-01-26  9:00           ` Michal Hocko
2021-01-26  9:00             ` Michal Hocko
2021-01-26  9:00             ` Michal Hocko
2021-01-26  9:00             ` Michal Hocko
2021-01-26  9:20             ` Mike Rapoport
2021-01-26  9:20               ` Mike Rapoport
2021-01-26  9:20               ` Mike Rapoport
2021-01-26  9:20               ` Mike Rapoport
2021-01-26  9:49               ` Michal Hocko [this message]
2021-01-26  9:49                 ` Michal Hocko
2021-01-26  9:49                 ` Michal Hocko
2021-01-26  9:49                 ` Michal Hocko
2021-01-26  9:53                 ` David Hildenbrand
2021-01-26  9:53                   ` David Hildenbrand
2021-01-26  9:53                   ` David Hildenbrand
2021-01-26  9:53                   ` David Hildenbrand
2021-01-26 10:19                   ` Michal Hocko
2021-01-26 10:19                     ` Michal Hocko
2021-01-26 10:19                     ` Michal Hocko
2021-01-26 10:19                     ` Michal Hocko
2021-01-26  9:20             ` Michal Hocko
2021-01-26  9:20               ` Michal Hocko
2021-01-26  9:20               ` Michal Hocko
2021-01-26  9:20               ` Michal Hocko
2021-02-03 12:15   ` Michal Hocko
2021-02-03 12:15     ` Michal Hocko
2021-02-03 12:15     ` Michal Hocko
2021-02-03 12:15     ` Michal Hocko
2021-02-04 11:34     ` Mike Rapoport
2021-02-04 11:34       ` Mike Rapoport
2021-02-04 11:34       ` Mike Rapoport
2021-02-04 11:34       ` Mike Rapoport
2021-01-21 12:27 ` [PATCH v16 07/11] secretmem: use PMD-size pages to amortize direct map fragmentation Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-26 11:46   ` Michal Hocko
2021-01-26 11:46     ` Michal Hocko
2021-01-26 11:46     ` Michal Hocko
2021-01-26 11:46     ` Michal Hocko
2021-01-26 11:56     ` David Hildenbrand
2021-01-26 11:56       ` David Hildenbrand
2021-01-26 11:56       ` David Hildenbrand
2021-01-26 11:56       ` David Hildenbrand
2021-01-26 12:08       ` Michal Hocko
2021-01-26 12:08         ` Michal Hocko
2021-01-26 12:08         ` Michal Hocko
2021-01-26 12:08         ` Michal Hocko
2021-01-28  9:22         ` Mike Rapoport
2021-01-28  9:22           ` Mike Rapoport
2021-01-28  9:22           ` Mike Rapoport
2021-01-28  9:22           ` Mike Rapoport
2021-01-28 13:01           ` Michal Hocko
2021-01-28 13:01             ` Michal Hocko
2021-01-28 13:01             ` Michal Hocko
2021-01-28 13:01             ` Michal Hocko
2021-01-28 13:28             ` Christoph Lameter
2021-01-28 13:28               ` Christoph Lameter
2021-01-28 13:28               ` Christoph Lameter
2021-01-28 13:28               ` Christoph Lameter
2021-01-28 13:28               ` Christoph Lameter
2021-01-28 13:49               ` Michal Hocko
2021-01-28 13:49                 ` Michal Hocko
2021-01-28 13:49                 ` Michal Hocko
2021-01-28 13:49                 ` Michal Hocko
2021-01-28 15:56                 ` Christoph Lameter
2021-01-28 15:56                   ` Christoph Lameter
2021-01-28 15:56                   ` Christoph Lameter
2021-01-28 15:56                   ` Christoph Lameter
2021-01-28 15:56                   ` Christoph Lameter
2021-01-28 16:23                   ` Michal Hocko
2021-01-28 16:23                     ` Michal Hocko
2021-01-28 16:23                     ` Michal Hocko
2021-01-28 16:23                     ` Michal Hocko
2021-01-28 15:28             ` James Bottomley
2021-01-28 15:28               ` James Bottomley
2021-01-28 15:28               ` James Bottomley
2021-01-28 15:28               ` James Bottomley
2021-01-29  7:03               ` Mike Rapoport
2021-01-29  7:03                 ` Mike Rapoport
2021-01-29  7:03                 ` Mike Rapoport
2021-01-29  7:03                 ` Mike Rapoport
2021-01-28 21:05             ` James Bottomley
2021-01-28 21:05               ` James Bottomley
2021-01-28 21:05               ` James Bottomley
2021-01-28 21:05               ` James Bottomley
2021-01-29  7:53               ` Michal Hocko
2021-01-29  7:53                 ` Michal Hocko
2021-01-29  7:53                 ` Michal Hocko
2021-01-29  7:53                 ` Michal Hocko
2021-01-29  8:23               ` Michal Hocko
2021-01-29  8:23                 ` Michal Hocko
2021-01-29  8:23                 ` Michal Hocko
2021-01-29  8:23                 ` Michal Hocko
2021-02-01 16:56                 ` James Bottomley
2021-02-01 16:56                   ` James Bottomley
2021-02-01 16:56                   ` James Bottomley
2021-02-01 16:56                   ` James Bottomley
2021-02-02  9:35                   ` Michal Hocko
2021-02-02  9:35                     ` Michal Hocko
2021-02-02  9:35                     ` Michal Hocko
2021-02-02  9:35                     ` Michal Hocko
2021-02-02 12:48                     ` Mike Rapoport
2021-02-02 12:48                       ` Mike Rapoport
2021-02-02 12:48                       ` Mike Rapoport
2021-02-02 12:48                       ` Mike Rapoport
2021-02-02 13:14                       ` David Hildenbrand
2021-02-02 13:14                         ` David Hildenbrand
2021-02-02 13:14                         ` David Hildenbrand
2021-02-02 13:14                         ` David Hildenbrand
2021-02-02 13:32                         ` Michal Hocko
2021-02-02 13:32                           ` Michal Hocko
2021-02-02 13:32                           ` Michal Hocko
2021-02-02 13:32                           ` Michal Hocko
2021-02-02 14:12                           ` David Hildenbrand
2021-02-02 14:22                             ` Michal Hocko
2021-02-02 14:26                               ` David Hildenbrand
2021-02-02 14:32                                 ` Michal Hocko
2021-02-02 14:34                                   ` David Hildenbrand
2021-02-02 18:15                                     ` Mike Rapoport
2021-02-02 18:55                                       ` James Bottomley
2021-02-03 12:09                                         ` Michal Hocko
2021-02-04 11:31                                           ` Mike Rapoport
2021-02-02 13:27                       ` Michal Hocko
2021-02-02 19:10                         ` Mike Rapoport
2021-02-03  9:12                           ` Michal Hocko
2021-02-04  9:58                             ` Mike Rapoport
2021-02-04 13:02                               ` Michal Hocko
2021-01-29  7:21             ` Mike Rapoport
2021-01-29  8:51               ` Michal Hocko
2021-02-02 14:42                 ` David Hildenbrand
2021-01-21 12:27 ` [PATCH v16 08/11] secretmem: add memcg accounting Mike Rapoport
2021-01-25 16:17   ` Matthew Wilcox
2021-01-25 17:18     ` Shakeel Butt
2021-01-25 21:35       ` Mike Rapoport
2021-01-28 15:07         ` Shakeel Butt
2021-01-25 16:54   ` Michal Hocko
2021-01-25 21:38     ` Mike Rapoport
2021-01-26  7:31       ` Michal Hocko
2021-01-26  8:56         ` Mike Rapoport
2021-01-26  9:15           ` Michal Hocko
2021-01-26 14:48       ` Matthew Wilcox
2021-01-26 15:05         ` Michal Hocko
2021-01-27 18:42           ` Roman Gushchin
2021-01-28  7:58             ` Michal Hocko
2021-01-28 14:05               ` Shakeel Butt
2021-01-28 14:22                 ` Michal Hocko
2021-01-28 14:57                   ` Shakeel Butt
2021-01-21 12:27 ` [PATCH v16 09/11] PM: hibernate: disable when there are active secretmem users Mike Rapoport
2021-01-21 12:27 ` [PATCH v16 10/11] arch, mm: wire up memfd_secret system call where relevant Mike Rapoport
2021-01-25 18:18   ` Catalin Marinas
2021-01-21 12:27 ` [PATCH v16 11/11] secretmem: test: add basic selftest for memfd_secret(2) Mike Rapoport
2021-01-21 22:18 ` [PATCH v16 00/11] mm: introduce memfd_secret system call to create "secret" memory areas Andrew Morton
