From: Michal Hocko <mhocko@suse.com>
To: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Borislav Petkov <bp@alien8.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christopher Lameter <cl@linux.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Elena Reshetova <elena.reshetova@intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	James Bottomley <jejb@linux.ibm.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Matthew Wilcox <willy@infradead.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Roman Gushchin <guro@fb.com>, Shakeel Butt <shakeelb@google.com>,
	Shuah Khan <shuah@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tycho Andersen <tycho@tycho.ws>, Will Deacon <will@kernel.org>,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org,
	x86@kernel.org
Subject: Re: [PATCH v17 00/10] mm: introduce memfd_secret system call to create "secret" memory areas
Date: Tue, 9 Feb 2021 10:53:29 +0100
Message-ID: <YCJbmR11ikrWKaU8@dhcp22.suse.cz>
In-Reply-To: <662b5871-b461-0896-697f-5e903c23d7b9@redhat.com>

On Tue 09-02-21 10:15:17, David Hildenbrand wrote:
> On 09.02.21 09:59, Michal Hocko wrote:
> > On Mon 08-02-21 22:38:03, David Hildenbrand wrote:
> > > 
> > > > On 08.02.2021 at 22:13, Mike Rapoport <rppt@kernel.org> wrote:
> > > > 
> > > > On Mon, Feb 08, 2021 at 10:27:18AM +0100, David Hildenbrand wrote:
> > > > > On 08.02.21 09:49, Mike Rapoport wrote:
> > > > > 
> > > > > Some questions (and request to document the answers) as we now allow to have
> > > > > unmovable allocations all over the place and I don't see a single comment
> > > > > regarding that in the cover letter:
> > > > > 
> > > > > 1. How will the issue of plenty of unmovable allocations for user space be
> > > > > tackled in the future?
> > > > > 
> > > > > > 2. How has this issue been documented? E.g., interaction with ZONE_MOVABLE
> > > > > > and CMA, alloc_contig_range()/alloc_contig_pages()?
> > > > 
> > > > Secretmem sets the mapping's gfp mask to GFP_HIGHUSER, so it does not
> > > > allocate movable pages in the first place.
> > > 
> > > That is not the point. Secretmem cannot go on CMA / ZONE_MOVABLE
> > > memory and behaves like long-term pinnings in that sense. This is a
> > > real issue when using a lot of secretmem.
> > 
> > A lot of unevictable memory is a concern regardless of CMA/ZONE_MOVABLE.
> > As I've said, it is quite easy to land in a similar situation even with
> > tmpfs/MAP_ANON|MAP_SHARED on a swapless system. Neither of the two is
> > really uncommon. It would be even worse if those were allowed to
> > consume both CMA/ZONE_MOVABLE.
> 
> IIRC, tmpfs/MAP_ANON|MAP_SHARED memory
> a) Is movable, can land in ZONE_MOVABLE/CMA
> b) Can be limited by sizing tmpfs appropriately
> 
> AFAIK, what you describe is a problem with memory overcommit, not with zone
> imbalances (below). Or what am I missing?

It can be a problem for both. If you have just too much shm (do not
forget about MAP_SHARED|MAP_ANON, which is much harder to size from an
admin POV) then migratability doesn't really help because you need
free memory to migrate to. Without reclaimability this can easily become a
problem. That is why I am saying this is not really a new problem.
Swapless systems are not all that uncommon.
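
As a rough illustration of the MAP_SHARED|MAP_ANON case, here is a minimal
Python sketch. Any process can create such a mapping with a single call, and
no tmpfs mount (and hence no size= option) is involved. The 1 MiB size and
the variable names are arbitrary choices for the example.

```python
import mmap

SIZE = 1 << 20  # 1 MiB, arbitrary size chosen for illustration

# fileno=-1 requests an anonymous mapping; with MAP_SHARED the pages are
# shmem-backed on Linux, so without swap they cannot be reclaimed once touched.
buf = mmap.mmap(-1, SIZE, flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS)

buf[:5] = b"hello"            # writing touches the first page, allocating it
first_bytes = bytes(buf[:5])  # read the data back from the mapping
buf.close()
```

The memory is bounded only by whatever per-process or cgroup limits happen
to apply, which is why it is harder to size than a tmpfs mount.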
 
> > One has to be very careful when relying on CMA or movable zones. This is
> > definitely worth a comment in the kernel command line parameter
> > documentation. But this is not a new problem.
> 
> I see the following thing worth documenting:
> 
> Assume you have a system with 2GB of ZONE_NORMAL/ZONE_DMA and 4GB of
> ZONE_MOVABLE/CMA.
> 
> Assume you make use of 1.5GB of secretmem. Your system might run into OOM
> any time although you still have plenty of memory on ZONE_MOVABLE (and even
> swap!), simply because you are making excessive use of unmovable allocations
> (for user space!) in an environment where you should not make excessive use
> of unmovable allocations (e.g., where should page tables go?).

Yes, you are right of course and I am not really disputing this. But I
would argue that a 2:1 Movable/Normal ratio is something to expect problems
with already. "Lowmem" allocations can easily trigger OOM even without secret
mem in the picture. All it takes is to allocate a lot of GFP_KERNEL or
even GFP_{HIGH}USER memory. Really, it is CMA/MOVABLE that are the elephant
in the room and one has to be really careful when relying on them.
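
The arithmetic behind the quoted 2GB/4GB example can be sketched as follows.
This is a deliberately simplified model, not how the page allocator accounts
memory: it assumes ZONE_NORMAL/ZONE_DMA is the only place unmovable
allocations can go, ignores watermarks and lowmem reserves, and uses GiB for
the GB figures. The function name is made up for the example.

```python
GIB = 1 << 30

def unmovable_headroom(normal_bytes, unmovable_in_use_bytes):
    """Bytes left for further unmovable allocations (page tables, slab, ...).

    Simplification: only ZONE_NORMAL/ZONE_DMA can hold unmovable pages;
    watermarks and lowmem reserves are ignored.
    """
    return max(normal_bytes - unmovable_in_use_bytes, 0)

normal = 2 * GIB              # ZONE_NORMAL/ZONE_DMA in the quoted example
movable = 4 * GIB             # ZONE_MOVABLE/CMA in the quoted example
secretmem = 3 * GIB // 2      # 1.5 GiB of secretmem

headroom = unmovable_headroom(normal, secretmem)
print(f"left for unmovable allocations: {headroom / GIB:.1f} GiB")
print(f"free but unusable for them:     {movable / GIB:.1f} GiB")
```

Everything else that is unmovable (kernel allocations, page tables) has to
fit into the remaining quarter of the non-movable memory, regardless of how
much of the movable memory is free.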
 
> The existing controls (mlock limit) don't really match the current semantics
> of that memory. I repeat it once again: secretmem *currently* resembles
> long-term pinned memory, not mlocked memory.

Well, if we had proper user space pinning accounting then I would
agree that there is a better model to use. But we don't. And previous
attempts to achieve that have failed. So I am afraid that we do not have
much choice left other than using mlock as a model.
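
For reference, the mlock limit discussed here is what user space sees as
RLIMIT_MEMLOCK. A minimal Python sketch for inspecting it in the current
process follows; the helper name fmt is made up for the example.

```python
import resource

# RLIMIT_MEMLOCK is the per-process limit on locked memory, in bytes.
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

def fmt(limit):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"

print("RLIMIT_MEMLOCK soft:", fmt(soft))
print("RLIMIT_MEMLOCK hard:", fmt(hard))
```

The limit is a single number per process. It says nothing about which zone
the locked pages come from or whether they can be migrated, which is the
mismatch pointed out in the quoted text.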

> Things will change when
> implementing migration support for secretmem pages. Until then, the
> semantics are different and this should be spelled out.
> 
> For long-term pinnings this is kind of obvious, still we're now documenting
> it because it's dangerous to not be aware of. Secretmem behaves exactly the
> same and I think this is worth spelling out: secretmem has the potential of
> being used much more often than fairly special vfio/rdma/ ...

Yeah, I do agree that pinning is a problem for movable/CMA but most
people simply do not care about those. Movable is the thing for hotplug
and a really weird fragmentation avoidance IIRC, and CMA is mostly to
handle crap HW. If those are to be used along with secret mem or
longterm GUP then they will constantly bump into corner cases. Do not
get me wrong, we should be looking at those problems, we should even
document them, but I do not see this as anything new. We should probably
have a central place in Documentation explaining all those problems. I
would even be happy to see an explicit note in the tunables - e.g.
configuring movable/normal in 2:1 will get you back to 32b times wrt.
low mem problems.
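
To check what movable/normal ratio a given machine actually ends up with,
something along these lines could be used. It is a sketch that assumes the
usual /proc/zoneinfo layout ("Node N, zone NAME" headers followed by an
indented "managed" page count); the sample text below is made up, and the
function name is hypothetical.

```python
import re

def managed_pages_by_zone(zoneinfo_text):
    """Sum the 'managed' page counts per zone name across all nodes."""
    totals = {}
    zone = None
    for line in zoneinfo_text.splitlines():
        header = re.match(r"Node\s+\d+,\s+zone\s+(\S+)", line)
        if header:
            zone = header.group(1)
            totals.setdefault(zone, 0)
            continue
        field = re.match(r"\s+managed\s+(\d+)", line)
        if field and zone is not None:
            totals[zone] += int(field.group(1))
    return totals

# Made-up sample in the /proc/zoneinfo layout; on a real system pass
# open("/proc/zoneinfo").read() instead.
SAMPLE = """\
Node 0, zone   Normal
        managed  524288
Node 0, zone  Movable
        managed  1048576
"""

zones = managed_pages_by_zone(SAMPLE)
ratio = zones["Movable"] / zones["Normal"]
print(f"movable:normal = {ratio:.1f}:1")
```

With 4 KiB pages the sample corresponds to the 2GB/4GB split from the quoted
example, i.e. the 2:1 configuration mentioned above.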
-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2021-02-09  9:53 UTC|newest]

Thread overview: 293+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-08  8:49 [PATCH v17 00/10] mm: introduce memfd_secret system call to create "secret" memory areas Mike Rapoport
2021-02-08  8:49 ` Mike Rapoport
2021-02-08  8:49 ` Mike Rapoport
2021-02-08  8:49 ` Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 01/10] mm: add definition of PMD_PAGE_ORDER Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 02/10] mmap: make mlock_future_check() global Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 03/10] riscv/Kconfig: make direct map manipulation options depend on MMU Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 04/10] set_memory: allow set_direct_map_*_noflush() for multiple pages Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 05/10] set_memory: allow querying whether set_direct_map_*() is actually enabled Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 06/10] arm64: kfence: fix header inclusion Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 07/10] mm: introduce memfd_secret system call to create "secret" memory areas Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08  8:49   ` Mike Rapoport
2021-02-08 10:49   ` Michal Hocko
2021-02-08 10:49     ` Michal Hocko
2021-02-08 10:49     ` Michal Hocko
2021-02-08 10:49     ` Michal Hocko
2021-02-08 21:26     ` Mike Rapoport
2021-02-08 21:26       ` Mike Rapoport
2021-02-08 21:26       ` Mike Rapoport
2021-02-08 21:26       ` Mike Rapoport
2021-02-09  8:47       ` Michal Hocko
2021-02-09  8:47         ` Michal Hocko
2021-02-09  8:47         ` Michal Hocko
2021-02-09  8:47         ` Michal Hocko
2021-02-09  9:09         ` Mike Rapoport
2021-02-09  9:09           ` Mike Rapoport
2021-02-09  9:09           ` Mike Rapoport
2021-02-09  9:09           ` Mike Rapoport
2021-02-09 13:17           ` Michal Hocko
2021-02-09 13:17             ` Michal Hocko
2021-02-09 13:17             ` Michal Hocko
2021-02-09 13:17             ` Michal Hocko
2021-02-11  7:13             ` Mike Rapoport
2021-02-11  7:13               ` Mike Rapoport
2021-02-11  7:13               ` Mike Rapoport
2021-02-11  7:13               ` Mike Rapoport
2021-02-11  8:39               ` Michal Hocko
2021-02-11  8:39                 ` Michal Hocko
2021-02-11  8:39                 ` Michal Hocko
2021-02-11  8:39                 ` Michal Hocko
2021-02-11  9:01                 ` David Hildenbrand
2021-02-11  9:01                   ` David Hildenbrand
2021-02-11  9:01                   ` David Hildenbrand
2021-02-11  9:01                   ` David Hildenbrand
2021-02-11  9:38                   ` Michal Hocko
2021-02-11  9:38                     ` Michal Hocko
2021-02-11  9:38                     ` Michal Hocko
2021-02-11  9:38                     ` Michal Hocko
2021-02-11  9:48                     ` David Hildenbrand
2021-02-11  9:48                       ` David Hildenbrand
2021-02-11  9:48                       ` David Hildenbrand
2021-02-11  9:48                       ` David Hildenbrand
2021-02-11 10:02                     ` David Hildenbrand
2021-02-11 10:02                       ` David Hildenbrand
2021-02-11 10:02                       ` David Hildenbrand
2021-02-11 10:02                       ` David Hildenbrand
2021-02-11 11:29                       ` Mike Rapoport
2021-02-11 11:29                         ` Mike Rapoport
2021-02-11 11:29                         ` Mike Rapoport
2021-02-11 11:29                         ` Mike Rapoport
2021-02-11 11:27                   ` Mike Rapoport
2021-02-11 11:27                     ` Mike Rapoport
2021-02-11 11:27                     ` Mike Rapoport
2021-02-11 11:27                     ` Mike Rapoport
2021-02-11 12:07                     ` David Hildenbrand
2021-02-11 23:09                       ` Mike Rapoport
2021-02-12  9:18                         ` David Hildenbrand
2021-02-14  9:19                           ` Mike Rapoport
2021-02-14  9:58                             ` David Hildenbrand
2021-02-14 19:21                               ` James Bottomley
2021-02-15  9:13                                 ` Michal Hocko
2021-02-15 18:14                                   ` James Bottomley
2021-02-15 19:20                                     ` Michal Hocko
2021-02-16 16:25                                       ` James Bottomley
2021-02-16 16:34                                         ` David Hildenbrand
2021-02-16 16:44                                           ` James Bottomley
2021-02-16 17:16                                             ` David Hildenbrand
2021-02-17 16:19                                               ` James Bottomley
2021-02-22  9:38                                                 ` David Hildenbrand
2021-02-22 10:50                                                   ` David Hildenbrand
2021-02-16 16:51                                         ` Michal Hocko
2021-02-11 11:20                 ` Mike Rapoport
2021-02-11 12:30                   ` Michal Hocko
2021-02-11 22:59                     ` Mike Rapoport
2021-02-12  9:02                       ` Michal Hocko
2021-02-08  8:49 ` [PATCH v17 08/10] PM: hibernate: disable when there are active secretmem users Mike Rapoport
2021-02-08 10:18   ` Michal Hocko
2021-02-08 10:32     ` David Hildenbrand
2021-02-08 10:51       ` Michal Hocko
2021-02-08 10:53         ` David Hildenbrand
2021-02-08 10:57           ` Michal Hocko
2021-02-08 11:13             ` David Hildenbrand
2021-02-08 11:14               ` David Hildenbrand
2021-02-08 11:26                 ` David Hildenbrand
2021-02-08 12:17                   ` Michal Hocko
2021-02-08 13:34                     ` Michal Hocko
2021-02-08 13:40                     ` David Hildenbrand
2021-02-08 21:28     ` Mike Rapoport
2021-02-22  7:34   ` Matthew Garrett
2021-02-22 10:23     ` Mike Rapoport
2021-02-22 18:27       ` Matthew Garrett
2021-02-22 19:17       ` Dan Williams
2021-02-22 19:21         ` James Bottomley
2021-02-08  8:49 ` [PATCH v17 09/10] arch, mm: wire up memfd_secret system call where relevant Mike Rapoport
2021-02-08  8:49 ` [PATCH v17 10/10] secretmem: test: add basic selftest for memfd_secret(2) Mike Rapoport
2021-02-08  9:27 ` [PATCH v17 00/10] mm: introduce memfd_secret system call to create "secret" memory areas David Hildenbrand
2021-02-08 21:13   ` Mike Rapoport
2021-02-08 21:38     ` David Hildenbrand
2021-02-09  8:59       ` Michal Hocko
2021-02-09  9:15         ` David Hildenbrand
2021-02-09  9:53           ` Michal Hocko [this message]
2021-02-09 10:23             ` David Hildenbrand
2021-02-09 10:30               ` David Hildenbrand
2021-02-09 13:25               ` Michal Hocko
2021-02-09 16:17                 ` David Hildenbrand
2021-02-09 20:08                   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YCJbmR11ikrWKaU8@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=cl@linux.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=elena.reshetova@intel.com \
    --cc=guro@fb.com \
    --cc=hpa@zytor.com \
    --cc=jejb@linux.ibm.com \
    --cc=kirill@shutemov.name \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=luto@kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=mtk.manpages@gmail.com \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=peterz@infradead.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rppt@kernel.org \
    --cc=rppt@linux.ibm.com \
    --cc=shakeelb@google.com \
    --cc=shuah@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tycho@tycho.ws \
    --cc=viro@zeniv.linux.org.uk \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

  Be sure your reply has a Subject: header at the top and a blank line
  before the message body.