From: David Hildenbrand <david@redhat.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Borislav Petkov <bp@alien8.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christopher Lameter <cl@linux.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Elena Reshetova <elena.reshetova@intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	James Bottomley <jejb@linux.ibm.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Matthew Wilcox <willy@infradead.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Roman Gushchin <guro@fb.com>, Shakeel Butt <shakeelb@google.com>,
	Shuah Khan <shuah@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tycho Andersen <tycho@tycho.ws>, Will Deacon <will@kernel.org>,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org,
	x86@kernel.org
Subject: Re: [PATCH v17 00/10] mm: introduce memfd_secret system call to create "secret" memory areas
Date: Tue, 9 Feb 2021 11:30:53 +0100
Message-ID: <d733d2b5-bb9c-179d-82c2-3c07d7d97a9f@redhat.com>
In-Reply-To: <c1e5e7b6-3360-ddc4-2ff5-0e79515ee23a@redhat.com>

On 09.02.21 11:23, David Hildenbrand wrote:
>>>> A lot of unevictable memory is a concern regardless of CMA/ZONE_MOVABLE.
>>>> As I've said, it is quite easy to land in a similar situation even with
>>>> tmpfs/MAP_ANON|MAP_SHARED on a swapless system. Neither of the two is
>>>> really uncommon. It would be even worse that those would be allowed to
>>>> consume both CMA/ZONE_MOVABLE.
>>>
>>> IIRC, tmpfs/MAP_ANON|MAP_SHARED memory
>>> a) Is movable, can land in ZONE_MOVABLE/CMA
>>> b) Can be limited by sizing tmpfs appropriately
>>>
>>> AFAIK, what you describe is a problem with memory overcommit, not with zone
>>> imbalances (below). Or what am I missing?
>>
>> It can be a problem for both. If you have just too much shm (do not
>> forget about MAP_SHARED|MAP_ANON, which is much harder to size from an
>> admin POV) then migratability doesn't really help because you need
>> free memory to migrate to. Without reclaimability this can easily become a
>> problem. That is why I am saying this is not really a new problem.
>> Swapless systems are not all that uncommon.
> 
> I get your point, it's similar but still different. "no memory in the
> system" vs. "plenty of unusable free memory available in the system".
> 
> In many setups, memory for user space applications can go to
> ZONE_MOVABLE just fine. ZONE_NORMAL etc. can be used for supporting user
> space memory (e.g., page tables) and other kernel stuff.
> 
> Like, have 4GB of ZONE_MOVABLE with 2GB of ZONE_NORMAL. Have an
> application (database) that allocates 4GB of memory. Works just fine.
> The zone ratio ends up being a problem for example with many processes
> (-> many page tables).
> 
> Not being able to put user space memory into the movable zone is a
> special case. And we are introducing yet another special case here
> (besides vfio, rdma, unmigratable huge pages like gigantic pages).
> 
> With plenty of secretmem, looking at /proc/meminfo Total vs. Free can be
> a big lie of how your system behaves.
> 
>>    
>>>> One has to be very careful when relying on CMA or movable zones. This is
>>>> definitely worth a comment in the kernel command line parameter
>>>> documentation. But this is not a new problem.
>>>
>>> I see the following thing worth documenting:
>>>
>>> Assume you have a system with 2GB of ZONE_NORMAL/ZONE_DMA and 4GB of
>>> ZONE_MOVABLE/CMA.
>>>
>>> Assume you make use of 1.5GB of secretmem. Your system might run into OOM
>>> any time although you still have plenty of memory on ZONE_MOVABLE (and even
>>> swap!), simply because you are making excessive use of unmovable allocations
>>> (for user space!) in an environment where you should not make excessive use
>>> of unmovable allocations (e.g., where should page tables go?).
>>
>> Yes, you are right of course and I am not really disputing this. But I
>> would argue that with 2:1 Movable/Normal you should expect problems
>> already. "Lowmem" allocations can easily trigger OOM even without secret
>> mem in the picture. All it takes is to allocate a lot of GFP_KERNEL or
>> even GFP_{HIGH}USER. Really, it is CMA/MOVABLE that are the elephant in the
>> room and one has to be really careful when relying on them.
> 
> Right, it's all about what the setup actually needs. Sure, there are
> cases where you need significantly more GFP_KERNEL/GFP_{HIGH}USER such
> that a 2:1 ratio is not feasible. But I claim that these are corner cases.
> 
> Secretmem gives user space the option to allocate a lot of
> GFP_{HIGH}USER memory. If I am not wrong, "ulimit -a" tells me that each
> application on F33 can allocate 16 GiB (!) of secretmem.

Got to learn to do my math. It's 16 MiB - so as a default it's less 
dangerous than I thought!
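
For reference, a minimal sketch of the unit conversion behind this
correction. It assumes that secretmem is accounted against RLIMIT_MEMLOCK
and that the shell reports this limit ("max locked memory") in KiB. The
helper names kib_to_mib and memlock_limit_mib are hypothetical and not part
of the patch set.

```python
import resource


def kib_to_mib(kib):
    """Convert a value in KiB to MiB."""
    return kib / 1024


def memlock_limit_mib():
    """Return the soft RLIMIT_MEMLOCK in MiB, or None if it is unlimited.

    Assumption: this is the limit that bounds secretmem per process.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft / (1024 * 1024)  # getrlimit() reports bytes


if __name__ == "__main__":
    # 16384 KiB is 16 MiB, not 16 GiB.
    print(kib_to_mib(16384))
    # The value on the running system; it depends on distribution defaults.
    print(memlock_limit_mib())
```

The second value printed differs between systems, so it is worth checking
on the target setup rather than relying on the number quoted here.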

-- 
Thanks,

David / dhildenb
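
The quoted example (2GB of ZONE_NORMAL/ZONE_DMA, 4GB of ZONE_MOVABLE/CMA,
1.5GB of secretmem) can be worked through as simple arithmetic. The sketch
below is a deliberately simplified model, not kernel behavior: it assumes
that unmovable allocations can only be served from the non-movable zones
and that nothing else is allocated. zone_headroom is a hypothetical helper
name.

```python
GIB = 1024 ** 3


def zone_headroom(normal, movable, unmovable_used):
    """Return (bytes left for unmovable allocations, bytes free overall).

    Simplified model: unmovable allocations (secretmem, page tables, ...)
    come only from the non-movable zones, and the movable zone is unused.
    """
    unmovable_left = normal - unmovable_used
    total_free = unmovable_left + movable
    return unmovable_left, total_free


left, free = zone_headroom(2 * GIB, 4 * GIB, int(1.5 * GIB))
print(left / GIB, free / GIB)  # prints "0.5 4.5"
```

Under these assumptions, the system would report 4.5GB free while only
0.5GB remains for page tables and other unmovable allocations, which is the
"Total vs. Free" mismatch described in the quoted text.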

