From: Michal Hocko <mhocko@suse.com>
To: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Borislav Petkov <bp@alien8.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christopher Lameter <cl@linux.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	David Hildenbrand <david@redhat.com>,
	Elena Reshetova <elena.reshetova@intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	James Bottomley <jejb@linux.ibm.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Matthew Wilcox <willy@infradead.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Roman Gushchin <guro@fb.com>, Shakeel Butt <shakeelb@google.com>,
	Shuah Khan <shuah@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tycho Andersen <tycho@tycho.ws>, Will Deacon <will@kernel.org>,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org,
	x86@kernel.org, Hagen Paul Pfeifer <hagen@jauu.net>,
	Palmer Dabbelt <palmerdabbelt@google.com>
Subject: Re: [PATCH v16 07/11] secretmem: use PMD-size pages to amortize direct map fragmentation
Date: Tue, 26 Jan 2021 12:46:57 +0100
Message-ID: <20210126114657.GL827@dhcp22.suse.cz>
In-Reply-To: <20210121122723.3446-8-rppt@kernel.org>

On Thu 21-01-21 14:27:19, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> Removing a PAGE_SIZE page from the direct map every time such page is
> allocated for a secret memory mapping will cause severe fragmentation of
> the direct map. This fragmentation can be reduced by using PMD-size pages
> as a pool for small pages for secret memory mappings.
> 
> Add a gen_pool per secretmem inode and lazily populate this pool with
> PMD-size pages.
> 
> As pages allocated by secretmem become unmovable, use CMA to back large
> page caches so that page allocator won't be surprised by failing attempt to
> migrate these pages.
> 
> The CMA area used by secretmem is controlled by the "secretmem=" kernel
> parameter. This allows explicit control over the memory available for
> secretmem and provides upper hard limit for secretmem consumption.

OK, so I have finally had a closer look at this and it is really not
acceptable. I have already mentioned this in a response to another patch,
but any task is able to deprive other tasks of access to secret memory
and trigger the OOM killer, which would never really recover and could
potentially panic the system. Now you could be less drastic and only
raise SIGBUS on fault, but that would still be quite terrible. There is a
very good reason why hugetlb implements its non-trivial reservation
system: to avoid exactly these problems.

So unless I am really misreading the code
Nacked-by: Michal Hocko <mhocko@suse.com>

That doesn't mean I reject the whole idea. There are some details to
sort out, as mentioned elsewhere, but you cannot really depend on a
pre-allocated pool which can fail at fault time like that.

> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Christopher Lameter <cl@linux.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Elena Reshetova <elena.reshetova@intel.com>
> Cc: Hagen Paul Pfeifer <hagen@jauu.net>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: James Bottomley <jejb@linux.ibm.com>
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Cc: Palmer Dabbelt <palmer@dabbelt.com>
> Cc: Palmer Dabbelt <palmerdabbelt@google.com>
> Cc: Paul Walmsley <paul.walmsley@sifive.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Cc: Roman Gushchin <guro@fb.com>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Tycho Andersen <tycho@tycho.ws>
> Cc: Will Deacon <will@kernel.org>
> ---
>  mm/Kconfig     |   2 +
>  mm/secretmem.c | 175 +++++++++++++++++++++++++++++++++++++++++--------
>  2 files changed, 150 insertions(+), 27 deletions(-)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 5f8243442f66..ec35bf406439 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -874,5 +874,7 @@ config KMAP_LOCAL
>  
>  config SECRETMEM
>  	def_bool ARCH_HAS_SET_DIRECT_MAP && !EMBEDDED
> +	select GENERIC_ALLOCATOR
> +	select CMA
>  
>  endmenu
> diff --git a/mm/secretmem.c b/mm/secretmem.c
> index 904351d12c33..469211c7cc3a 100644
> --- a/mm/secretmem.c
> +++ b/mm/secretmem.c
> @@ -7,12 +7,15 @@
>  
>  #include <linux/mm.h>
>  #include <linux/fs.h>
> +#include <linux/cma.h>
>  #include <linux/mount.h>
>  #include <linux/memfd.h>
>  #include <linux/bitops.h>
>  #include <linux/printk.h>
>  #include <linux/pagemap.h>
> +#include <linux/genalloc.h>
>  #include <linux/syscalls.h>
> +#include <linux/memblock.h>
>  #include <linux/pseudo_fs.h>
>  #include <linux/secretmem.h>
>  #include <linux/set_memory.h>
> @@ -35,24 +38,94 @@
>  #define SECRETMEM_FLAGS_MASK	SECRETMEM_MODE_MASK
>  
>  struct secretmem_ctx {
> +	struct gen_pool *pool;
>  	unsigned int mode;
>  };
>  
> -static struct page *secretmem_alloc_page(gfp_t gfp)
> +static struct cma *secretmem_cma;
> +
> +static int secretmem_pool_increase(struct secretmem_ctx *ctx, gfp_t gfp)
>  {
> +	unsigned long nr_pages = (1 << PMD_PAGE_ORDER);
> +	struct gen_pool *pool = ctx->pool;
> +	unsigned long addr;
> +	struct page *page;
> +	int i, err;
> +
> +	page = cma_alloc(secretmem_cma, nr_pages, PMD_SIZE, gfp & __GFP_NOWARN);
> +	if (!page)
> +		return -ENOMEM;
> +
>  	/*
> -	 * FIXME: use a cache of large pages to reduce the direct map
> -	 * fragmentation
> +	 * clear the data left from the previous user before dropping the
> +	 * pages from the direct map
>  	 */
> -	return alloc_page(gfp | __GFP_ZERO);
> +	for (i = 0; i < nr_pages; i++)
> +		clear_highpage(page + i);
> +
> +	err = set_direct_map_invalid_noflush(page, nr_pages);
> +	if (err)
> +		goto err_cma_release;
> +
> +	addr = (unsigned long)page_address(page);
> +	err = gen_pool_add(pool, addr, PMD_SIZE, NUMA_NO_NODE);
> +	if (err)
> +		goto err_set_direct_map;
> +
> +	flush_tlb_kernel_range(addr, addr + PMD_SIZE);
> +
> +	return 0;
> +
> +err_set_direct_map:
> +	/*
> +	 * If a split of PUD-size page was required, it already happened
> +	 * when we marked the pages invalid which guarantees that this call
> +	 * won't fail
> +	 */
> +	set_direct_map_default_noflush(page, nr_pages);
> +err_cma_release:
> +	cma_release(secretmem_cma, page, nr_pages);
> +	return err;
> +}
> +
> +static void secretmem_free_page(struct secretmem_ctx *ctx, struct page *page)
> +{
> +	unsigned long addr = (unsigned long)page_address(page);
> +	struct gen_pool *pool = ctx->pool;
> +
> +	gen_pool_free(pool, addr, PAGE_SIZE);
> +}
> +
> +static struct page *secretmem_alloc_page(struct secretmem_ctx *ctx,
> +					 gfp_t gfp)
> +{
> +	struct gen_pool *pool = ctx->pool;
> +	unsigned long addr;
> +	struct page *page;
> +	int err;
> +
> +	if (gen_pool_avail(pool) < PAGE_SIZE) {
> +		err = secretmem_pool_increase(ctx, gfp);
> +		if (err)
> +			return NULL;
> +	}
> +
> +	addr = gen_pool_alloc(pool, PAGE_SIZE);
> +	if (!addr)
> +		return NULL;
> +
> +	page = virt_to_page(addr);
> +	get_page(page);
> +
> +	return page;
>  }
>  
>  static vm_fault_t secretmem_fault(struct vm_fault *vmf)
>  {
> +	struct secretmem_ctx *ctx = vmf->vma->vm_file->private_data;
>  	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
>  	struct inode *inode = file_inode(vmf->vma->vm_file);
>  	pgoff_t offset = vmf->pgoff;
> -	unsigned long addr;
>  	struct page *page;
>  	int err;
>  
> @@ -62,40 +135,25 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf)
>  retry:
>  	page = find_lock_page(mapping, offset);
>  	if (!page) {
> -		page = secretmem_alloc_page(vmf->gfp_mask);
> +		page = secretmem_alloc_page(ctx, vmf->gfp_mask);
>  		if (!page)
>  			return VM_FAULT_OOM;
>  
> -		err = set_direct_map_invalid_noflush(page, 1);
> -		if (err) {
> -			put_page(page);
> -			return vmf_error(err);
> -		}
> -
>  		__SetPageUptodate(page);
>  		err = add_to_page_cache(page, mapping, offset, vmf->gfp_mask);
>  		if (unlikely(err)) {
> +			secretmem_free_page(ctx, page);
>  			put_page(page);
>  			if (err == -EEXIST)
>  				goto retry;
> -			goto err_restore_direct_map;
> +			return vmf_error(err);
>  		}
>  
> -		addr = (unsigned long)page_address(page);
> -		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> +		set_page_private(page, (unsigned long)ctx);
>  	}
>  
>  	vmf->page = page;
>  	return VM_FAULT_LOCKED;
> -
> -err_restore_direct_map:
> -	/*
> -	 * If a split of large page was required, it already happened
> -	 * when we marked the page invalid which guarantees that this call
> -	 * won't fail
> -	 */
> -	set_direct_map_default_noflush(page, 1);
> -	return vmf_error(err);
>  }
>  
>  static const struct vm_operations_struct secretmem_vm_ops = {
> @@ -141,8 +199,9 @@ static int secretmem_migratepage(struct address_space *mapping,
>  
>  static void secretmem_freepage(struct page *page)
>  {
> -	set_direct_map_default_noflush(page, 1);
> -	clear_highpage(page);
> +	struct secretmem_ctx *ctx = (struct secretmem_ctx *)page_private(page);
> +
> +	secretmem_free_page(ctx, page);
>  }
>  
>  static const struct address_space_operations secretmem_aops = {
> @@ -177,13 +236,18 @@ static struct file *secretmem_file_create(unsigned long flags)
>  	if (!ctx)
>  		goto err_free_inode;
>  
> +	ctx->pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
> +	if (!ctx->pool)
> +		goto err_free_ctx;
> +
>  	file = alloc_file_pseudo(inode, secretmem_mnt, "secretmem",
>  				 O_RDWR, &secretmem_fops);
>  	if (IS_ERR(file))
> -		goto err_free_ctx;
> +		goto err_free_pool;
>  
>  	mapping_set_unevictable(inode->i_mapping);
>  
> +	inode->i_private = ctx;
>  	inode->i_mapping->private_data = ctx;
>  	inode->i_mapping->a_ops = &secretmem_aops;
>  
> @@ -197,6 +261,8 @@ static struct file *secretmem_file_create(unsigned long flags)
>  
>  	return file;
>  
> +err_free_pool:
> +	gen_pool_destroy(ctx->pool);
>  err_free_ctx:
>  	kfree(ctx);
>  err_free_inode:
> @@ -215,6 +281,9 @@ SYSCALL_DEFINE1(memfd_secret, unsigned long, flags)
>  	if (flags & ~(SECRETMEM_FLAGS_MASK | O_CLOEXEC))
>  		return -EINVAL;
>  
> +	if (!secretmem_cma)
> +		return -ENOMEM;
> +
>  	fd = get_unused_fd_flags(flags & O_CLOEXEC);
>  	if (fd < 0)
>  		return fd;
> @@ -235,11 +304,37 @@ SYSCALL_DEFINE1(memfd_secret, unsigned long, flags)
>  	return err;
>  }
>  
> +static void secretmem_cleanup_chunk(struct gen_pool *pool,
> +				    struct gen_pool_chunk *chunk, void *data)
> +{
> +	unsigned long start = chunk->start_addr;
> +	unsigned long end = chunk->end_addr;
> +	struct page *page = virt_to_page(start);
> +	unsigned long nr_pages = (end - start + 1) / PAGE_SIZE;
> +	int i;
> +
> +	set_direct_map_default_noflush(page, nr_pages);
> +
> +	for (i = 0; i < nr_pages; i++)
> +		clear_highpage(page + i);
> +
> +	cma_release(secretmem_cma, page, nr_pages);
> +}
> +
> +static void secretmem_cleanup_pool(struct secretmem_ctx *ctx)
> +{
> +	struct gen_pool *pool = ctx->pool;
> +
> +	gen_pool_for_each_chunk(pool, secretmem_cleanup_chunk, ctx);
> +	gen_pool_destroy(pool);
> +}
> +
>  static void secretmem_evict_inode(struct inode *inode)
>  {
>  	struct secretmem_ctx *ctx = inode->i_private;
>  
>  	truncate_inode_pages_final(&inode->i_data);
> +	secretmem_cleanup_pool(ctx);
>  	clear_inode(inode);
>  	kfree(ctx);
>  }
> @@ -276,3 +371,29 @@ static int secretmem_init(void)
>  	return ret;
>  }
>  fs_initcall(secretmem_init);
> +
> +static int __init secretmem_setup(char *str)
> +{
> +	phys_addr_t align = PMD_SIZE;
> +	unsigned long reserved_size;
> +	int err;
> +
> +	reserved_size = memparse(str, NULL);
> +	if (!reserved_size)
> +		return 0;
> +
> +	if (reserved_size * 2 > PUD_SIZE)
> +		align = PUD_SIZE;
> +
> +	err = cma_declare_contiguous(0, reserved_size, 0, align, 0, false,
> +				     "secretmem", &secretmem_cma);
> +	if (err) {
> +		pr_err("failed to create CMA: %d\n", err);
> +		return err;
> +	}
> +
> +	pr_info("reserved %luM\n", reserved_size >> 20);
> +
> +	return 0;
> +}
> +__setup("secretmem=", secretmem_setup);
> -- 
> 2.28.0
> 

-- 
Michal Hocko
SUSE Labs

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@suse.com>
To: Mike Rapoport <rppt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>,
	David Hildenbrand <david@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
	"H. Peter Anvin" <hpa@zytor.com>,
	Christopher Lameter <cl@linux.com>, Shuah Khan <shuah@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Elena Reshetova <elena.reshetova@intel.com>,
	linux-arch@vger.kernel.org, Tycho Andersen <tycho@tycho.ws>,
	linux-nvdimm@lists.01.org, Will Deacon <will@kernel.org>,
	x86@kernel.org, Matthew Wilcox <willy@infradead.org>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Palmer Dabbelt <palmerdabbelt@google.com>,
	Arnd Bergmann <arnd@arndb.de>,
	James Bottomley <jejb@linux.ibm.com>,
	Hagen Paul Pfeifer <hagen@jauu.net>,
	Borislav Petkov <bp@alien8.de>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andy Lutomirski <luto@kernel.org>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Dan Williams <dan.j.williams@intel.com>,
	linux-arm-kernel@lists.infradead.org, linux-api@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
	Palmer Dabbelt <palmer@dabbelt.com>,
	linux-fsdevel@vger.kernel.org, Shakeel Butt <shakeelb@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Roman Gushchin <guro@fb.com>
Subject: Re: [PATCH v16 07/11] secretmem: use PMD-size pages to amortize direct map fragmentation
Date: Tue, 26 Jan 2021 12:46:57 +0100	[thread overview]
Message-ID: <20210126114657.GL827@dhcp22.suse.cz> (raw)
In-Reply-To: <20210121122723.3446-8-rppt@kernel.org>

On Thu 21-01-21 14:27:19, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> Removing a PAGE_SIZE page from the direct map every time such page is
> allocated for a secret memory mapping will cause severe fragmentation of
> the direct map. This fragmentation can be reduced by using PMD-size pages
> as a pool for small pages for secret memory mappings.
> 
> Add a gen_pool per secretmem inode and lazily populate this pool with
> PMD-size pages.
> 
> As pages allocated by secretmem become unmovable, use CMA to back large
> page caches so that page allocator won't be surprised by failing attempt to
> migrate these pages.
> 
> The CMA area used by secretmem is controlled by the "secretmem=" kernel
> parameter. This allows explicit control over the memory available for
> secretmem and provides upper hard limit for secretmem consumption.

OK, so I have finally had a look at this closer and this is really not
acceptable. I have already mentioned that in a response to other patch
but any task is able to deprive access to secret memory to other tasks
and cause OOM killer which wouldn't really recover ever and potentially
panic the system. Now you could be less drastic and only make SIGBUS on
fault but that would be still quite terrible. There is a very good
reason why hugetlb implements is non-trivial reservation system to avoid
exactly these problems.

So unless I am really misreading the code
Nacked-by: Michal Hocko <mhocko@suse.com>

That doesn't mean I reject the whole idea. There are some details to
sort out as mentioned elsewhere but you cannot really depend on
pre-allocated pool which can fail at a fault time like that.

> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Christopher Lameter <cl@linux.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Elena Reshetova <elena.reshetova@intel.com>
> Cc: Hagen Paul Pfeifer <hagen@jauu.net>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: James Bottomley <jejb@linux.ibm.com>
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Cc: Palmer Dabbelt <palmer@dabbelt.com>
> Cc: Palmer Dabbelt <palmerdabbelt@google.com>
> Cc: Paul Walmsley <paul.walmsley@sifive.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Cc: Roman Gushchin <guro@fb.com>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Tycho Andersen <tycho@tycho.ws>
> Cc: Will Deacon <will@kernel.org>
> ---
>  mm/Kconfig     |   2 +
>  mm/secretmem.c | 175 +++++++++++++++++++++++++++++++++++++++++--------
>  2 files changed, 150 insertions(+), 27 deletions(-)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 5f8243442f66..ec35bf406439 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -874,5 +874,7 @@ config KMAP_LOCAL
>  
>  config SECRETMEM
>  	def_bool ARCH_HAS_SET_DIRECT_MAP && !EMBEDDED
> +	select GENERIC_ALLOCATOR
> +	select CMA
>  
>  endmenu
> diff --git a/mm/secretmem.c b/mm/secretmem.c
> index 904351d12c33..469211c7cc3a 100644
> --- a/mm/secretmem.c
> +++ b/mm/secretmem.c
> @@ -7,12 +7,15 @@
>  
>  #include <linux/mm.h>
>  #include <linux/fs.h>
> +#include <linux/cma.h>
>  #include <linux/mount.h>
>  #include <linux/memfd.h>
>  #include <linux/bitops.h>
>  #include <linux/printk.h>
>  #include <linux/pagemap.h>
> +#include <linux/genalloc.h>
>  #include <linux/syscalls.h>
> +#include <linux/memblock.h>
>  #include <linux/pseudo_fs.h>
>  #include <linux/secretmem.h>
>  #include <linux/set_memory.h>
> @@ -35,24 +38,94 @@
>  #define SECRETMEM_FLAGS_MASK	SECRETMEM_MODE_MASK
>  
>  struct secretmem_ctx {
> +	struct gen_pool *pool;
>  	unsigned int mode;
>  };
>  
> -static struct page *secretmem_alloc_page(gfp_t gfp)
> +static struct cma *secretmem_cma;
> +
> +static int secretmem_pool_increase(struct secretmem_ctx *ctx, gfp_t gfp)
>  {
> +	unsigned long nr_pages = (1 << PMD_PAGE_ORDER);
> +	struct gen_pool *pool = ctx->pool;
> +	unsigned long addr;
> +	struct page *page;
> +	int i, err;
> +
> +	page = cma_alloc(secretmem_cma, nr_pages, PMD_SIZE, gfp & __GFP_NOWARN);
> +	if (!page)
> +		return -ENOMEM;
> +
>  	/*
> -	 * FIXME: use a cache of large pages to reduce the direct map
> -	 * fragmentation
> +	 * clear the data left from the previous user before dropping the
> +	 * pages from the direct map
>  	 */
> -	return alloc_page(gfp | __GFP_ZERO);
> +	for (i = 0; i < nr_pages; i++)
> +		clear_highpage(page + i);
> +
> +	err = set_direct_map_invalid_noflush(page, nr_pages);
> +	if (err)
> +		goto err_cma_release;
> +
> +	addr = (unsigned long)page_address(page);
> +	err = gen_pool_add(pool, addr, PMD_SIZE, NUMA_NO_NODE);
> +	if (err)
> +		goto err_set_direct_map;
> +
> +	flush_tlb_kernel_range(addr, addr + PMD_SIZE);
> +
> +	return 0;
> +
> +err_set_direct_map:
> +	/*
> +	 * If a split of PUD-size page was required, it already happened
> +	 * when we marked the pages invalid which guarantees that this call
> +	 * won't fail
> +	 */
> +	set_direct_map_default_noflush(page, nr_pages);
> +err_cma_release:
> +	cma_release(secretmem_cma, page, nr_pages);
> +	return err;
> +}
> +
> +static void secretmem_free_page(struct secretmem_ctx *ctx, struct page *page)
> +{
> +	unsigned long addr = (unsigned long)page_address(page);
> +	struct gen_pool *pool = ctx->pool;
> +
> +	gen_pool_free(pool, addr, PAGE_SIZE);
> +}
> +
> +static struct page *secretmem_alloc_page(struct secretmem_ctx *ctx,
> +					 gfp_t gfp)
> +{
> +	struct gen_pool *pool = ctx->pool;
> +	unsigned long addr;
> +	struct page *page;
> +	int err;
> +
> +	if (gen_pool_avail(pool) < PAGE_SIZE) {
> +		err = secretmem_pool_increase(ctx, gfp);
> +		if (err)
> +			return NULL;
> +	}
> +
> +	addr = gen_pool_alloc(pool, PAGE_SIZE);
> +	if (!addr)
> +		return NULL;
> +
> +	page = virt_to_page(addr);
> +	get_page(page);
> +
> +	return page;
>  }
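
For illustration only, the allocation pattern in secretmem_alloc_page() and secretmem_pool_increase() above can be modelled in userspace. This is a sketch, not kernel code: every model_* name is hypothetical, the shared CMA area is reduced to a counter of PMD-size chunks, and 512 pages per chunk assumes 4 KiB pages with 2 MiB PMDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed geometry: 4 KiB pages, 2 MiB chunks, so 512 pages per chunk. */
#define MODEL_PAGES_PER_CHUNK 512UL

/* Stand-in for the shared CMA area: a fixed number of PMD-size chunks. */
struct model_cma {
	unsigned long chunks_left;
};

/* Stand-in for one secretmem_ctx and its private gen_pool. */
struct model_ctx {
	struct model_cma *cma;
	unsigned long pages_avail;	/* free pages already in this pool */
};

/* Like secretmem_pool_increase(): take one whole chunk from the shared area. */
static bool model_pool_increase(struct model_ctx *ctx)
{
	assert(ctx != NULL && ctx->cma != NULL);
	if (ctx->cma->chunks_left == 0)
		return false;		/* -ENOMEM in the patch */
	ctx->cma->chunks_left--;
	ctx->pages_avail += MODEL_PAGES_PER_CHUNK;
	return true;
}

/* Like secretmem_alloc_page(): grow lazily, then carve out a single page. */
static bool model_alloc_page(struct model_ctx *ctx)
{
	if (ctx->pages_avail == 0 && !model_pool_increase(ctx))
		return false;		/* the fault handler would return VM_FAULT_OOM */
	ctx->pages_avail--;
	return true;
}
```

In this model the shared area is only consulted when a page is actually requested, so with two chunks available one context that takes 1024 pages leaves nothing for a second context, whose very first request fails. Nothing is reserved when the context is created.
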
>  
>  static vm_fault_t secretmem_fault(struct vm_fault *vmf)
>  {
> +	struct secretmem_ctx *ctx = vmf->vma->vm_file->private_data;
>  	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
>  	struct inode *inode = file_inode(vmf->vma->vm_file);
>  	pgoff_t offset = vmf->pgoff;
> -	unsigned long addr;
>  	struct page *page;
>  	int err;
>  
> @@ -62,40 +135,25 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf)
>  retry:
>  	page = find_lock_page(mapping, offset);
>  	if (!page) {
> -		page = secretmem_alloc_page(vmf->gfp_mask);
> +		page = secretmem_alloc_page(ctx, vmf->gfp_mask);
>  		if (!page)
>  			return VM_FAULT_OOM;
>  
> -		err = set_direct_map_invalid_noflush(page, 1);
> -		if (err) {
> -			put_page(page);
> -			return vmf_error(err);
> -		}
> -
>  		__SetPageUptodate(page);
>  		err = add_to_page_cache(page, mapping, offset, vmf->gfp_mask);
>  		if (unlikely(err)) {
> +			secretmem_free_page(ctx, page);
>  			put_page(page);
>  			if (err == -EEXIST)
>  				goto retry;
> -			goto err_restore_direct_map;
> +			return vmf_error(err);
>  		}
>  
> -		addr = (unsigned long)page_address(page);
> -		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> +		set_page_private(page, (unsigned long)ctx);
>  	}
>  
>  	vmf->page = page;
>  	return VM_FAULT_LOCKED;
> -
> -err_restore_direct_map:
> -	/*
> -	 * If a split of large page was required, it already happened
> -	 * when we marked the page invalid which guarantees that this call
> -	 * won't fail
> -	 */
> -	set_direct_map_default_noflush(page, 1);
> -	return vmf_error(err);
>  }
>  
>  static const struct vm_operations_struct secretmem_vm_ops = {
> @@ -141,8 +199,9 @@ static int secretmem_migratepage(struct address_space *mapping,
>  
>  static void secretmem_freepage(struct page *page)
>  {
> -	set_direct_map_default_noflush(page, 1);
> -	clear_highpage(page);
> +	struct secretmem_ctx *ctx = (struct secretmem_ctx *)page_private(page);
> +
> +	secretmem_free_page(ctx, page);
>  }
>  
>  static const struct address_space_operations secretmem_aops = {
> @@ -177,13 +236,18 @@ static struct file *secretmem_file_create(unsigned long flags)
>  	if (!ctx)
>  		goto err_free_inode;
>  
> +	ctx->pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
> +	if (!ctx->pool)
> +		goto err_free_ctx;
> +
>  	file = alloc_file_pseudo(inode, secretmem_mnt, "secretmem",
>  				 O_RDWR, &secretmem_fops);
>  	if (IS_ERR(file))
> -		goto err_free_ctx;
> +		goto err_free_pool;
>  
>  	mapping_set_unevictable(inode->i_mapping);
>  
> +	inode->i_private = ctx;
>  	inode->i_mapping->private_data = ctx;
>  	inode->i_mapping->a_ops = &secretmem_aops;
>  
> @@ -197,6 +261,8 @@ static struct file *secretmem_file_create(unsigned long flags)
>  
>  	return file;
>  
> +err_free_pool:
> +	gen_pool_destroy(ctx->pool);
>  err_free_ctx:
>  	kfree(ctx);
>  err_free_inode:
> @@ -215,6 +281,9 @@ SYSCALL_DEFINE1(memfd_secret, unsigned long, flags)
>  	if (flags & ~(SECRETMEM_FLAGS_MASK | O_CLOEXEC))
>  		return -EINVAL;
>  
> +	if (!secretmem_cma)
> +		return -ENOMEM;
> +
>  	fd = get_unused_fd_flags(flags & O_CLOEXEC);
>  	if (fd < 0)
>  		return fd;
> @@ -235,11 +304,37 @@ SYSCALL_DEFINE1(memfd_secret, unsigned long, flags)
>  	return err;
>  }
>  
> +static void secretmem_cleanup_chunk(struct gen_pool *pool,
> +				    struct gen_pool_chunk *chunk, void *data)
> +{
> +	unsigned long start = chunk->start_addr;
> +	unsigned long end = chunk->end_addr;
> +	struct page *page = virt_to_page(start);
> +	unsigned long nr_pages = (end - start + 1) / PAGE_SIZE;
> +	int i;
> +
> +	set_direct_map_default_noflush(page, nr_pages);
> +
> +	for (i = 0; i < nr_pages; i++)
> +		clear_highpage(page + i);
> +
> +	cma_release(secretmem_cma, page, nr_pages);
> +}
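
The page count in secretmem_cleanup_chunk() depends on gen_pool_chunk::end_addr being the last byte of the chunk (inclusive), hence the "+ 1". A minimal worked version of that arithmetic, assuming 4 KiB pages; model_chunk_nr_pages() and MODEL_PAGE_SIZE are names made up for this sketch:

```c
#include <assert.h>

/* Assumed base page size; the kernel uses PAGE_SIZE. */
#define MODEL_PAGE_SIZE 4096UL

/* 'end' is the address of the last byte in the chunk, not one past it. */
static unsigned long model_chunk_nr_pages(unsigned long start, unsigned long end)
{
	assert(end >= start);
	return (end - start + 1) / MODEL_PAGE_SIZE;
}
```

For a 2 MiB chunk starting at 0x200000 the inclusive end is 0x3fffff and the result is 512 pages. Dropping the "+ 1" would give 511 after integer division.
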
> +
> +static void secretmem_cleanup_pool(struct secretmem_ctx *ctx)
> +{
> +	struct gen_pool *pool = ctx->pool;
> +
> +	gen_pool_for_each_chunk(pool, secretmem_cleanup_chunk, ctx);
> +	gen_pool_destroy(pool);
> +}
> +
>  static void secretmem_evict_inode(struct inode *inode)
>  {
>  	struct secretmem_ctx *ctx = inode->i_private;
>  
>  	truncate_inode_pages_final(&inode->i_data);
> +	secretmem_cleanup_pool(ctx);
>  	clear_inode(inode);
>  	kfree(ctx);
>  }
> @@ -276,3 +371,29 @@ static int secretmem_init(void)
>  	return ret;
>  }
>  fs_initcall(secretmem_init);
> +
> +static int __init secretmem_setup(char *str)
> +{
> +	phys_addr_t align = PMD_SIZE;
> +	unsigned long reserved_size;
> +	int err;
> +
> +	reserved_size = memparse(str, NULL);
> +	if (!reserved_size)
> +		return 0;
> +
> +	if (reserved_size * 2 > PUD_SIZE)
> +		align = PUD_SIZE;
> +
> +	err = cma_declare_contiguous(0, reserved_size, 0, align, 0, false,
> +				     "secretmem", &secretmem_cma);
> +	if (err) {
> +		pr_err("failed to create CMA: %d\n", err);
> +		return err;
> +	}
> +
> +	pr_info("reserved %luM\n", reserved_size >> 20);
> +
> +	return 0;
> +}
> +__setup("secretmem=", secretmem_setup);
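
The alignment rule in secretmem_setup() can be checked with a few sizes. The sketch below assumes x86-64 values (2 MiB PMD, 1 GiB PUD); other architectures and page sizes differ, and the MODEL_* constants and model_secretmem_align() are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 sizes with 4 KiB base pages. */
#define MODEL_PMD_SIZE (UINT64_C(1) << 21)	/* 2 MiB */
#define MODEL_PUD_SIZE (UINT64_C(1) << 30)	/* 1 GiB */

/* Same comparison as the patch: PUD alignment once the area exceeds half a PUD. */
static uint64_t model_secretmem_align(uint64_t reserved_size)
{
	assert(reserved_size != 0);	/* the patch returns early for a zero size */
	if (reserved_size * 2 > MODEL_PUD_SIZE)
		return MODEL_PUD_SIZE;
	return MODEL_PMD_SIZE;
}
```

Under these assumptions secretmem=512M still gets PMD alignment, because the comparison is strict, while anything larger than 512M is aligned to a PUD.
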
> -- 
> 2.28.0
> 

-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2021-01-26 11:47 UTC|newest]

Thread overview: 318+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-21 12:27 [PATCH v16 00/11] mm: introduce memfd_secret system call to create "secret" memory areas Mike Rapoport
2021-01-21 12:27 ` Mike Rapoport
2021-01-21 12:27 ` Mike Rapoport
2021-01-21 12:27 ` Mike Rapoport
2021-01-21 12:27 ` [PATCH v16 01/11] mm: add definition of PMD_PAGE_ORDER Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27 ` [PATCH v16 02/11] mmap: make mlock_future_check() global Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27 ` [PATCH v16 03/11] riscv/Kconfig: make direct map manipulation options depend on MMU Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27 ` [PATCH v16 04/11] set_memory: allow set_direct_map_*_noflush() for multiple pages Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27 ` [PATCH v16 05/11] set_memory: allow querying whether set_direct_map_*() is actually enabled Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27 ` [PATCH v16 06/11] mm: introduce memfd_secret system call to create "secret" memory areas Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-25 17:01   ` Michal Hocko
2021-01-25 17:01     ` Michal Hocko
2021-01-25 17:01     ` Michal Hocko
2021-01-25 17:01     ` Michal Hocko
2021-01-25 21:36     ` Mike Rapoport
2021-01-25 21:36       ` Mike Rapoport
2021-01-25 21:36       ` Mike Rapoport
2021-01-25 21:36       ` Mike Rapoport
2021-01-26  7:16       ` Michal Hocko
2021-01-26  7:16         ` Michal Hocko
2021-01-26  7:16         ` Michal Hocko
2021-01-26  7:16         ` Michal Hocko
2021-01-26  8:33         ` Mike Rapoport
2021-01-26  8:33           ` Mike Rapoport
2021-01-26  8:33           ` Mike Rapoport
2021-01-26  8:33           ` Mike Rapoport
2021-01-26  9:00           ` Michal Hocko
2021-01-26  9:00             ` Michal Hocko
2021-01-26  9:00             ` Michal Hocko
2021-01-26  9:00             ` Michal Hocko
2021-01-26  9:20             ` Mike Rapoport
2021-01-26  9:20               ` Mike Rapoport
2021-01-26  9:20               ` Mike Rapoport
2021-01-26  9:20               ` Mike Rapoport
2021-01-26  9:49               ` Michal Hocko
2021-01-26  9:49                 ` Michal Hocko
2021-01-26  9:49                 ` Michal Hocko
2021-01-26  9:49                 ` Michal Hocko
2021-01-26  9:53                 ` David Hildenbrand
2021-01-26  9:53                   ` David Hildenbrand
2021-01-26  9:53                   ` David Hildenbrand
2021-01-26  9:53                   ` David Hildenbrand
2021-01-26 10:19                   ` Michal Hocko
2021-01-26 10:19                     ` Michal Hocko
2021-01-26 10:19                     ` Michal Hocko
2021-01-26 10:19                     ` Michal Hocko
2021-01-26  9:20             ` Michal Hocko
2021-01-26  9:20               ` Michal Hocko
2021-01-26  9:20               ` Michal Hocko
2021-01-26  9:20               ` Michal Hocko
2021-02-03 12:15   ` Michal Hocko
2021-02-03 12:15     ` Michal Hocko
2021-02-03 12:15     ` Michal Hocko
2021-02-03 12:15     ` Michal Hocko
2021-02-04 11:34     ` Mike Rapoport
2021-02-04 11:34       ` Mike Rapoport
2021-02-04 11:34       ` Mike Rapoport
2021-02-04 11:34       ` Mike Rapoport
2021-01-21 12:27 ` [PATCH v16 07/11] secretmem: use PMD-size pages to amortize direct map fragmentation Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-21 12:27   ` Mike Rapoport
2021-01-26 11:46   ` Michal Hocko [this message]
2021-01-26 11:46     ` Michal Hocko
2021-01-26 11:46     ` Michal Hocko
2021-01-26 11:46     ` Michal Hocko
2021-01-26 11:56     ` David Hildenbrand
2021-01-26 11:56       ` David Hildenbrand
2021-01-26 11:56       ` David Hildenbrand
2021-01-26 11:56       ` David Hildenbrand
2021-01-26 12:08       ` Michal Hocko
2021-01-26 12:08         ` Michal Hocko
2021-01-26 12:08         ` Michal Hocko
2021-01-26 12:08         ` Michal Hocko
2021-01-28  9:22         ` Mike Rapoport
2021-01-28  9:22           ` Mike Rapoport
2021-01-28  9:22           ` Mike Rapoport
2021-01-28  9:22           ` Mike Rapoport
2021-01-28 13:01           ` Michal Hocko
2021-01-28 13:01             ` Michal Hocko
2021-01-28 13:01             ` Michal Hocko
2021-01-28 13:01             ` Michal Hocko
2021-01-28 13:28             ` Christoph Lameter
2021-01-28 13:28               ` Christoph Lameter
2021-01-28 13:28               ` Christoph Lameter
2021-01-28 13:28               ` Christoph Lameter
2021-01-28 13:28               ` Christoph Lameter
2021-01-28 13:49               ` Michal Hocko
2021-01-28 13:49                 ` Michal Hocko
2021-01-28 13:49                 ` Michal Hocko
2021-01-28 13:49                 ` Michal Hocko
2021-01-28 15:56                 ` Christoph Lameter
2021-01-28 15:56                   ` Christoph Lameter
2021-01-28 15:56                   ` Christoph Lameter
2021-01-28 15:56                   ` Christoph Lameter
2021-01-28 15:56                   ` Christoph Lameter
2021-01-28 16:23                   ` Michal Hocko
2021-01-28 16:23                     ` Michal Hocko
2021-01-28 16:23                     ` Michal Hocko
2021-01-28 16:23                     ` Michal Hocko
2021-01-28 15:28             ` James Bottomley
2021-01-28 15:28               ` James Bottomley
2021-01-28 15:28               ` James Bottomley
2021-01-28 15:28               ` James Bottomley
2021-01-29  7:03               ` Mike Rapoport
2021-01-29  7:03                 ` Mike Rapoport
2021-01-29  7:03                 ` Mike Rapoport
2021-01-29  7:03                 ` Mike Rapoport
2021-01-28 21:05             ` James Bottomley
2021-01-28 21:05               ` James Bottomley
2021-01-28 21:05               ` James Bottomley
2021-01-28 21:05               ` James Bottomley
2021-01-29  7:53               ` Michal Hocko
2021-01-29  7:53                 ` Michal Hocko
2021-01-29  7:53                 ` Michal Hocko
2021-01-29  7:53                 ` Michal Hocko
2021-01-29  8:23               ` Michal Hocko
2021-01-29  8:23                 ` Michal Hocko
2021-01-29  8:23                 ` Michal Hocko
2021-01-29  8:23                 ` Michal Hocko
2021-02-01 16:56                 ` James Bottomley
2021-02-01 16:56                   ` James Bottomley
2021-02-01 16:56                   ` James Bottomley
2021-02-01 16:56                   ` James Bottomley
2021-02-02  9:35                   ` Michal Hocko
2021-02-02  9:35                     ` Michal Hocko
2021-02-02  9:35                     ` Michal Hocko
2021-02-02  9:35                     ` Michal Hocko
2021-02-02 12:48                     ` Mike Rapoport
2021-02-02 12:48                       ` Mike Rapoport
2021-02-02 12:48                       ` Mike Rapoport
2021-02-02 12:48                       ` Mike Rapoport
2021-02-02 13:14                       ` David Hildenbrand
2021-02-02 13:14                         ` David Hildenbrand
2021-02-02 13:14                         ` David Hildenbrand
2021-02-02 13:14                         ` David Hildenbrand
2021-02-02 13:32                         ` Michal Hocko
2021-02-02 13:32                           ` Michal Hocko
2021-02-02 13:32                           ` Michal Hocko
2021-02-02 13:32                           ` Michal Hocko
2021-02-02 14:12                           ` David Hildenbrand
2021-02-02 14:12                             ` David Hildenbrand
2021-02-02 14:12                             ` David Hildenbrand
2021-02-02 14:12                             ` David Hildenbrand
2021-02-02 14:22                             ` Michal Hocko
2021-02-02 14:22                               ` Michal Hocko
2021-02-02 14:22                               ` Michal Hocko
2021-02-02 14:22                               ` Michal Hocko
2021-02-02 14:26                               ` David Hildenbrand
2021-02-02 14:26                                 ` David Hildenbrand
2021-02-02 14:26                                 ` David Hildenbrand
2021-02-02 14:26                                 ` David Hildenbrand
2021-02-02 14:32                                 ` Michal Hocko
2021-02-02 14:32                                   ` Michal Hocko
2021-02-02 14:32                                   ` Michal Hocko
2021-02-02 14:32                                   ` Michal Hocko
2021-02-02 14:34                                   ` David Hildenbrand
2021-02-02 14:34                                     ` David Hildenbrand
2021-02-02 14:34                                     ` David Hildenbrand
2021-02-02 14:34                                     ` David Hildenbrand
2021-02-02 18:15                                     ` Mike Rapoport
2021-02-02 18:15                                       ` Mike Rapoport
2021-02-02 18:15                                       ` Mike Rapoport
2021-02-02 18:15                                       ` Mike Rapoport
2021-02-02 18:55                                       ` James Bottomley
2021-02-02 18:55                                         ` James Bottomley
2021-02-02 18:55                                         ` James Bottomley
2021-02-02 18:55                                         ` James Bottomley
2021-02-03 12:09                                         ` Michal Hocko
2021-02-03 12:09                                           ` Michal Hocko
2021-02-03 12:09                                           ` Michal Hocko
2021-02-03 12:09                                           ` Michal Hocko
2021-02-04 11:31                                           ` Mike Rapoport
2021-02-04 11:31                                             ` Mike Rapoport
2021-02-04 11:31                                             ` Mike Rapoport
2021-02-04 11:31                                             ` Mike Rapoport
2021-02-02 13:27                       ` Michal Hocko
2021-02-02 13:27                         ` Michal Hocko
2021-02-02 13:27                         ` Michal Hocko
2021-02-02 13:27                         ` Michal Hocko
2021-02-02 19:10                         ` Mike Rapoport
2021-02-02 19:10                           ` Mike Rapoport
2021-02-02 19:10                           ` Mike Rapoport
2021-02-02 19:10                           ` Mike Rapoport
2021-02-03  9:12                           ` Michal Hocko
2021-02-04  9:58                             ` Mike Rapoport
2021-02-04 13:02                               ` Michal Hocko
2021-01-29  7:21             ` Mike Rapoport
2021-01-29  8:51               ` Michal Hocko
2021-02-02 14:42                 ` David Hildenbrand
2021-01-21 12:27 ` [PATCH v16 08/11] secretmem: add memcg accounting Mike Rapoport
2021-01-25 16:17   ` Matthew Wilcox
2021-01-25 17:18     ` Shakeel Butt
2021-01-25 21:35       ` Mike Rapoport
2021-01-28 15:07         ` Shakeel Butt
2021-01-25 16:54   ` Michal Hocko
2021-01-25 21:38     ` Mike Rapoport
2021-01-26  7:31       ` Michal Hocko
2021-01-26  8:56         ` Mike Rapoport
2021-01-26  9:15           ` Michal Hocko
2021-01-26 14:48       ` Matthew Wilcox
2021-01-26 15:05         ` Michal Hocko
2021-01-27 18:42           ` Roman Gushchin
2021-01-28  7:58             ` Michal Hocko
2021-01-28 14:05               ` Shakeel Butt
2021-01-28 14:22                 ` Michal Hocko
2021-01-28 14:57                   ` Shakeel Butt
2021-01-21 12:27 ` [PATCH v16 09/11] PM: hibernate: disable when there are active secretmem users Mike Rapoport
2021-01-21 12:27 ` [PATCH v16 10/11] arch, mm: wire up memfd_secret system call where relevant Mike Rapoport
2021-01-25 18:18   ` Catalin Marinas
2021-01-21 12:27 ` [PATCH v16 11/11] secretmem: test: add basic selftest for memfd_secret(2) Mike Rapoport
2021-01-21 22:18 ` [PATCH v16 00/11] mm: introduce memfd_secret system call to create "secret" memory areas Andrew Morton
