kernel-hardening.lists.openwall.com archive mirror
From: Mike Rapoport <rppt@kernel.org>
To: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org,
	x86@kernel.org, akpm@linux-foundation.org, keescook@chromium.org,
	shakeelb@google.com, vbabka@suse.cz, linux-mm@kvack.org,
	linux-hardening@vger.kernel.org,
	kernel-hardening@lists.openwall.com, ira.weiny@intel.com,
	dan.j.williams@intel.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v2 17/19] x86/mm/cpa: PKS protect direct map page tables
Date: Tue, 31 Aug 2021 13:14:16 +0300	[thread overview]
Message-ID: <YS4A+F1H9MvyOeMV@kernel.org> (raw)
In-Reply-To: <20210830235927.6443-18-rick.p.edgecombe@intel.com>

On Mon, Aug 30, 2021 at 04:59:25PM -0700, Rick Edgecombe wrote:
> Protecting direct map page tables is a bit more difficult because a page
> table may be needed for a page split as part of setting the PKS
> permission on the new page table. So in the case of an empty cache of
> page tables, the page table allocator could get into a situation where
> it cannot create any more page tables.
> 
> Several solutions were looked at:
> 
> 1. Break the direct map with pages allocated from the large page being
> converted to PKS. This would result in a window where the table could be
> written to right before it was linked into the page tables. It also
> depends on high-order pages being available, and so would regress from
> the unprotected behavior in that respect.
> 2. Hold some page tables in reserve to be able to break the large page
> for a new 2MB page. But if there are no 2MB pages available, we may need
> to add a single page to the cache, in which case we would use up the
> reserve of page tables needed to break a new page, but not get enough
> page tables back to replenish the reserve.
> 3. Always map the direct map at 4k when protecting page tables so that
> pages don't need to be broken to map them with a PKS key. This would have
> an undesirable performance impact.
> 
> 4. Lastly, the strategy employed in this patch: keep a separate cache of
> page tables used only for the direct map. Early in boot, squirrel away
> enough page tables to map the direct map at 4k. This comes with the same
> memory overhead as mapping the direct map at 4k, but gets the other
> benefits of mapping the direct map as large pages.
> 
> There is also the problem of protecting page tables that are allocated
> during boot. Instead of recording the tables to protect later, create
> page table traversal infrastructure to walk every page table in init_mm
> and apply protection. This also covers non-direct-map odds-and-ends page
> tables that are allocated during boot. The existing page table walking
> code in pagewalk.c cannot be used for this purpose because there are no
> actual VMAs covering all of the kernel address space.
> 
> The algorithm for protecting the direct map page table cache, while also
> allocating from it for direct map splits, is described in the comments
> of init_pks_dmap_tables().
> 
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
>  arch/x86/include/asm/set_memory.h |   2 +
>  arch/x86/mm/init.c                |  89 ++++++++++
>  arch/x86/mm/pat/set_memory.c      | 263 +++++++++++++++++++++++++++++-
>  3 files changed, 350 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
> index 1ba2fb45ed05..9f8d0d0ae063 100644
> --- a/arch/x86/include/asm/set_memory.h
> +++ b/arch/x86/include/asm/set_memory.h
> @@ -90,6 +90,8 @@ bool kernel_page_present(struct page *page);
>  
>  extern int kernel_set_to_readonly;
>  
> +void add_dmap_table(unsigned long addr);
> +
>  #ifdef CONFIG_X86_64
>  /*
>   * Prevent speculative access to the page by either unmapping
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index c8933c6d5efd..a91696e3da96 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -6,6 +6,7 @@
>  #include <linux/swapfile.h>
>  #include <linux/swapops.h>
>  #include <linux/kmemleak.h>
> +#include <linux/hugetlb.h>
>  #include <linux/sched/task.h>
>  
>  #include <asm/set_memory.h>
> @@ -26,6 +27,7 @@
>  #include <asm/pti.h>
>  #include <asm/text-patching.h>
>  #include <asm/memtype.h>
> +#include <asm/pgalloc.h>
>  
>  /*
>   * We need to define the tracepoints somewhere, and tlb.c
> @@ -119,6 +121,17 @@ __ref void *alloc_low_pages(unsigned int num)
>  	if (after_bootmem) {
>  		unsigned int order;
>  
> +		if (cpu_feature_enabled(X86_FEATURE_PKS_TABLES)) {
> +			struct page *page;
> +
> +			/* 64 bit only allocates order 0 pages */
> +			WARN_ON(num != 1);
> +
> +			page = alloc_table(GFP_ATOMIC | __GFP_ZERO);
> +			if (!page)
> +				return NULL;
> +			return (void *)page_address(page);
> +		}
>  		order = get_order((unsigned long)num << PAGE_SHIFT);
>  		return (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO, order);
>  	}
> @@ -504,6 +517,79 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
>  	return false;
>  }
>  
> +#ifdef CONFIG_PKS_PG_TABLES
> +/* Page tables needed in bytes */
> +static u64 calc_tables_needed(unsigned int size)
> +{
> +	unsigned int puds = size >> PUD_SHIFT;
> +	unsigned int pmds = size >> PMD_SHIFT;
> +
> +	/*
> +	 * Catch if direct map ever might need more page tables to split
> +	 * down to 4k.
> +	 */
> +	BUILD_BUG_ON(p4d_huge(foo));
> +	BUILD_BUG_ON(pgd_huge(foo));
> +
> +	return (puds + pmds) << PAGE_SHIFT;
> +}
> +
> +/*
> + * If pre boot, reserve large pages from memory that will be mapped. It's ok that this is not
> + * mapped as PKS, other init code in CPA will handle the conversion.
> + */
> +static unsigned int __init reserve_pre_boot(u64 start, u64 end)
> +{
> +	u64 cur = memblock_find_in_range(start, end, HPAGE_SIZE, HPAGE_SIZE);
> +	int i;

Please use memblock_phys_alloc_range() here.
Besides, it seems these reserved pages are not accessed until late_initcall
time, so there is no need to limit the allocation to already mapped areas;
memblock_alloc_raw() would suffice.
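
Something along these lines (untested, only to illustrate); note that
memblock_phys_alloc_range() both finds and reserves the range, so the
explicit memblock_reserve() goes away:

	/* Untested sketch: allocate and reserve one 2M chunk for the cache */
	static unsigned int __init reserve_pre_boot(u64 start, u64 end)
	{
		u64 cur = memblock_phys_alloc_range(HPAGE_SIZE, HPAGE_SIZE,
						    start, end);
		int i;

		if (!cur)
			return 0;

		/* Hand the pages to the direct map page table cache */
		for (i = 0; i < HPAGE_SIZE; i += PAGE_SIZE)
			add_dmap_table((unsigned long)__va(cur + i));
		return HPAGE_SIZE;
	}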

> +
> +	if (!cur)
> +		return 0;
> +	memblock_reserve(cur, HPAGE_SIZE);
> +	for (i = 0; i < HPAGE_SIZE; i += PAGE_SIZE)
> +		add_dmap_table((unsigned long)__va(cur + i));
> +	return HPAGE_SIZE;
> +}
> +
> +/* If post boot, memblock is not available. Just reserve from other memory regions */
> +static unsigned int __init reserve_post_boot(void)
> +{
> +	struct page *page = alloc_table(GFP_KERNEL);
> +
> +	if (!page)
> +		return 0;
> +
> +	add_dmap_table((unsigned long)page_address(page));

add_dmap_table() callers cast the argument everywhere; maybe make it
add_dmap_table(void *)?
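
I.e., something like this (sketch only):

	/* in set_memory.h */
	void add_dmap_table(void *table);

	/* callers then drop the casts */
	add_dmap_table(__va(cur + i));
	add_dmap_table(page_address(page));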

> +
> +	return PAGE_SIZE;
> +}
> +
> +static void __init reserve_page_tables(u64 start, u64 end)
> +{
> +	u64 reserve_size = calc_tables_needed(end - start);
> +	u64 reserved = 0;
> +	u64 cur_reserved;
> +
> +	while (reserved < reserve_size) {
> +		if (after_bootmem)
> +			cur_reserved = reserve_post_boot();
> +		else
> +			cur_reserved = reserve_pre_boot(start, end);
> +
> +		if (!cur_reserved) {
> +			WARN(1, "Could not reserve direct map page tables %llu/%llu\n",
> +				reserved,
> +				reserve_size);
> +			return;
> +		}
> +
> +		reserved += cur_reserved;
> +	}
> +}
> +#else
> +static inline void reserve_page_tables(u64 start, u64 end) { }
> +#endif
> +
>  /*
>   * Setup the direct mapping of the physical memory at PAGE_OFFSET.
>   * This runs before bootmem is initialized and gets pages directly from
> @@ -529,6 +615,9 @@ unsigned long __ref init_memory_mapping(unsigned long start,
>  
>  	add_pfn_range_mapped(start >> PAGE_SHIFT, ret >> PAGE_SHIFT);
>  
> +	if (cpu_feature_enabled(X86_FEATURE_PKS_TABLES))
> +		reserve_page_tables(start, end);
> +
>  	return ret >> PAGE_SHIFT;
>  }

-- 
Sincerely yours,
Mike.

Thread overview: 40+ messages
2021-08-30 23:59 [RFC PATCH v2 00/19] PKS write protected page tables Rick Edgecombe
2021-08-30 23:59 ` [RFC PATCH v2 01/19] list: Support getting most recent element in list_lru Rick Edgecombe
2021-08-30 23:59 ` [RFC PATCH v2 02/19] list: Support list head not in object for list_lru Rick Edgecombe
2021-08-30 23:59 ` [RFC PATCH v2 03/19] x86/mm/cpa: Add grouped page allocations Rick Edgecombe
2021-08-30 23:59 ` [RFC PATCH v2 04/19] mm: Explicitly zero page table lock ptr Rick Edgecombe
2021-08-30 23:59 ` [RFC PATCH v2 05/19] x86, mm: Use cache of page tables Rick Edgecombe
2021-08-31  8:40   ` Mike Rapoport
2021-08-31 19:09     ` Edgecombe, Rick P
2021-08-30 23:59 ` [RFC PATCH v2 06/19] x86/mm/cpa: Add perm callbacks to grouped pages Rick Edgecombe
2021-08-30 23:59 ` [RFC PATCH v2 07/19] x86/cpufeatures: Add feature for pks tables Rick Edgecombe
2021-08-30 23:59 ` [RFC PATCH v2 08/19] x86/mm/cpa: Add get_grouped_page_atomic() Rick Edgecombe
2021-08-30 23:59 ` [RFC PATCH v2 09/19] x86/mm: Support GFP_ATOMIC in alloc_table_node() Rick Edgecombe
2021-08-31  8:32   ` Mike Rapoport
2021-08-30 23:59 ` [RFC PATCH v2 10/19] x86/mm: Use alloc_table() for fill_pte(), etc Rick Edgecombe
2021-08-31  8:47   ` Mike Rapoport
2021-08-31 18:48     ` Edgecombe, Rick P
2021-08-30 23:59 ` [RFC PATCH v2 11/19] mm/sparsemem: Use alloc_table() for table allocations Rick Edgecombe
2021-08-31  8:55   ` Mike Rapoport
2021-08-31 18:25     ` Edgecombe, Rick P
2021-09-01  7:22       ` Mike Rapoport
2021-09-02 13:56         ` Vlastimil Babka
2021-08-30 23:59 ` [RFC PATCH v2 12/19] x86/mm: Use free_table in unmap path Rick Edgecombe
2021-08-30 23:59 ` [RFC PATCH v2 13/19] mm/debug_vm_page_table: Use setters instead of WRITE_ONCE Rick Edgecombe
2021-08-30 23:59 ` [RFC PATCH v2 14/19] x86/efi: Toggle table protections when copying Rick Edgecombe
2021-08-30 23:59 ` [RFC PATCH v2 15/19] x86/mm/cpa: Add set_memory_pks() Rick Edgecombe
2021-08-30 23:59 ` [RFC PATCH v2 16/19] x86/mm: Protect page tables with PKS Rick Edgecombe
2021-08-31  8:56   ` Mike Rapoport
2021-08-31 17:55     ` Edgecombe, Rick P
2021-08-30 23:59 ` [RFC PATCH v2 17/19] x86/mm/cpa: PKS protect direct map page tables Rick Edgecombe
2021-08-31 10:14   ` Mike Rapoport [this message]
2021-08-31 17:58     ` Edgecombe, Rick P
2021-08-30 23:59 ` [RFC PATCH v2 18/19] x86/mm: Add PKS table soft mode Rick Edgecombe
2021-08-31  3:49   ` Randy Dunlap
2021-08-31 17:55     ` Edgecombe, Rick P
2021-08-30 23:59 ` [RFC PATCH v2 19/19] x86/mm: Add PKS table debug checking Rick Edgecombe
2024-03-14 16:27 ` [RFC PATCH v2 00/19] PKS write protected page tables Kees Cook
2024-03-14 17:10   ` Edgecombe, Rick P
2024-03-14 18:25     ` Ira Weiny
2024-03-14 21:02       ` Boris Lukashev
2024-03-16  3:14 ` Boris Lukashev
