linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-kernel@vger.kernel.org, keescook@google.com,
	tglx@linutronix.de, mingo@kernel.org, aarcange@redhat.com,
	jgross@suse.com, jpoimboe@redhat.com, gregkh@linuxfoundation.org,
	peterz@infradead.org, hughd@google.com,
	torvalds@linux-foundation.org, bp@alien8.de, luto@kernel.org,
	ak@linux.intel.com
Subject: Re: [PATCH 5/7] x86/mm/init: remove freed kernel image areas from alias mapping
Date: Fri, 3 Aug 2018 17:35:45 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LSU.2.11.1808031719120.4436@eggly.anvils> (raw)
In-Reply-To: <20180802225831.5F6A2BFC@viggo.jf.intel.com>

On Thu, 2 Aug 2018, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> The kernel image is mapped into two places in the virtual address
> space (addresses without KASLR, of course):
> 
> 	1. The kernel direct map (0xffff880000000000)
> 	2. The "high kernel map" (0xffffffff81000000)
> 
> We actually execute out of #2.  If we get the address of a kernel
> symbol, it points to #2, but almost all physical-to-virtual
> translations point to #1.
> 
> Parts of the "high kernel map" alias are mapped in the userspace
> page tables with the Global bit for performance reasons.  The
> parts that we map to userspace do not (er, should not) have
> secrets.

If respun, I think it would be worth reminding us here that with PTI,
the Global bit is not usually set on the high kernel map at all: it's
just used to compensate for poor performance when PCIDs not available.

> 
> This is fine, except that some areas in the kernel image that
> are adjacent to the non-secret-containing areas are unused holes.
> We free these holes back into the normal page allocator and
> reuse them as normal kernel memory.  The memory will, of course,
> get *used* via the normal map, but the alias mapping is kept.
> 
> This otherwise unused alias mapping of the holes will, by default
> keep the Global bit, be mapped out to userspace, and be
> vulnerable to Meltdown.
> 
> Remove the alias mapping of these pages entirely.  This is likely
> to fracture the 2M page mapping the kernel image near these areas,
> but this should affect a minority of the area.
> 
> The pageattr code changes *all* aliases mapping the physical pages
> that it operates on (by default).  We only want to modify a single
> alias, so we need to tweak its behavior.
> 
> This unmapping behavior is currently dependent on PTI being in
> place.  Going forward, we should at least consider doing this for
> all configurations.  Having an extra read-write alias for memory
> is not exactly ideal for debugging things like random memory
> corruption and this does undercut features like DEBUG_PAGEALLOC
> or future work like eXclusive Page Frame Ownership (XPFO).
> 
> Before this patch:
> 
> current_kernel:---[ High Kernel Mapping ]---
> current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
> current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
> current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
> current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
> current_kernel-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
> current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
> current_kernel-0xffffffff82c00000-0xffffffff82e00000           2M     RW                     NX pte
> current_kernel-0xffffffff82e00000-0xffffffff83200000           4M     RW         PSE         NX pmd
> current_kernel-0xffffffff83200000-0xffffffffa0000000         462M                               pmd
> 
>   current_user:---[ High Kernel Mapping ]---
>   current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
>   current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
>   current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
>   current_user-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
>   current_user-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
>   current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd
> 
> 
> After this patch:
> 
> current_kernel:---[ High Kernel Mapping ]---
> current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
> current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
> current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
> current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K                               pte
> current_kernel-0xffffffff82000000-0xffffffff82400000           4M     ro         PSE     GLB NX pmd
> current_kernel-0xffffffff82400000-0xffffffff82488000         544K     ro                     NX pte
> current_kernel-0xffffffff82488000-0xffffffff82600000        1504K                               pte
> current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
> current_kernel-0xffffffff82c00000-0xffffffff82c0d000          52K     RW                     NX pte
> current_kernel-0xffffffff82c0d000-0xffffffff82dc0000        1740K                               pte
> 
>   current_user:---[ High Kernel Mapping ]---
>   current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
>   current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
>   current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
>   current_user-0xffffffff81e11000-0xffffffff82000000        1980K                               pte
>   current_user-0xffffffff82000000-0xffffffff82400000           4M     ro         PSE     GLB NX pmd
>   current_user-0xffffffff82400000-0xffffffff82488000         544K     ro                     NX pte

That entry correctly matches the current-kernel entry, but I would expect
both of them to be marked "GLB".  Probably of very little importance, but
it might indicate that something (my brain, perhaps) is not working quite
as you would hope.

Hugh

>   current_user-0xffffffff82488000-0xffffffff82600000        1504K                               pte
>   current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Kees Cook <keescook@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Andi Kleen <ak@linux.intel.com>
> ---
> 
>  b/arch/x86/include/asm/set_memory.h |    1 +
>  b/arch/x86/mm/init.c                |   25 +++++++++++++++++++++++--
>  b/arch/x86/mm/pageattr.c            |   13 +++++++++++++
>  3 files changed, 37 insertions(+), 2 deletions(-)
> 
> diff -puN arch/x86/include/asm/set_memory.h~x86-unmap-freed-areas-from-kernel-image arch/x86/include/asm/set_memory.h
> --- a/arch/x86/include/asm/set_memory.h~x86-unmap-freed-areas-from-kernel-image	2018-08-02 14:14:49.462483274 -0700
> +++ b/arch/x86/include/asm/set_memory.h	2018-08-02 14:14:49.471483274 -0700
> @@ -46,6 +46,7 @@ int set_memory_np(unsigned long addr, in
>  int set_memory_4k(unsigned long addr, int numpages);
>  int set_memory_encrypted(unsigned long addr, int numpages);
>  int set_memory_decrypted(unsigned long addr, int numpages);
> +int set_memory_np_noalias(unsigned long addr, int numpages);
>  
>  int set_memory_array_uc(unsigned long *addr, int addrinarray);
>  int set_memory_array_wc(unsigned long *addr, int addrinarray);
> diff -puN arch/x86/mm/init.c~x86-unmap-freed-areas-from-kernel-image arch/x86/mm/init.c
> --- a/arch/x86/mm/init.c~x86-unmap-freed-areas-from-kernel-image	2018-08-02 14:14:49.463483274 -0700
> +++ b/arch/x86/mm/init.c	2018-08-02 14:14:49.471483274 -0700
> @@ -780,8 +780,29 @@ void free_init_pages(char *what, unsigne
>   */
>  void free_kernel_image_pages(void *begin, void *end)
>  {
> -	free_init_pages("unused kernel image",
> -			(unsigned long)begin, (unsigned long)end);
> +	unsigned long begin_ul = (unsigned long)begin;
> +	unsigned long end_ul = (unsigned long)end;
> +	unsigned long len_pages = (end_ul - begin_ul) >> PAGE_SHIFT;
> +
> +
> +	free_init_pages("unused kernel image", begin_ul, end_ul);
> +
> +	/*
> +	 * PTI maps some of the kernel into userspace.  For
> +	 * performance, this includes some kernel areas that
> +	 * do not contain secrets.  Those areas might be
> +	 * adjacent to the parts of the kernel image being
> +	 * freed, which may contain secrets.  Remove the
> +	 * "high kernel image mapping" for these freed areas,
> +	 * ensuring they are not even potentially vulnerable
> +	 * to Meltdown regardless of the specific optimizations
> +	 * PTI is currently using.
> +	 *
> +	 * The "noalias" prevents unmapping the direct map
> +	 * alias which is needed to access the freed pages.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_PTI))
> +		set_memory_np_noalias(begin_ul, len_pages);
>  }
>  
>  void __ref free_initmem(void)
> diff -puN arch/x86/mm/pageattr.c~x86-unmap-freed-areas-from-kernel-image arch/x86/mm/pageattr.c
> --- a/arch/x86/mm/pageattr.c~x86-unmap-freed-areas-from-kernel-image	2018-08-02 14:14:49.466483274 -0700
> +++ b/arch/x86/mm/pageattr.c	2018-08-02 14:14:49.472483274 -0700
> @@ -53,6 +53,7 @@ static DEFINE_SPINLOCK(cpa_lock);
>  #define CPA_FLUSHTLB 1
>  #define CPA_ARRAY 2
>  #define CPA_PAGES_ARRAY 4
> +#define CPA_NO_CHECK_ALIAS 8 /* Do not search for aliases */
>  
>  #ifdef CONFIG_PROC_FS
>  static unsigned long direct_pages_count[PG_LEVEL_NUM];
> @@ -1486,6 +1487,9 @@ static int change_page_attr_set_clr(unsi
>  
>  	/* No alias checking for _NX bit modifications */
>  	checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX;
> +	/* Never check aliases if the caller asks for it explicitly: */
> +	if (checkalias && (in_flag & CPA_NO_CHECK_ALIAS))
> +		checkalias = 0;
>  
>  	ret = __change_page_attr_set_clr(&cpa, checkalias);
>  
> @@ -1772,6 +1776,15 @@ int set_memory_np(unsigned long addr, in
>  	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
>  }
>  
> +int set_memory_np_noalias(unsigned long addr, int numpages)
> +{
> +	int cpa_flags = CPA_NO_CHECK_ALIAS;
> +
> +	return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
> +					__pgprot(_PAGE_PRESENT), 0,
> +					cpa_flags, NULL);
> +}
> +
>  int set_memory_4k(unsigned long addr, int numpages)
>  {
>  	return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
> _
> 

  reply	other threads:[~2018-08-04  0:35 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-02 22:58 [PATCH 0/7] [v2] x86/mm/pti: close two Meltdown leaks with Global kernel mapping Dave Hansen
2018-08-02 22:58 ` [PATCH 1/7] x86/mm/pti: clear Global bit more aggressively Dave Hansen
2018-08-05 20:30   ` [tip:x86/pti] x86/mm/pti: Clear " tip-bot for Dave Hansen
2018-08-02 22:58 ` [PATCH 2/7] mm: allow non-direct-map arguments to free_reserved_area() Dave Hansen
2018-08-05 20:31   ` [tip:x86/pti] mm: Allow " tip-bot for Dave Hansen
2018-08-02 22:58 ` [PATCH 3/7] x86/mm/init: pass unconverted symbol addresses to free_init_pages() Dave Hansen
2018-08-04  0:18   ` Hugh Dickins
2018-08-04 17:31     ` Linus Torvalds
2018-08-04 18:23       ` Hugh Dickins
2018-08-05  6:11       ` Andi Kleen
2018-08-05 20:31   ` [tip:x86/pti] x86/mm/init: Pass " tip-bot for Dave Hansen
2018-08-02 22:58 ` [PATCH 4/7] x86/mm/init: add helper for freeing kernel image pages Dave Hansen
2018-08-05 20:32   ` [tip:x86/pti] x86/mm/init: Add " tip-bot for Dave Hansen
2018-08-02 22:58 ` [PATCH 5/7] x86/mm/init: remove freed kernel image areas from alias mapping Dave Hansen
2018-08-04  0:35   ` Hugh Dickins [this message]
2018-08-04 21:38   ` Andy Lutomirski
2018-08-06 15:17     ` Dave Hansen
2018-08-05 20:32   ` [tip:x86/pti] x86/mm/init: Remove " tip-bot for Dave Hansen
2018-08-06 20:21   ` [tip:x86/pti-urgent] " tip-bot for Dave Hansen
2018-08-02 22:58 ` [PATCH 6/7] x86/mm/pageattr: pass named flag instead of 0/1 Dave Hansen
2018-08-05 20:09   ` Thomas Gleixner
2018-08-06 15:09     ` Dave Hansen
2018-08-02 22:58 ` [PATCH 7/7] x86/mm/pageattr: Remove implicit NX behavior Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.11.1808031719120.4436@eggly.anvils \
    --to=hughd@google.com \
    --cc=aarcange@redhat.com \
    --cc=ak@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jgross@suse.com \
    --cc=jpoimboe@redhat.com \
    --cc=keescook@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).