Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Gilbert <benjamin.gilbert@coreos.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	X86 ML <x86@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, stable <stable@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Garnier <thgarnie@google.com>,
	Alexander Kuleshov <kuleshovmail@gmail.com>
Subject: Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
Date: Thu, 4 Jan 2018 13:28:59 +0100 (CET)
Message-ID: <alpine.DEB.2.20.1801041320360.1771@nanos> (raw)
In-Reply-To: <CALCETrW8NxLd4v_U_g8JyW5XdVXWhM_MZOUn05J8VTuWOwkj-A@mail.gmail.com>

On Wed, 3 Jan 2018, Andy Lutomirski wrote:
> On Wed, Jan 3, 2018 at 8:35 PM, Benjamin Gilbert
> <benjamin.gilbert@coreos.com> wrote:
> > On Wed, Jan 03, 2018 at 04:37:53PM -0800, Andy Lutomirski wrote:
> >> Maybe try rebuilding a bad kernel with free_ldt_pgtables() modified
> >> to do nothing, and the read /sys/kernel/debug/page_tables/current (or
> >> current_kernel, or whatever it's called).  The problem may be obvious.
> >
> > current_kernel attached.  I have not seen any crashes with
> > free_ldt_pgtables() stubbed out.
> 
> I haven't reproduced it, but I think I see what's wrong.  KASLR sets
> vaddr_end to a totally bogus value.  It should be no larger than
> LDT_BASE_ADDR.  I suspect that your vmemmap is getting randomized into
> the LDT range.  If it weren't for that, it could just as easily land
> in the cpu_entry_area range.  This will need fixing in all versions
> that aren't still called KAISER.
> 
> Our memory map code is utter shite.  This kind of bug should not be
> possible without a giant warning at boot that something is screwed up.

You're right it's utter shite and the KASLR folks who added this insanity
of making vaddr_end depend on a gazillion of config options and not
documenting it in mm.txt or elsewhere where it's obvious to find should
really sit back and think hard about their half baken 'security' features.

Just look at the insanity of comment above the vaddr_end ifdef maze.

Benjamin, can you test the patch below please?

Thanks,

	tglx

8<--------------
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -12,8 +12,9 @@ ffffea0000000000 - ffffeaffffffffff (=40
 ... unused hole ...
 ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB)
 ... unused hole ...
-fffffe0000000000 - fffffe7fffffffff (=39 bits) LDT remap for PTI
-fffffe8000000000 - fffffeffffffffff (=39 bits) cpu_entry_area mapping
+				    vaddr_end for KASLR 
+fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping
+fffffe8000000000 - fffffeffffffffff (=39 bits) LDT remap for PTI
 ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
 ... unused hole ...
 ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
@@ -37,7 +38,9 @@ ffd4000000000000 - ffd5ffffffffffff (=49
 ... unused hole ...
 ffdf000000000000 - fffffc0000000000 (=53 bits) kasan shadow memory (8PB)
 ... unused hole ...
-fffffe8000000000 - fffffeffffffffff (=39 bits) cpu_entry_area mapping
+				    vaddr_end for KASLR 
+fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping
+... unused hole ...
 ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
 ... unused hole ...
 ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -88,7 +88,7 @@ typedef struct { pteval_t pte; } pte_t;
 # define VMALLOC_SIZE_TB	_AC(32, UL)
 # define __VMALLOC_BASE		_AC(0xffffc90000000000, UL)
 # define __VMEMMAP_BASE		_AC(0xffffea0000000000, UL)
-# define LDT_PGD_ENTRY		_AC(-4, UL)
+# define LDT_PGD_ENTRY		_AC(-3, UL)
 # define LDT_BASE_ADDR		(LDT_PGD_ENTRY << PGDIR_SHIFT)
 #endif
 
@@ -110,7 +110,7 @@ typedef struct { pteval_t pte; } pte_t;
 #define ESPFIX_PGD_ENTRY	_AC(-2, UL)
 #define ESPFIX_BASE_ADDR	(ESPFIX_PGD_ENTRY << P4D_SHIFT)
 
-#define CPU_ENTRY_AREA_PGD	_AC(-3, UL)
+#define CPU_ENTRY_AREA_PGD	_AC(-4, UL)
 #define CPU_ENTRY_AREA_BASE	(CPU_ENTRY_AREA_PGD << P4D_SHIFT)
 
 #define EFI_VA_START		( -4 * (_AC(1, UL) << 30))
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -34,25 +34,14 @@
 #define TB_SHIFT 40
 
 /*
- * Virtual address start and end range for randomization. The end changes base
- * on configuration to have the highest amount of space for randomization.
- * It increases the possible random position for each randomized region.
+ * Virtual address start and end range for randomization.
  *
- * You need to add an if/def entry if you introduce a new memory region
- * compatible with KASLR. Your entry must be in logical order with memory
- * layout. For example, ESPFIX is before EFI because its virtual address is
- * before. You also need to add a BUILD_BUG_ON() in kernel_randomize_memory() to
- * ensure that this order is correct and won't be changed.
+ * The end address could depend on more configuration options to make the
+ * highest amount of space for randomization available, but that's too hard
+ * to keep straight.
  */
 static const unsigned long vaddr_start = __PAGE_OFFSET_BASE;
-
-#if defined(CONFIG_X86_ESPFIX64)
-static const unsigned long vaddr_end = ESPFIX_BASE_ADDR;
-#elif defined(CONFIG_EFI)
-static const unsigned long vaddr_end = EFI_VA_END;
-#else
-static const unsigned long vaddr_end = __START_KERNEL_map;
-#endif
+static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE;
 
 /* Default values */
 unsigned long page_offset_base = __PAGE_OFFSET_BASE;
@@ -101,15 +90,11 @@ void __init kernel_randomize_memory(void
 	unsigned long remain_entropy;
 
 	/*
-	 * All these BUILD_BUG_ON checks ensures the memory layout is
-	 * consistent with the vaddr_start/vaddr_end variables.
+	 * These BUILD_BUG_ON checks ensure the memory layout is consistent
+	 * with the vaddr_start/vaddr_end variables. These checks are
+	 * limited....
 	 */
 	BUILD_BUG_ON(vaddr_start >= vaddr_end);
-	BUILD_BUG_ON(IS_ENABLED(CONFIG_X86_ESPFIX64) &&
-		     vaddr_end >= EFI_VA_END);
-	BUILD_BUG_ON((IS_ENABLED(CONFIG_X86_ESPFIX64) ||
-		      IS_ENABLED(CONFIG_EFI)) &&
-		     vaddr_end >= __START_KERNEL_map);
 	BUILD_BUG_ON(vaddr_end > __START_KERNEL_map);
 
 	if (!kaslr_memory_enabled())

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply index

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-03  8:36 Benjamin Gilbert
2018-01-03  8:46 ` Benjamin Gilbert
2018-01-03  9:20   ` Greg Kroah-Hartman
2018-01-03 15:48     ` Ingo Molnar
2018-01-03 22:32       ` Benjamin Gilbert
2018-01-03 22:34         ` Thomas Gleixner
2018-01-03 22:49           ` Benjamin Gilbert
2018-01-03 22:57             ` Thomas Gleixner
2018-01-03 22:58               ` Thomas Gleixner
2018-01-03 23:44                 ` Andy Lutomirski
2018-01-03 23:46                   ` Thomas Gleixner
2018-01-04  0:27                 ` Andy Lutomirski
2018-01-04  0:38                   ` Benjamin Gilbert
2018-01-04  0:33     ` Benjamin Gilbert
2018-01-04  0:37       ` Thomas Gleixner
2018-01-04  7:14         ` Ingo Molnar
2018-01-04  7:18           ` Greg Kroah-Hartman
2018-01-04  7:20             ` Ingo Molnar
2018-01-04  8:03               ` Greg Kroah-Hartman
2018-01-04  7:22           ` Ingo Molnar
2018-01-04  0:37       ` Andy Lutomirski
2018-01-04  4:35         ` Benjamin Gilbert
2018-01-04  4:45           ` Andy Lutomirski
2018-01-04 12:28             ` Thomas Gleixner [this message]
2018-01-04 16:17               ` Andy Lutomirski
2018-01-04 16:34                 ` Thomas Gleixner
2018-01-04 19:38               ` Benjamin Gilbert
2018-01-04  1:37       ` Benjamin Gilbert
2018-01-04  4:36         ` Benjamin Gilbert

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.20.1801041320360.1771@nanos \
    --to=tglx@linutronix.de \
    --cc=benjamin.gilbert@coreos.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=kuleshovmail@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=stable@vger.kernel.org \
    --cc=thgarnie@google.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git