* [PATCH v3 0/4] x86, boot: KASLR memory randomization
@ 2016-05-03 19:31 ` Thomas Garnier
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Garnier @ 2016-05-03 19:31 UTC (permalink / raw)
  To: H . Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, Thomas Garnier, Dmitry Vyukov, Paolo Bonzini,
	Dan Williams, Kees Cook, Stephen Smalley, Kefeng Wang,
	Jonathan Corbet, Matt Fleming, Toshi Kani, Alexander Kuleshov,
	Alexander Popov, Joerg Roedel, Dave Young, Baoquan He,
	Dave Hansen, Mark Salter, Boris Ostrovsky
  Cc: x86, linux-kernel, linux-doc, gthelen, kernel-hardening

This is PATCH v3 of the KASLR memory randomization implementation for x86_64.

Recent changes:
    Add performance information to the commit messages.
    Add details on PUD alignment.
    Add information on testing against the KASLR bypass exploit.
    Rebase on next-20160502.

***Background:
The current implementation of KASLR randomizes only the base address of
the kernel and its modules. Published research has shown that static
memory can be overwritten to elevate privileges, bypassing KASLR.

In more detail:

   The physical memory mapping holds most allocations from the boot and
   heap allocators. Knowing the base address and physical memory size, an
   attacker can deduce the PDE virtual address for the vDSO memory page.
   This attack was demonstrated at CanSecWest 2016, in the "Getting
   Physical Extreme Abuse of Intel Based Paged Systems" presentation,
   https://goo.gl/ANpWdV (see the second part of the presentation). The
   exploits used against Linux worked successfully against 4.6+ but fail
   with KASLR memory enabled (https://goo.gl/iTtXMJ). Similar research
   was done at Google, leading to this patch proposal. Variants exist that
   overwrite the ACLs of /proc or /sys objects, leading to elevation of
   privileges. These variants were tested against 4.6+.
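
   For illustration, when the direct physical memory mapping is not
   randomized its base is the constant __PAGE_OFFSET (0xffff880000000000
   on x86_64), so the kernel virtual address of any physical page is the
   same on every boot. A minimal user-space sketch of that predictability
   (the physical address below is hypothetical, for illustration only):

   #include <stdio.h>

   /* Fixed x86_64 direct-mapping base when memory KASLR is disabled. */
   #define PAGE_OFFSET_BASE 0xffff880000000000UL

   int main(void)
   {
           /* Hypothetical physical address of a page the attacker targets. */
           unsigned long phys = 0x1234567000UL;

           /* Its direct-mapping virtual address is a pure offset,
            * identical across reboots. */
           unsigned long virt = PAGE_OFFSET_BASE + phys;

           printf("predictable VA: 0x%lx\n", virt);
           return 0;
   }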

This set of patches randomizes the base address and padding of three
major memory sections (the physical memory mapping, vmalloc & vmemmap).
It mitigates exploits relying on predictable kernel addresses. This
feature can be enabled with the CONFIG_RANDOMIZE_MEMORY option.

Padding for memory hotplug support is controlled by the
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING option. The default value is 10
terabytes.
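
For example, a .config fragment enabling the feature could look like the
following sketch (CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING is introduced
by patch 04, which is not quoted here; the exact form of its value is an
assumption based on the terabyte default described above):

   CONFIG_RANDOMIZE_BASE=y
   CONFIG_RANDOMIZE_MEMORY=y
   CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING=10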

The patches were tested on QEMU and physical machines. Xen compatibility
was also verified. Multiple reboots were used to verify the entropy of
each memory section.

***Problems that needed solving:
 - The three target memory sections are never at the same place between
   boots.
 - The physical memory mapping can use a virtual address that is not
   aligned on the PGD page table level (see the sketch after this list).
 - Provide good entropy early at boot, before get_random_bytes() is
   available.
 - Add optional padding for memory hotplug compatibility.
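
To see why the PUD alignment item above matters, compare how many base
positions a given randomization window offers at PGD versus PUD
granularity. A standalone sketch (the 16 TB window is hypothetical; the
per-entry sizes are the x86_64 4-level paging values):

   #include <stdio.h>

   #define PUD_SHIFT 30   /* one PUD entry maps 1 GB */
   #define PGD_SHIFT 39   /* one PGD entry maps 512 GB */

   int main(void)
   {
           unsigned long window = 16UL << 40;   /* hypothetical 16 TB window */

           printf("PGD-aligned positions: %lu\n", window >> PGD_SHIFT);   /* 32 */
           printf("PUD-aligned positions: %lu\n", window >> PUD_SHIFT);   /* 16384 */
           return 0;
   }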

***Parts:
 - The first part prepares for the KASLR memory randomization by
   refactoring the entropy functions used by the current implementation
   and adding support for PUD-level virtual addresses in the physical
   mapping.
   (Patches 01-02)
 - The second part implements the KASLR memory randomization for all
   sections mentioned.
   (Patch 03)
 - The third part adds support for memory hotplug by adding an option to
   define the padding used between the physical memory mapping section
   and the others.
   (Patch 04)

Performance data:

Kernbench shows almost no difference (+/- less than 1%):

Before:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.63 (1.2695)
User Time 1034.89 (1.18115)
System Time 87.056 (0.456416)
Percent CPU 1092.9 (13.892)
Context Switches 199805 (3455.33)
Sleeps 97907.8 (900.636)

After:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.489 (1.10636)
User Time 1034.86 (1.36053)
System Time 87.764 (0.49345)
Percent CPU 1095 (12.7715)
Context Switches 199036 (4298.1)
Sleeps 97681.6 (1031.11)

Hackbench shows 0% difference on average (hackbench 90
repeated 10 times):

attempt,before,after
1,0.076,0.069
2,0.072,0.069
3,0.066,0.066
4,0.066,0.068
5,0.066,0.067
6,0.066,0.069
7,0.067,0.066
8,0.063,0.067
9,0.067,0.065
10,0.068,0.071
average,0.0677,0.0677

Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 1/4] x86, boot: Refactor KASLR entropy functions
  2016-05-03 19:31 ` [kernel-hardening] " Thomas Garnier
@ 2016-05-03 19:31   ` Thomas Garnier
  -1 siblings, 0 replies; 22+ messages in thread
From: Thomas Garnier @ 2016-05-03 19:31 UTC (permalink / raw)
  To: H . Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, Thomas Garnier, Dmitry Vyukov, Paolo Bonzini,
	Dan Williams, Kees Cook, Stephen Smalley, Kefeng Wang,
	Jonathan Corbet, Matt Fleming, Toshi Kani, Alexander Kuleshov,
	Alexander Popov, Joerg Roedel, Dave Young, Baoquan He,
	Dave Hansen, Mark Salter, Boris Ostrovsky
  Cc: x86, linux-kernel, linux-doc, gthelen, kernel-hardening

Move the KASLR entropy functions to arch/x86/lib so they can be used in
early kernel boot for KASLR memory randomization.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20160502
---
 arch/x86/boot/compressed/kaslr.c | 76 +++-----------------------------------
 arch/x86/include/asm/kaslr.h     |  6 +++
 arch/x86/lib/Makefile            |  1 +
 arch/x86/lib/kaslr.c             | 79 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 91 insertions(+), 71 deletions(-)
 create mode 100644 arch/x86/include/asm/kaslr.h
 create mode 100644 arch/x86/lib/kaslr.c

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 8741a6d..0bdee23 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -11,10 +11,6 @@
  */
 #include "misc.h"
 
-#include <asm/msr.h>
-#include <asm/archrandom.h>
-#include <asm/e820.h>
-
 #include <generated/compile.h>
 #include <linux/module.h>
 #include <linux/uts.h>
@@ -25,26 +21,6 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
-#define I8254_PORT_CONTROL	0x43
-#define I8254_PORT_COUNTER0	0x40
-#define I8254_CMD_READBACK	0xC0
-#define I8254_SELECT_COUNTER0	0x02
-#define I8254_STATUS_NOTREADY	0x40
-static inline u16 i8254(void)
-{
-	u16 status, timer;
-
-	do {
-		outb(I8254_PORT_CONTROL,
-		     I8254_CMD_READBACK | I8254_SELECT_COUNTER0);
-		status = inb(I8254_PORT_COUNTER0);
-		timer  = inb(I8254_PORT_COUNTER0);
-		timer |= inb(I8254_PORT_COUNTER0) << 8;
-	} while (status & I8254_STATUS_NOTREADY);
-
-	return timer;
-}
-
 static unsigned long rotate_xor(unsigned long hash, const void *area,
 				size_t size)
 {
@@ -61,7 +37,7 @@ static unsigned long rotate_xor(unsigned long hash, const void *area,
 }
 
 /* Attempt to create a simple but unpredictable starting entropy. */
-static unsigned long get_random_boot(void)
+static unsigned long get_boot_seed(void)
 {
 	unsigned long hash = 0;
 
@@ -71,50 +47,6 @@ static unsigned long get_random_boot(void)
 	return hash;
 }
 
-static unsigned long get_random_long(void)
-{
-#ifdef CONFIG_X86_64
-	const unsigned long mix_const = 0x5d6008cbf3848dd3UL;
-#else
-	const unsigned long mix_const = 0x3f39e593UL;
-#endif
-	unsigned long raw, random = get_random_boot();
-	bool use_i8254 = true;
-
-	debug_putstr("KASLR using");
-
-	if (has_cpuflag(X86_FEATURE_RDRAND)) {
-		debug_putstr(" RDRAND");
-		if (rdrand_long(&raw)) {
-			random ^= raw;
-			use_i8254 = false;
-		}
-	}
-
-	if (has_cpuflag(X86_FEATURE_TSC)) {
-		debug_putstr(" RDTSC");
-		raw = rdtsc();
-
-		random ^= raw;
-		use_i8254 = false;
-	}
-
-	if (use_i8254) {
-		debug_putstr(" i8254");
-		random ^= i8254();
-	}
-
-	/* Circular multiply for better bit diffusion */
-	asm("mul %3"
-	    : "=a" (random), "=d" (raw)
-	    : "a" (random), "rm" (mix_const));
-	random += raw;
-
-	debug_putstr("...\n");
-
-	return random;
-}
-
 struct mem_vector {
 	unsigned long start;
 	unsigned long size;
@@ -122,7 +54,6 @@ struct mem_vector {
 
 #define MEM_AVOID_MAX 5
 static struct mem_vector mem_avoid[MEM_AVOID_MAX];
-
 static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
 {
 	/* Item at least partially before region. */
@@ -229,13 +160,16 @@ static void slots_append(unsigned long addr)
 	slots[slot_max++] = addr;
 }
 
+#define KASLR_COMPRESSED_BOOT
+#include "../../lib/kaslr.c"
+
 static unsigned long slots_fetch_random(void)
 {
 	/* Handle case of no slots stored. */
 	if (slot_max == 0)
 		return 0;
 
-	return slots[get_random_long() % slot_max];
+	return slots[kaslr_get_random_boot_long() % slot_max];
 }
 
 static void process_e820_entry(struct e820entry *entry,
diff --git a/arch/x86/include/asm/kaslr.h b/arch/x86/include/asm/kaslr.h
new file mode 100644
index 0000000..2ae1429
--- /dev/null
+++ b/arch/x86/include/asm/kaslr.h
@@ -0,0 +1,6 @@
+#ifndef _ASM_KASLR_H_
+#define _ASM_KASLR_H_
+
+unsigned long kaslr_get_random_boot_long(void);
+
+#endif
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 72a5767..cfa6d07 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -24,6 +24,7 @@ lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
 lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
 lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
+lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
 
 obj-y += msr.o msr-reg.o msr-reg-export.o
 
diff --git a/arch/x86/lib/kaslr.c b/arch/x86/lib/kaslr.c
new file mode 100644
index 0000000..ffb22ba
--- /dev/null
+++ b/arch/x86/lib/kaslr.c
@@ -0,0 +1,79 @@
+#include <asm/kaslr.h>
+#include <asm/msr.h>
+#include <asm/archrandom.h>
+#include <asm/e820.h>
+#include <asm/io.h>
+
+/* Replace boot functions on library build */
+#ifndef KASLR_COMPRESSED_BOOT
+#include <asm/cpufeature.h>
+#include <asm/setup.h>
+
+#define debug_putstr(v)
+#define has_cpuflag(f) boot_cpu_has(f)
+#define get_boot_seed() kaslr_offset()
+#endif
+
+#define I8254_PORT_CONTROL	0x43
+#define I8254_PORT_COUNTER0	0x40
+#define I8254_CMD_READBACK	0xC0
+#define I8254_SELECT_COUNTER0	0x02
+#define I8254_STATUS_NOTREADY	0x40
+static inline u16 i8254(void)
+{
+	u16 status, timer;
+
+	do {
+		outb(I8254_PORT_CONTROL,
+		     I8254_CMD_READBACK | I8254_SELECT_COUNTER0);
+		status = inb(I8254_PORT_COUNTER0);
+		timer  = inb(I8254_PORT_COUNTER0);
+		timer |= inb(I8254_PORT_COUNTER0) << 8;
+	} while (status & I8254_STATUS_NOTREADY);
+
+	return timer;
+}
+
+unsigned long kaslr_get_random_boot_long(void)
+{
+#ifdef CONFIG_X86_64
+	const unsigned long mix_const = 0x5d6008cbf3848dd3UL;
+#else
+	const unsigned long mix_const = 0x3f39e593UL;
+#endif
+	unsigned long raw, random = get_boot_seed();
+	bool use_i8254 = true;
+
+	debug_putstr("KASLR using");
+
+	if (has_cpuflag(X86_FEATURE_RDRAND)) {
+		debug_putstr(" RDRAND");
+		if (rdrand_long(&raw)) {
+			random ^= raw;
+			use_i8254 = false;
+		}
+	}
+
+	if (has_cpuflag(X86_FEATURE_TSC)) {
+		debug_putstr(" RDTSC");
+		raw = rdtsc();
+
+		random ^= raw;
+		use_i8254 = false;
+	}
+
+	if (use_i8254) {
+		debug_putstr(" i8254");
+		random ^= i8254();
+	}
+
+	/* Circular multiply for better bit diffusion */
+	asm("mul %3"
+	    : "=a" (random), "=d" (raw)
+	    : "a" (random), "rm" (mix_const));
+	random += raw;
+
+	debug_putstr("...\n");
+
+	return random;
+}
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 2/4] x86, boot: PUD VA support for physical mapping (x86_64)
  2016-05-03 19:31 ` [kernel-hardening] " Thomas Garnier
@ 2016-05-03 19:31   ` Thomas Garnier
  -1 siblings, 0 replies; 22+ messages in thread
From: Thomas Garnier @ 2016-05-03 19:31 UTC (permalink / raw)
  To: H . Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, Thomas Garnier, Dmitry Vyukov, Paolo Bonzini,
	Dan Williams, Kees Cook, Stephen Smalley, Kefeng Wang,
	Jonathan Corbet, Matt Fleming, Toshi Kani, Alexander Kuleshov,
	Alexander Popov, Joerg Roedel, Dave Young, Baoquan He,
	Dave Hansen, Mark Salter, Boris Ostrovsky
  Cc: x86, linux-kernel, linux-doc, gthelen, kernel-hardening

Minor change that allows early boot physical mapping of PUD-level virtual
addresses. The current implementation expects the virtual address to be
PUD aligned. For KASLR memory randomization, we need to be able to
randomize the offset used on the PUD table.

It has no impact on current usage.
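
To illustrate why, consider a standalone sketch (the randomized base and
the physical address below are hypothetical): once the direct-mapping
base is PUD aligned but no longer PGD aligned, the PUD index derived from
the physical address stops matching the index of the corresponding
virtual address, which is why the patch computes it from __va(addr).

   #include <stdio.h>

   #define PUD_SHIFT    30
   #define PTRS_PER_PUD 512
   #define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))

   int main(void)
   {
           /* Hypothetical randomized base: PUD aligned (1 GB) but not
            * PGD aligned (512 GB). */
           unsigned long page_offset = 0xffff880000000000UL + (3UL << PUD_SHIFT);
           unsigned long phys = 0x40000000UL;   /* hypothetical physical address */
           unsigned long virt = page_offset + phys;

           printf("pud_index(phys) = %lu\n", pud_index(phys));   /* 1 */
           printf("pud_index(virt) = %lu\n", pud_index(virt));   /* 4 */
           return 0;
   }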

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20160502
---
 arch/x86/mm/init_64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 89d9747..6adfbce 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -526,10 +526,10 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
 {
 	unsigned long pages = 0, next;
 	unsigned long last_map_addr = end;
-	int i = pud_index(addr);
+	int i = pud_index((unsigned long)__va(addr));
 
 	for (; i < PTRS_PER_PUD; i++, addr = next) {
-		pud_t *pud = pud_page + pud_index(addr);
+		pud_t *pud = pud_page + pud_index((unsigned long)__va(addr));
 		pmd_t *pmd;
 		pgprot_t prot = PAGE_KERNEL;
 
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)
  2016-05-03 19:31 ` [kernel-hardening] " Thomas Garnier
@ 2016-05-03 19:31   ` Thomas Garnier
  -1 siblings, 0 replies; 22+ messages in thread
From: Thomas Garnier @ 2016-05-03 19:31 UTC (permalink / raw)
  To: H . Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, Thomas Garnier, Dmitry Vyukov, Paolo Bonzini,
	Dan Williams, Kees Cook, Stephen Smalley, Kefeng Wang,
	Jonathan Corbet, Matt Fleming, Toshi Kani, Alexander Kuleshov,
	Alexander Popov, Joerg Roedel, Dave Young, Baoquan He,
	Dave Hansen, Mark Salter, Boris Ostrovsky
  Cc: x86, linux-kernel, linux-doc, gthelen, kernel-hardening

Randomizes the virtual address space of kernel memory sections (physical
memory mapping, vmalloc & vmemmap) for x86_64. This security feature
mitigates exploits relying on predictable kernel addresses. These
addresses can be used to disclose the base addresses of kernel modules or
to corrupt specific structures to elevate privileges, bypassing the
current implementation of KASLR. This feature can be enabled with the
CONFIG_RANDOMIZE_MEMORY option.

The physical memory mapping holds most allocations from the boot and
heap allocators. Knowing the base address and physical memory size, an
attacker can deduce the PDE virtual address for the vDSO memory page.
This attack was demonstrated at CanSecWest 2016, in the "Getting
Physical Extreme Abuse of Intel Based Paged Systems" presentation,
https://goo.gl/ANpWdV (see the second part of the presentation). The
exploits used against Linux worked successfully against 4.6+ but fail
with KASLR memory enabled (https://goo.gl/iTtXMJ). Similar research
was done at Google, leading to this patch proposal. Variants exist that
overwrite the ACLs of /proc or /sys objects, leading to elevation of
privileges. These variants were tested against 4.6+.

The vmalloc memory section contains the allocations made through the
vmalloc API. The allocations are done sequentially to prevent
fragmentation, so each allocation address can easily be deduced,
especially at boot.

The vmemmap section holds a representation of the physical
memory (through a struct page array). An attacker could use this section
to disclose the kernel memory layout (walking the page linked list).

The order of each memory section is not changed. The feature looks at
the available space for the sections based on different configuration
options and randomizes the base of, and the space between, each section.
The size of the physical memory mapping is the available physical memory.
No performance impact was detected while testing the feature.

Entropy is generated using the KASLR early boot functions now shared in
the lib directory (originally written by Kees Cook). Randomization is
done on the PGD & PUD page table levels to increase the number of
possible addresses. The physical memory mapping code was adapted to
support PUD-level virtual addresses. An additional low memory page is
used to ensure each CPU can start with a PGD-aligned virtual address
(for realmode).

x86/dump_pagetables was updated to correctly display each section.

The documentation on the x86_64 memory layout was updated accordingly.

Performance data:

Kernbench shows almost no difference (+/- less than 1%):

Before:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.63 (1.2695)
User Time 1034.89 (1.18115)
System Time 87.056 (0.456416)
Percent CPU 1092.9 (13.892)
Context Switches 199805 (3455.33)
Sleeps 97907.8 (900.636)

After:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.489 (1.10636)
User Time 1034.86 (1.36053)
System Time 87.764 (0.49345)
Percent CPU 1095 (12.7715)
Context Switches 199036 (4298.1)
Sleeps 97681.6 (1031.11)

Hackbench shows 0% difference on average (hackbench 90
repeated 10 times):

attempt,before,after
1,0.076,0.069
2,0.072,0.069
3,0.066,0.066
4,0.066,0.068
5,0.066,0.067
6,0.066,0.069
7,0.067,0.066
8,0.063,0.067
9,0.067,0.065
10,0.068,0.071
average,0.0677,0.0677

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20160502
---
 Documentation/x86/x86_64/mm.txt         |   4 +
 arch/x86/Kconfig                        |  15 ++++
 arch/x86/include/asm/kaslr.h            |  12 +++
 arch/x86/include/asm/page_64_types.h    |  11 ++-
 arch/x86/include/asm/pgtable_64.h       |   1 +
 arch/x86/include/asm/pgtable_64_types.h |  15 +++-
 arch/x86/kernel/head_64.S               |   2 +-
 arch/x86/kernel/setup.c                 |   3 +
 arch/x86/mm/Makefile                    |   1 +
 arch/x86/mm/dump_pagetables.c           |  11 ++-
 arch/x86/mm/init.c                      |   4 +
 arch/x86/mm/kaslr.c                     | 136 ++++++++++++++++++++++++++++++++
 arch/x86/realmode/init.c                |   4 +
 13 files changed, 211 insertions(+), 8 deletions(-)
 create mode 100644 arch/x86/mm/kaslr.c

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 5aa7383..602a52d 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -39,4 +39,8 @@ memory window (this size is arbitrary, it can be raised later if needed).
 The mappings are not part of any other kernel PGD and are only available
 during EFI runtime calls.
 
+Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
+physical memory, vmalloc/ioremap space and virtual memory map are randomized.
+Their order is preserved but their base will be changed early at boot time.
+
 -Andi Kleen, Jul 2004
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0b128b4..60f33c7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1988,6 +1988,21 @@ config PHYSICAL_ALIGN
 
 	  Don't change this unless you know what you are doing.
 
+config RANDOMIZE_MEMORY
+	bool "Randomize the kernel memory sections"
+	depends on X86_64
+	depends on RANDOMIZE_BASE
+	default n
+	---help---
+	   Randomizes the virtual address of memory sections (physical memory
+	   mapping, vmalloc & vmemmap). This security feature mitigates exploits
+	   relying on predictable memory locations.
+
+	   Base and padding between memory section is randomized. Their order is
+	   not. Entropy is generated in the same way as RANDOMIZE_BASE.
+
+	   If unsure, say N.
+
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP
diff --git a/arch/x86/include/asm/kaslr.h b/arch/x86/include/asm/kaslr.h
index 2ae1429..12c7742 100644
--- a/arch/x86/include/asm/kaslr.h
+++ b/arch/x86/include/asm/kaslr.h
@@ -3,4 +3,16 @@
 
 unsigned long kaslr_get_random_boot_long(void);
 
+#ifdef CONFIG_RANDOMIZE_MEMORY
+extern unsigned long page_offset_base;
+extern unsigned long vmalloc_base;
+extern unsigned long vmemmap_base;
+
+void kernel_randomize_memory(void);
+void kaslr_trampoline_init(void);
+#else
+static inline void kernel_randomize_memory(void) { }
+static inline void kaslr_trampoline_init(void) { }
+#endif /* CONFIG_RANDOMIZE_MEMORY */
+
 #endif
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index d5c2f8b..9215e05 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -1,6 +1,10 @@
 #ifndef _ASM_X86_PAGE_64_DEFS_H
 #define _ASM_X86_PAGE_64_DEFS_H
 
+#ifndef __ASSEMBLY__
+#include <asm/kaslr.h>
+#endif
+
 #ifdef CONFIG_KASAN
 #define KASAN_STACK_ORDER 1
 #else
@@ -32,7 +36,12 @@
  * hypervisor to fit.  Choosing 16 slots here is arbitrary, but it's
  * what Xen requires.
  */
-#define __PAGE_OFFSET           _AC(0xffff880000000000, UL)
+#define __PAGE_OFFSET_BASE      _AC(0xffff880000000000, UL)
+#ifdef CONFIG_RANDOMIZE_MEMORY
+#define __PAGE_OFFSET           page_offset_base
+#else
+#define __PAGE_OFFSET           __PAGE_OFFSET_BASE
+#endif /* CONFIG_RANDOMIZE_MEMORY */
 
 #define __START_KERNEL_map	_AC(0xffffffff80000000, UL)
 
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 2ee7811..0dfec89 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -21,6 +21,7 @@ extern pmd_t level2_fixmap_pgt[512];
 extern pmd_t level2_ident_pgt[512];
 extern pte_t level1_fixmap_pgt[512];
 extern pgd_t init_level4_pgt[];
+extern pgd_t trampoline_pgd_entry;
 
 #define swapper_pg_dir init_level4_pgt
 
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index e6844df..d388739 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -5,6 +5,7 @@
 
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
+#include <asm/kaslr.h>
 
 /*
  * These are used to make use of C type-checking..
@@ -54,9 +55,17 @@ typedef struct { pteval_t pte; } pte_t;
 
 /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
 #define MAXMEM		 _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
-#define VMALLOC_START    _AC(0xffffc90000000000, UL)
-#define VMALLOC_END      _AC(0xffffe8ffffffffff, UL)
-#define VMEMMAP_START	 _AC(0xffffea0000000000, UL)
+#define VMALLOC_SIZE_TB	 _AC(32, UL)
+#define __VMALLOC_BASE	 _AC(0xffffc90000000000, UL)
+#define __VMEMMAP_BASE	 _AC(0xffffea0000000000, UL)
+#ifdef CONFIG_RANDOMIZE_MEMORY
+#define VMALLOC_START	 vmalloc_base
+#define VMEMMAP_START	 vmemmap_base
+#else
+#define VMALLOC_START	 __VMALLOC_BASE
+#define VMEMMAP_START	 __VMEMMAP_BASE
+#endif /* CONFIG_RANDOMIZE_MEMORY */
+#define VMALLOC_END      (VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL))
 #define MODULES_VADDR    (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
 #define MODULES_END      _AC(0xffffffffff000000, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 5df831e..03a2aa0 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -38,7 +38,7 @@
 
 #define pud_index(x)	(((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
 
-L4_PAGE_OFFSET = pgd_index(__PAGE_OFFSET)
+L4_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE)
 L4_START_KERNEL = pgd_index(__START_KERNEL_map)
 L3_START_KERNEL = pud_index(__START_KERNEL_map)
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c4e7b39..a261658 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -113,6 +113,7 @@
 #include <asm/prom.h>
 #include <asm/microcode.h>
 #include <asm/mmu_context.h>
+#include <asm/kaslr.h>
 
 /*
  * max_low_pfn_mapped: highest direct mapped pfn under 4GB
@@ -942,6 +943,8 @@ void __init setup_arch(char **cmdline_p)
 
 	x86_init.oem.arch_setup();
 
+	kernel_randomize_memory();
+
 	iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
 	setup_memory_map();
 	parse_setup_data();
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 62c0043..96d2b84 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -37,4 +37,5 @@ obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 
 obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
+obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
 
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 99bfb19..4a03f60 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -72,9 +72,9 @@ static struct addr_marker address_markers[] = {
 	{ 0, "User Space" },
 #ifdef CONFIG_X86_64
 	{ 0x8000000000000000UL, "Kernel Space" },
-	{ PAGE_OFFSET,		"Low Kernel Mapping" },
-	{ VMALLOC_START,        "vmalloc() Area" },
-	{ VMEMMAP_START,        "Vmemmap" },
+	{ 0/* PAGE_OFFSET */,   "Low Kernel Mapping" },
+	{ 0/* VMALLOC_START */, "vmalloc() Area" },
+	{ 0/* VMEMMAP_START */, "Vmemmap" },
 # ifdef CONFIG_X86_ESPFIX64
 	{ ESPFIX_BASE_ADDR,	"ESPfix Area", 16 },
 # endif
@@ -434,6 +434,11 @@ void ptdump_walk_pgd_level_checkwx(void)
 
 static int __init pt_dump_init(void)
 {
+#ifdef CONFIG_X86_64
+	address_markers[LOW_KERNEL_NR].start_address = PAGE_OFFSET;
+	address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
+	address_markers[VMEMMAP_START_NR].start_address = VMEMMAP_START;
+#endif
 #ifdef CONFIG_X86_32
 	/* Not a compile-time constant on x86-32 */
 	address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 372aad2..e490624 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -17,6 +17,7 @@
 #include <asm/proto.h>
 #include <asm/dma.h>		/* for MAX_DMA_PFN */
 #include <asm/microcode.h>
+#include <asm/kaslr.h>
 
 /*
  * We need to define the tracepoints somewhere, and tlb.c
@@ -590,6 +591,9 @@ void __init init_mem_mapping(void)
 	/* the ISA range is always mapped regardless of memory holes */
 	init_memory_mapping(0, ISA_END_ADDRESS);
 
+	/* Init the trampoline page table if needed for KASLR memory */
+	kaslr_trampoline_init();
+
 	/*
 	 * If the allocation is in bottom-up direction, we setup direct mapping
 	 * in bottom-up, otherwise we setup direct mapping in top-down.
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
new file mode 100644
index 0000000..3b330a9
--- /dev/null
+++ b/arch/x86/mm/kaslr.c
@@ -0,0 +1,136 @@
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/smp.h>
+#include <linux/init.h>
+#include <linux/memory.h>
+#include <linux/random.h>
+
+#include <asm/processor.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/e820.h>
+#include <asm/init.h>
+#include <asm/setup.h>
+#include <asm/kaslr.h>
+#include <asm/kasan.h>
+
+#include "mm_internal.h"
+
+/* Hold the pgd entry used on booting additional CPUs */
+pgd_t trampoline_pgd_entry;
+
+#define TB_SHIFT 40
+
+/*
+ * Memory base and end randomization is based on different configurations.
+ * We want as much space as possible to increase entropy available.
+ */
+static const unsigned long memory_rand_start = __PAGE_OFFSET_BASE;
+
+#if defined(CONFIG_KASAN)
+static const unsigned long memory_rand_end = KASAN_SHADOW_START;
+#elif defined(CONFIG_X86_ESPFIX64)
+static const unsigned long memory_rand_end = ESPFIX_BASE_ADDR;
+#elif defined(CONFIG_EFI)
+static const unsigned long memory_rand_end = EFI_VA_START;
+#else
+static const unsigned long memory_rand_end = __START_KERNEL_map;
+#endif
+
+/* Default values */
+unsigned long page_offset_base = __PAGE_OFFSET_BASE;
+EXPORT_SYMBOL(page_offset_base);
+unsigned long vmalloc_base = __VMALLOC_BASE;
+EXPORT_SYMBOL(vmalloc_base);
+unsigned long vmemmap_base = __VMEMMAP_BASE;
+EXPORT_SYMBOL(vmemmap_base);
+
+/* Describe each randomized memory sections in sequential order */
+static struct kaslr_memory_region {
+	unsigned long *base;
+	unsigned short size_tb;
+} kaslr_regions[] = {
+	{ &page_offset_base, 64/* Maximum */ },
+	{ &vmalloc_base, VMALLOC_SIZE_TB },
+	{ &vmemmap_base, 1 },
+};
+
+/* Size in Terabytes + 1 hole */
+static inline unsigned long get_padding(struct kaslr_memory_region *region)
+{
+	return ((unsigned long)region->size_tb + 1) << TB_SHIFT;
+}
+
+/* Initialize base and padding for each memory section randomized with KASLR */
+void __init kernel_randomize_memory(void)
+{
+	size_t i;
+	unsigned long addr = memory_rand_start;
+	unsigned long padding, rand, mem_tb;
+	struct rnd_state rnd_st;
+	unsigned long remain_padding = memory_rand_end - memory_rand_start;
+
+	if (!kaslr_enabled())
+		return;
+
+	BUG_ON(kaslr_regions[0].base != &page_offset_base);
+	mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);
+
+	if (mem_tb < kaslr_regions[0].size_tb)
+		kaslr_regions[0].size_tb = mem_tb;
+
+	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
+		remain_padding -= get_padding(&kaslr_regions[i]);
+
+	prandom_seed_state(&rnd_st, kaslr_get_random_boot_long());
+
+	/* Position each section randomly with minimum 1 terabyte between */
+	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++) {
+		padding = remain_padding / (ARRAY_SIZE(kaslr_regions) - i);
+		prandom_bytes_state(&rnd_st, &rand, sizeof(rand));
+		padding = (rand % (padding + 1)) & PUD_MASK;
+		addr += padding;
+		*kaslr_regions[i].base = addr;
+		addr += get_padding(&kaslr_regions[i]);
+		remain_padding -= padding;
+	}
+}
+
+/*
+ * Create PGD aligned trampoline table to allow real mode initialization
+ * of additional CPUs. Consume only 1 additonal low memory page.
+ */
+void __meminit kaslr_trampoline_init(void)
+{
+	unsigned long addr, next;
+	pgd_t *pgd;
+	pud_t *pud_page, *tr_pud_page;
+	int i;
+
+	/* If KASLR is disabled, default to the existing page table entry */
+	if (!kaslr_enabled()) {
+		trampoline_pgd_entry = init_level4_pgt[pgd_index(PAGE_OFFSET)];
+		return;
+	}
+
+	tr_pud_page = alloc_low_page();
+	set_pgd(&trampoline_pgd_entry, __pgd(_PAGE_TABLE | __pa(tr_pud_page)));
+
+	addr = 0;
+	pgd = pgd_offset_k((unsigned long)__va(addr));
+	pud_page = (pud_t *) pgd_page_vaddr(*pgd);
+
+	for (i = pud_index(addr); i < PTRS_PER_PUD; i++, addr = next) {
+		pud_t *pud, *tr_pud;
+
+		tr_pud = tr_pud_page + pud_index(addr);
+		pud = pud_page + pud_index((unsigned long)__va(addr));
+		next = (addr & PUD_MASK) + PUD_SIZE;
+
+		/* Needed to copy pte or pud alike */
+		BUILD_BUG_ON(sizeof(pud_t) != sizeof(pte_t));
+		*tr_pud = *pud;
+	}
+}
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 0b7a63d..6518314 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -84,7 +84,11 @@ void __init setup_real_mode(void)
 	*trampoline_cr4_features = __read_cr4();
 
 	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
+#ifdef CONFIG_RANDOMIZE_MEMORY
+	trampoline_pgd[0] = trampoline_pgd_entry.pgd;
+#else
 	trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
+#endif
 	trampoline_pgd[511] = init_level4_pgt[511].pgd;
 #endif
 }
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [kernel-hardening] [PATCH v3 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)
@ 2016-05-03 19:31   ` Thomas Garnier
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Garnier @ 2016-05-03 19:31 UTC (permalink / raw)
  To: H . Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, Thomas Garnier, Dmitry Vyukov, Paolo Bonzini,
	Dan Williams, Kees Cook, Stephen Smalley, Kefeng Wang,
	Jonathan Corbet, Matt Fleming, Toshi Kani, Alexander Kuleshov,
	Alexander Popov, Joerg Roedel, Dave Young, Baoquan He,
	Dave Hansen, Mark Salter, Boris Ostrovsky
  Cc: x86, linux-kernel, linux-doc, gthelen, kernel-hardening

Randomizes the virtual address space of kernel memory sections (physical
memory mapping, vmalloc & vmemmap) for x86_64. This security feature
mitigates exploits relying on predictable kernel addresses. These
addresses can be used to disclose the kernel modules base addresses or
corrupt specific structures to elevate privileges bypassing the current
implementation of KASLR. This feature can be enabled with the
CONFIG_RANDOMIZE_MEMORY option.

The physical memory mapping holds most allocations from boot and heap
allocators. Knowning the base address and physical memory size, an
attacker can deduce the PDE virtual address for the vDSO memory page.
This attack was demonstrated at CanSecWest 2016, in the "Getting
Physical Extreme Abuse of Intel Based Paged Systems"
https://goo.gl/ANpWdV (see second part of the presentation). The
exploits used against Linux worked successfuly against 4.6+ but fail
with KASLR memory enabled (https://goo.gl/iTtXMJ). Similar research
was done at Google leading to this patch proposal. Variants exists to
overwrite /proc or /sys objects ACLs leading to elevation of privileges.
These variants were testeda against 4.6+.

The vmalloc memory section contains the allocation made through the
vmalloc api. The allocations are done sequentially to prevent
fragmentation and each allocation address can easily be deduced
especially from boot.

The vmemmap section holds a representation of the physical
memory (through a struct page array). An attacker could use this section
to disclose the kernel memory layout (walking the page linked list).

The order of each memory section is not changed. The feature looks at
the available space for the sections based on different configuration
options and randomizes the base and space between each. The size of the
physical memory mapping is the available physical memory. No performance
impact was detected while testing the feature.

Entropy is generated using the KASLR early boot functions now shared in
the lib directory (originally written by Kees Cook). Randomization is
done on PGD & PUD page table levels to increase possible addresses. The
physical memory mapping code was adapted to support PUD level virtual
addresses. An additional low memory page is used to ensure each CPU can
start with a PGD aligned virtual address (for realmode).

x86/dump_pagetable was updated to correctly display each section.

Updated documentation on x86_64 memory layout accordingly.

Performance data:

Kernbench shows almost no difference (-+ less than 1%):

Before:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.63 (1.2695)
User Time 1034.89 (1.18115)
System Time 87.056 (0.456416)
Percent CPU 1092.9 (13.892)
Context Switches 199805 (3455.33)
Sleeps 97907.8 (900.636)

After:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.489 (1.10636)
User Time 1034.86 (1.36053)
System Time 87.764 (0.49345)
Percent CPU 1095 (12.7715)
Context Switches 199036 (4298.1)
Sleeps 97681.6 (1031.11)

Hackbench shows 0% difference on average (hackbench 90
repeated 10 times):

attemp,before,after
1,0.076,0.069
2,0.072,0.069
3,0.066,0.066
4,0.066,0.068
5,0.066,0.067
6,0.066,0.069
7,0.067,0.066
8,0.063,0.067
9,0.067,0.065
10,0.068,0.071
average,0.0677,0.0677

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20160502
---
 Documentation/x86/x86_64/mm.txt         |   4 +
 arch/x86/Kconfig                        |  15 ++++
 arch/x86/include/asm/kaslr.h            |  12 +++
 arch/x86/include/asm/page_64_types.h    |  11 ++-
 arch/x86/include/asm/pgtable_64.h       |   1 +
 arch/x86/include/asm/pgtable_64_types.h |  15 +++-
 arch/x86/kernel/head_64.S               |   2 +-
 arch/x86/kernel/setup.c                 |   3 +
 arch/x86/mm/Makefile                    |   1 +
 arch/x86/mm/dump_pagetables.c           |  11 ++-
 arch/x86/mm/init.c                      |   4 +
 arch/x86/mm/kaslr.c                     | 136 ++++++++++++++++++++++++++++++++
 arch/x86/realmode/init.c                |   4 +
 13 files changed, 211 insertions(+), 8 deletions(-)
 create mode 100644 arch/x86/mm/kaslr.c

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 5aa7383..602a52d 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -39,4 +39,8 @@ memory window (this size is arbitrary, it can be raised later if needed).
 The mappings are not part of any other kernel PGD and are only available
 during EFI runtime calls.
 
+Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
+physical memory, vmalloc/ioremap space and virtual memory map are randomized.
+Their order is preserved but their base will be changed early at boot time.
+
 -Andi Kleen, Jul 2004
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0b128b4..60f33c7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1988,6 +1988,21 @@ config PHYSICAL_ALIGN
 
 	  Don't change this unless you know what you are doing.
 
+config RANDOMIZE_MEMORY
+	bool "Randomize the kernel memory sections"
+	depends on X86_64
+	depends on RANDOMIZE_BASE
+	default n
+	---help---
+	   Randomizes the virtual address of memory sections (physical memory
+	   mapping, vmalloc & vmemmap). This security feature mitigates exploits
+	   relying on predictable memory locations.
+
+	   Base and padding between memory section is randomized. Their order is
+	   not. Entropy is generated in the same way as RANDOMIZE_BASE.
+
+	   If unsure, say N.
+
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP
diff --git a/arch/x86/include/asm/kaslr.h b/arch/x86/include/asm/kaslr.h
index 2ae1429..12c7742 100644
--- a/arch/x86/include/asm/kaslr.h
+++ b/arch/x86/include/asm/kaslr.h
@@ -3,4 +3,16 @@
 
 unsigned long kaslr_get_random_boot_long(void);
 
+#ifdef CONFIG_RANDOMIZE_MEMORY
+extern unsigned long page_offset_base;
+extern unsigned long vmalloc_base;
+extern unsigned long vmemmap_base;
+
+void kernel_randomize_memory(void);
+void kaslr_trampoline_init(void);
+#else
+static inline void kernel_randomize_memory(void) { }
+static inline void kaslr_trampoline_init(void) { }
+#endif /* CONFIG_RANDOMIZE_MEMORY */
+
 #endif
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index d5c2f8b..9215e05 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -1,6 +1,10 @@
 #ifndef _ASM_X86_PAGE_64_DEFS_H
 #define _ASM_X86_PAGE_64_DEFS_H
 
+#ifndef __ASSEMBLY__
+#include <asm/kaslr.h>
+#endif
+
 #ifdef CONFIG_KASAN
 #define KASAN_STACK_ORDER 1
 #else
@@ -32,7 +36,12 @@
  * hypervisor to fit.  Choosing 16 slots here is arbitrary, but it's
  * what Xen requires.
  */
-#define __PAGE_OFFSET           _AC(0xffff880000000000, UL)
+#define __PAGE_OFFSET_BASE      _AC(0xffff880000000000, UL)
+#ifdef CONFIG_RANDOMIZE_MEMORY
+#define __PAGE_OFFSET           page_offset_base
+#else
+#define __PAGE_OFFSET           __PAGE_OFFSET_BASE
+#endif /* CONFIG_RANDOMIZE_MEMORY */
 
 #define __START_KERNEL_map	_AC(0xffffffff80000000, UL)
 
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 2ee7811..0dfec89 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -21,6 +21,7 @@ extern pmd_t level2_fixmap_pgt[512];
 extern pmd_t level2_ident_pgt[512];
 extern pte_t level1_fixmap_pgt[512];
 extern pgd_t init_level4_pgt[];
+extern pgd_t trampoline_pgd_entry;
 
 #define swapper_pg_dir init_level4_pgt
 
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index e6844df..d388739 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -5,6 +5,7 @@
 
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
+#include <asm/kaslr.h>
 
 /*
  * These are used to make use of C type-checking..
@@ -54,9 +55,17 @@ typedef struct { pteval_t pte; } pte_t;
 
 /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
 #define MAXMEM		 _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
-#define VMALLOC_START    _AC(0xffffc90000000000, UL)
-#define VMALLOC_END      _AC(0xffffe8ffffffffff, UL)
-#define VMEMMAP_START	 _AC(0xffffea0000000000, UL)
+#define VMALLOC_SIZE_TB	 _AC(32, UL)
+#define __VMALLOC_BASE	 _AC(0xffffc90000000000, UL)
+#define __VMEMMAP_BASE	 _AC(0xffffea0000000000, UL)
+#ifdef CONFIG_RANDOMIZE_MEMORY
+#define VMALLOC_START	 vmalloc_base
+#define VMEMMAP_START	 vmemmap_base
+#else
+#define VMALLOC_START	 __VMALLOC_BASE
+#define VMEMMAP_START	 __VMEMMAP_BASE
+#endif /* CONFIG_RANDOMIZE_MEMORY */
+#define VMALLOC_END      (VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL))
 #define MODULES_VADDR    (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
 #define MODULES_END      _AC(0xffffffffff000000, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 5df831e..03a2aa0 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -38,7 +38,7 @@
 
 #define pud_index(x)	(((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
 
-L4_PAGE_OFFSET = pgd_index(__PAGE_OFFSET)
+L4_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE)
 L4_START_KERNEL = pgd_index(__START_KERNEL_map)
 L3_START_KERNEL = pud_index(__START_KERNEL_map)
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c4e7b39..a261658 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -113,6 +113,7 @@
 #include <asm/prom.h>
 #include <asm/microcode.h>
 #include <asm/mmu_context.h>
+#include <asm/kaslr.h>
 
 /*
  * max_low_pfn_mapped: highest direct mapped pfn under 4GB
@@ -942,6 +943,8 @@ void __init setup_arch(char **cmdline_p)
 
 	x86_init.oem.arch_setup();
 
+	kernel_randomize_memory();
+
 	iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
 	setup_memory_map();
 	parse_setup_data();
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 62c0043..96d2b84 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -37,4 +37,5 @@ obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 
 obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
+obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
 
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 99bfb19..4a03f60 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -72,9 +72,9 @@ static struct addr_marker address_markers[] = {
 	{ 0, "User Space" },
 #ifdef CONFIG_X86_64
 	{ 0x8000000000000000UL, "Kernel Space" },
-	{ PAGE_OFFSET,		"Low Kernel Mapping" },
-	{ VMALLOC_START,        "vmalloc() Area" },
-	{ VMEMMAP_START,        "Vmemmap" },
+	{ 0/* PAGE_OFFSET */,   "Low Kernel Mapping" },
+	{ 0/* VMALLOC_START */, "vmalloc() Area" },
+	{ 0/* VMEMMAP_START */, "Vmemmap" },
 # ifdef CONFIG_X86_ESPFIX64
 	{ ESPFIX_BASE_ADDR,	"ESPfix Area", 16 },
 # endif
@@ -434,6 +434,11 @@ void ptdump_walk_pgd_level_checkwx(void)
 
 static int __init pt_dump_init(void)
 {
+#ifdef CONFIG_X86_64
+	address_markers[LOW_KERNEL_NR].start_address = PAGE_OFFSET;
+	address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
+	address_markers[VMEMMAP_START_NR].start_address = VMEMMAP_START;
+#endif
 #ifdef CONFIG_X86_32
 	/* Not a compile-time constant on x86-32 */
 	address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 372aad2..e490624 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -17,6 +17,7 @@
 #include <asm/proto.h>
 #include <asm/dma.h>		/* for MAX_DMA_PFN */
 #include <asm/microcode.h>
+#include <asm/kaslr.h>
 
 /*
  * We need to define the tracepoints somewhere, and tlb.c
@@ -590,6 +591,9 @@ void __init init_mem_mapping(void)
 	/* the ISA range is always mapped regardless of memory holes */
 	init_memory_mapping(0, ISA_END_ADDRESS);
 
+	/* Init the trampoline page table if needed for KASLR memory */
+	kaslr_trampoline_init();
+
 	/*
 	 * If the allocation is in bottom-up direction, we setup direct mapping
 	 * in bottom-up, otherwise we setup direct mapping in top-down.
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
new file mode 100644
index 0000000..3b330a9
--- /dev/null
+++ b/arch/x86/mm/kaslr.c
@@ -0,0 +1,136 @@
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/smp.h>
+#include <linux/init.h>
+#include <linux/memory.h>
+#include <linux/random.h>
+
+#include <asm/processor.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/e820.h>
+#include <asm/init.h>
+#include <asm/setup.h>
+#include <asm/kaslr.h>
+#include <asm/kasan.h>
+
+#include "mm_internal.h"
+
+/* Hold the pgd entry used on booting additional CPUs */
+pgd_t trampoline_pgd_entry;
+
+#define TB_SHIFT 40
+
+/*
+ * Memory base and end randomization is based on different configurations.
+ * We want as much space as possible to increase entropy available.
+ */
+static const unsigned long memory_rand_start = __PAGE_OFFSET_BASE;
+
+#if defined(CONFIG_KASAN)
+static const unsigned long memory_rand_end = KASAN_SHADOW_START;
+#elif defined(CONFIG_X86_ESPFIX64)
+static const unsigned long memory_rand_end = ESPFIX_BASE_ADDR;
+#elif defined(CONFIG_EFI)
+static const unsigned long memory_rand_end = EFI_VA_START;
+#else
+static const unsigned long memory_rand_end = __START_KERNEL_map;
+#endif
+
+/* Default values */
+unsigned long page_offset_base = __PAGE_OFFSET_BASE;
+EXPORT_SYMBOL(page_offset_base);
+unsigned long vmalloc_base = __VMALLOC_BASE;
+EXPORT_SYMBOL(vmalloc_base);
+unsigned long vmemmap_base = __VMEMMAP_BASE;
+EXPORT_SYMBOL(vmemmap_base);
+
+/* Describe each randomized memory section in sequential order */
+static struct kaslr_memory_region {
+	unsigned long *base;
+	unsigned short size_tb;
+} kaslr_regions[] = {
+	{ &page_offset_base, 64/* Maximum */ },
+	{ &vmalloc_base, VMALLOC_SIZE_TB },
+	{ &vmemmap_base, 1 },
+};
+
+/* Size in Terabytes + 1 hole */
+static inline unsigned long get_padding(struct kaslr_memory_region *region)
+{
+	return ((unsigned long)region->size_tb + 1) << TB_SHIFT;
+}
+
+/* Initialize base and padding for each memory section randomized with KASLR */
+void __init kernel_randomize_memory(void)
+{
+	size_t i;
+	unsigned long addr = memory_rand_start;
+	unsigned long padding, rand, mem_tb;
+	struct rnd_state rnd_st;
+	unsigned long remain_padding = memory_rand_end - memory_rand_start;
+
+	if (!kaslr_enabled())
+		return;
+
+	BUG_ON(kaslr_regions[0].base != &page_offset_base);
+	mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);
+
+	if (mem_tb < kaslr_regions[0].size_tb)
+		kaslr_regions[0].size_tb = mem_tb;
+
+	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
+		remain_padding -= get_padding(&kaslr_regions[i]);
+
+	prandom_seed_state(&rnd_st, kaslr_get_random_boot_long());
+
+	/* Position each section randomly with minimum 1 terabyte between */
+	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++) {
+		padding = remain_padding / (ARRAY_SIZE(kaslr_regions) - i);
+		prandom_bytes_state(&rnd_st, &rand, sizeof(rand));
+		padding = (rand % (padding + 1)) & PUD_MASK;
+		addr += padding;
+		*kaslr_regions[i].base = addr;
+		addr += get_padding(&kaslr_regions[i]);
+		remain_padding -= padding;
+	}
+}
+
+/*
+ * Create PGD aligned trampoline table to allow real mode initialization
+ * of additional CPUs. Consume only 1 additonal low memory page.
+ */
+void __meminit kaslr_trampoline_init(void)
+{
+	unsigned long addr, next;
+	pgd_t *pgd;
+	pud_t *pud_page, *tr_pud_page;
+	int i;
+
+	/* If KASLR is disabled, default to the existing page table entry */
+	if (!kaslr_enabled()) {
+		trampoline_pgd_entry = init_level4_pgt[pgd_index(PAGE_OFFSET)];
+		return;
+	}
+
+	tr_pud_page = alloc_low_page();
+	set_pgd(&trampoline_pgd_entry, __pgd(_PAGE_TABLE | __pa(tr_pud_page)));
+
+	addr = 0;
+	pgd = pgd_offset_k((unsigned long)__va(addr));
+	pud_page = (pud_t *) pgd_page_vaddr(*pgd);
+
+	for (i = pud_index(addr); i < PTRS_PER_PUD; i++, addr = next) {
+		pud_t *pud, *tr_pud;
+
+		tr_pud = tr_pud_page + pud_index(addr);
+		pud = pud_page + pud_index((unsigned long)__va(addr));
+		next = (addr & PUD_MASK) + PUD_SIZE;
+
+		/* Needed to copy pte or pud alike */
+		BUILD_BUG_ON(sizeof(pud_t) != sizeof(pte_t));
+		*tr_pud = *pud;
+	}
+}
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 0b7a63d..6518314 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -84,7 +84,11 @@ void __init setup_real_mode(void)
 	*trampoline_cr4_features = __read_cr4();
 
 	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
+#ifdef CONFIG_RANDOMIZE_MEMORY
+	trampoline_pgd[0] = trampoline_pgd_entry.pgd;
+#else
 	trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
+#endif
 	trampoline_pgd[511] = init_level4_pgt[511].pgd;
 #endif
 }
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 4/4] x86, boot: Memory hotplug support for KASLR memory randomization
  2016-05-03 19:31 ` [kernel-hardening] " Thomas Garnier
@ 2016-05-03 19:31   ` Thomas Garnier
  -1 siblings, 0 replies; 22+ messages in thread
From: Thomas Garnier @ 2016-05-03 19:31 UTC (permalink / raw)
  To: H . Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, Thomas Garnier, Dmitry Vyukov, Paolo Bonzini,
	Dan Williams, Kees Cook, Stephen Smalley, Kefeng Wang,
	Jonathan Corbet, Matt Fleming, Toshi Kani, Alexander Kuleshov,
	Alexander Popov, Joerg Roedel, Dave Young, Baoquan He,
	Dave Hansen, Mark Salter, Boris Ostrovsky
  Cc: x86, linux-kernel, linux-doc, gthelen, kernel-hardening

Add a new option (CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING) to define
the padding used for the physical memory mapping section when KASLR
memory randomization is enabled. It ensures there is enough virtual
address space when CONFIG_MEMORY_HOTPLUG is used. The default value is
10 terabytes. If CONFIG_MEMORY_HOTPLUG is not used, no space is
reserved, increasing the entropy available.
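
For example (illustration only): with 64GB of RAM and the default 0xa
(10TB) padding, the mapping size works out to 0 + 10 = 10TB (64GB rounds
down to zero whole terabytes), so about 11TB of virtual space is set
aside for the region and its 1TB hole instead of 1TB, leaving
correspondingly less room for randomizing the other sections.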

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20160502
---
 arch/x86/Kconfig    | 15 +++++++++++++++
 arch/x86/mm/kaslr.c | 14 ++++++++++++--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 60f33c7..5124d9c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2003,6 +2003,21 @@ config RANDOMIZE_MEMORY
 
 	   If unsure, say N.
 
+config RANDOMIZE_MEMORY_PHYSICAL_PADDING
+	hex "Physical memory mapping padding" if EXPERT
+	depends on RANDOMIZE_MEMORY
+	default "0xa" if MEMORY_HOTPLUG
+	default "0x0"
+	range 0x1 0x40 if MEMORY_HOTPLUG
+	range 0x0 0x40
+	---help---
+	   Define the padding in terabytes added to the existing physical memory
+	   size during kernel memory randomization. It is useful for memory
+	   hotplug support but reduces the entropy available for address
+	   randomization.
+
+	   If unsure, leave at the default value.
+
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 3b330a9..ef3dc19 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -68,15 +68,25 @@ void __init kernel_randomize_memory(void)
 {
 	size_t i;
 	unsigned long addr = memory_rand_start;
-	unsigned long padding, rand, mem_tb;
+	unsigned long padding, rand, mem_tb, page_offset_padding;
 	struct rnd_state rnd_st;
 	unsigned long remain_padding = memory_rand_end - memory_rand_start;
 
 	if (!kaslr_enabled())
 		return;
 
+	/*
+	 * Update the physical memory mapping size to the available memory
+	 * and add padding if needed (especially for memory hotplug support).
+	 */
+	page_offset_padding = CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+	page_offset_padding = max(1UL, page_offset_padding);
+#endif
+
 	BUG_ON(kaslr_regions[0].base != &page_offset_base);
-	mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);
+	mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT) + page_offset_padding;
 
 	if (mem_tb < kaslr_regions[0].size_tb)
 		kaslr_regions[0].size_tb = mem_tb;
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/4] x86, boot: Memory hotplug support for KASLR memory randomization
  2016-05-03 19:31   ` [kernel-hardening] " Thomas Garnier
@ 2016-05-10 18:24     ` Kees Cook
  -1 siblings, 0 replies; 22+ messages in thread
From: Kees Cook @ 2016-05-10 18:24 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: H . Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, Dmitry Vyukov, Paolo Bonzini, Dan Williams,
	Stephen Smalley, Kefeng Wang, Jonathan Corbet, Matt Fleming,
	Toshi Kani, Alexander Kuleshov, Alexander Popov, Joerg Roedel,
	Dave Young, Baoquan He, Dave Hansen, Mark Salter,
	Boris Ostrovsky, x86, LKML, linux-doc, Greg Thelen,
	kernel-hardening

On Tue, May 3, 2016 at 12:31 PM, Thomas Garnier <thgarnie@google.com> wrote:
> Add a new option (CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING) to define
> the padding used for the physical memory mapping section when KASLR
> memory is enabled. It ensures there is enough virtual address space when
> CONFIG_MEMORY_HOTPLUG is used. The default value is 10 terabytes. If
> CONFIG_MEMORY_HOTPLUG is not used, no space is reserved increasing the
> entropy available.
>
> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> ---
> Based on next-20160502
> ---
>  arch/x86/Kconfig    | 15 +++++++++++++++
>  arch/x86/mm/kaslr.c | 14 ++++++++++++--
>  2 files changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 60f33c7..5124d9c 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2003,6 +2003,21 @@ config RANDOMIZE_MEMORY
>
>            If unsure, say N.
>
> +config RANDOMIZE_MEMORY_PHYSICAL_PADDING
> +       hex "Physical memory mapping padding" if EXPERT
> +       depends on RANDOMIZE_MEMORY
> +       default "0xa" if MEMORY_HOTPLUG
> +       default "0x0"
> +       range 0x1 0x40 if MEMORY_HOTPLUG
> +       range 0x0 0x40
> +       ---help---
> +          Define the padding in terabyte added to the existing physical memory
> +          size during kernel memory randomization. It is useful for memory
> +          hotplug support but reduces the entropy available for address
> +          randomization.
> +
> +          If unsure, leave at the default value.
> +
>  config HOTPLUG_CPU
>         bool "Support for hot-pluggable CPUs"
>         depends on SMP
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index 3b330a9..ef3dc19 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -68,15 +68,25 @@ void __init kernel_randomize_memory(void)
>  {
>         size_t i;
>         unsigned long addr = memory_rand_start;
> -       unsigned long padding, rand, mem_tb;
> +       unsigned long padding, rand, mem_tb, page_offset_padding;
>         struct rnd_state rnd_st;
>         unsigned long remain_padding = memory_rand_end - memory_rand_start;
>
>         if (!kaslr_enabled())
>                 return;
>
> +       /*
> +        * Update Physical memory mapping to available and
> +        * add padding if needed (especially for memory hotplug support).
> +        */
> +       page_offset_padding = CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +       page_offset_padding = max(1UL, page_offset_padding);
> +#endif

Can't the ifdef and max lines be dropped? The Kconfig already enforces
the range to have a minimum of 1 when CONFIG_MEMORY_HOTPLUG is used,
IIUC.

> +
>         BUG_ON(kaslr_regions[0].base != &page_offset_base);
> -       mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);
> +       mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT) + page_offset_padding;

In fact, can't this variable be entirely dropped and the mem_tb
calculation could just refer to RANDOMIZE_MEMORY_PHYSICAL_PADDING
directly?
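
Something like this rough sketch (untested), relying on the Kconfig
range to keep the value sane instead of the #ifdef/max() clamp:

    mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT) +
             CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;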

-Kees

>
>         if (mem_tb < kaslr_regions[0].size_tb)
>                 kaslr_regions[0].size_tb = mem_tb;
> --
> 2.8.0.rc3.226.g39d4020
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/4] x86, boot: Memory hotplug support for KASLR memory randomization
  2016-05-10 18:24     ` [kernel-hardening] " Kees Cook
@ 2016-05-10 18:49       ` Thomas Garnier
  -1 siblings, 0 replies; 22+ messages in thread
From: Thomas Garnier @ 2016-05-10 18:49 UTC (permalink / raw)
  To: Kees Cook
  Cc: H . Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, Dmitry Vyukov, Paolo Bonzini, Dan Williams,
	Stephen Smalley, Kefeng Wang, Jonathan Corbet, Matt Fleming,
	Toshi Kani, Alexander Kuleshov, Alexander Popov, Joerg Roedel,
	Dave Young, Baoquan He, Dave Hansen, Mark Salter,
	Boris Ostrovsky, x86, LKML, linux-doc, Greg Thelen,
	kernel-hardening

On Tue, May 10, 2016 at 11:24 AM, Kees Cook <keescook@chromium.org> wrote:
> On Tue, May 3, 2016 at 12:31 PM, Thomas Garnier <thgarnie@google.com> wrote:
>> Add a new option (CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING) to define
>> the padding used for the physical memory mapping section when KASLR
>> memory is enabled. It ensures there is enough virtual address space when
>> CONFIG_MEMORY_HOTPLUG is used. The default value is 10 terabytes. If
>> CONFIG_MEMORY_HOTPLUG is not used, no space is reserved increasing the
>> entropy available.
>>
>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>> ---
>> Based on next-20160502
>> ---
>>  arch/x86/Kconfig    | 15 +++++++++++++++
>>  arch/x86/mm/kaslr.c | 14 ++++++++++++--
>>  2 files changed, 27 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 60f33c7..5124d9c 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -2003,6 +2003,21 @@ config RANDOMIZE_MEMORY
>>
>>            If unsure, say N.
>>
>> +config RANDOMIZE_MEMORY_PHYSICAL_PADDING
>> +       hex "Physical memory mapping padding" if EXPERT
>> +       depends on RANDOMIZE_MEMORY
>> +       default "0xa" if MEMORY_HOTPLUG
>> +       default "0x0"
>> +       range 0x1 0x40 if MEMORY_HOTPLUG
>> +       range 0x0 0x40
>> +       ---help---
>> +          Define the padding in terabyte added to the existing physical memory
>> +          size during kernel memory randomization. It is useful for memory
>> +          hotplug support but reduces the entropy available for address
>> +          randomization.
>> +
>> +          If unsure, leave at the default value.
>> +
>>  config HOTPLUG_CPU
>>         bool "Support for hot-pluggable CPUs"
>>         depends on SMP
>> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
>> index 3b330a9..ef3dc19 100644
>> --- a/arch/x86/mm/kaslr.c
>> +++ b/arch/x86/mm/kaslr.c
>> @@ -68,15 +68,25 @@ void __init kernel_randomize_memory(void)
>>  {
>>         size_t i;
>>         unsigned long addr = memory_rand_start;
>> -       unsigned long padding, rand, mem_tb;
>> +       unsigned long padding, rand, mem_tb, page_offset_padding;
>>         struct rnd_state rnd_st;
>>         unsigned long remain_padding = memory_rand_end - memory_rand_start;
>>
>>         if (!kaslr_enabled())
>>                 return;
>>
>> +       /*
>> +        * Update Physical memory mapping to available and
>> +        * add padding if needed (especially for memory hotplug support).
>> +        */
>> +       page_offset_padding = CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
>> +
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +       page_offset_padding = max(1UL, page_offset_padding);
>> +#endif
>
> Can't the ifdef and max lines be dropped? The Kconfig already enforces
> the range to have a minimum of 1 when CONFIG_MEMORY_HOTPLUG is used,
> IIUC.
>

Sure, I thought I had a bug when enabling it before hotplug, but it
seems to work, so I can drop it.

>> +
>>         BUG_ON(kaslr_regions[0].base != &page_offset_base);
>> -       mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);
>> +       mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT) + page_offset_padding;
>
> In fact, can't this variable be entirely dropped and the mem_tb
> calculation could just refer to RANDOMIZE_MEMORY_PHYSICAL_PADDING
> directly?
>

Yes it can, I will do it in the next iteration.

> -Kees
>
>>
>>         if (mem_tb < kaslr_regions[0].size_tb)
>>                 kaslr_regions[0].size_tb = mem_tb;
>> --
>> 2.8.0.rc3.226.g39d4020
>>
>
>
>
> --
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)
  2016-05-03 19:31   ` [kernel-hardening] " Thomas Garnier
@ 2016-05-10 18:53     ` Kees Cook
  -1 siblings, 0 replies; 22+ messages in thread
From: Kees Cook @ 2016-05-10 18:53 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: H . Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, Dmitry Vyukov, Paolo Bonzini, Dan Williams,
	Stephen Smalley, Kefeng Wang, Jonathan Corbet, Matt Fleming,
	Toshi Kani, Alexander Kuleshov, Alexander Popov, Joerg Roedel,
	Dave Young, Baoquan He, Dave Hansen, Mark Salter,
	Boris Ostrovsky, x86, LKML, linux-doc, Greg Thelen,
	kernel-hardening

On Tue, May 3, 2016 at 12:31 PM, Thomas Garnier <thgarnie@google.com> wrote:
> Randomizes the virtual address space of kernel memory sections (physical
> memory mapping, vmalloc & vmemmap) for x86_64. This security feature
> mitigates exploits relying on predictable kernel addresses. These
> addresses can be used to disclose the kernel modules base addresses or
> corrupt specific structures to elevate privileges bypassing the current
> implementation of KASLR. This feature can be enabled with the
> CONFIG_RANDOMIZE_MEMORY option.

I'm struggling to come up with a more accurate name for this, since
it's a base randomization of the kernel memory sections. Everything
else seems needlessly long (CONFIG_RANDOMIZE_BASE_MEMORY). I wonder if
things should be renamed generally to CONFIG_KASLR_BASE,
CONFIG_KASLR_MEMORY, etc, but that doesn't need to be part of this
series. Let's leave this as-is, and just make sure it's clear in the
Kconfig.

> The physical memory mapping holds most allocations from boot and heap
> allocators. Knowning the base address and physical memory size, an
> attacker can deduce the PDE virtual address for the vDSO memory page.
> This attack was demonstrated at CanSecWest 2016, in the "Getting
> Physical Extreme Abuse of Intel Based Paged Systems"
> https://goo.gl/ANpWdV (see second part of the presentation). The
> exploits used against Linux worked successfuly against 4.6+ but fail
> with KASLR memory enabled (https://goo.gl/iTtXMJ). Similar research
> was done at Google leading to this patch proposal. Variants exists to
> overwrite /proc or /sys objects ACLs leading to elevation of privileges.
> These variants were testeda against 4.6+.

Typo "tested".

>
> The vmalloc memory section contains the allocation made through the
> vmalloc api. The allocations are done sequentially to prevent
> fragmentation and each allocation address can easily be deduced
> especially from boot.
>
> The vmemmap section holds a representation of the physical
> memory (through a struct page array). An attacker could use this section
> to disclose the kernel memory layout (walking the page linked list).
>
> The order of each memory section is not changed. The feature looks at
> the available space for the sections based on different configuration
> options and randomizes the base and space between each. The size of the
> physical memory mapping is the available physical memory. No performance
> impact was detected while testing the feature.
>
> Entropy is generated using the KASLR early boot functions now shared in
> the lib directory (originally written by Kees Cook). Randomization is
> done on PGD & PUD page table levels to increase possible addresses. The
> physical memory mapping code was adapted to support PUD level virtual
> addresses. An additional low memory page is used to ensure each CPU can
> start with a PGD aligned virtual address (for realmode).
>
> x86/dump_pagetable was updated to correctly display each section.
>
> Updated documentation on x86_64 memory layout accordingly.
>
> Performance data:
>
> Kernbench shows almost no difference (-+ less than 1%):
>
> Before:
>
> Average Optimal load -j 12 Run (std deviation):
> Elapsed Time 102.63 (1.2695)
> User Time 1034.89 (1.18115)
> System Time 87.056 (0.456416)
> Percent CPU 1092.9 (13.892)
> Context Switches 199805 (3455.33)
> Sleeps 97907.8 (900.636)
>
> After:
>
> Average Optimal load -j 12 Run (std deviation):
> Elapsed Time 102.489 (1.10636)
> User Time 1034.86 (1.36053)
> System Time 87.764 (0.49345)
> Percent CPU 1095 (12.7715)
> Context Switches 199036 (4298.1)
> Sleeps 97681.6 (1031.11)
>
> Hackbench shows 0% difference on average (hackbench 90
> repeated 10 times):
>
> attemp,before,after
> 1,0.076,0.069
> 2,0.072,0.069
> 3,0.066,0.066
> 4,0.066,0.068
> 5,0.066,0.067
> 6,0.066,0.069
> 7,0.067,0.066
> 8,0.063,0.067
> 9,0.067,0.065
> 10,0.068,0.071
> average,0.0677,0.0677
>
> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> ---
> Based on next-20160502
> ---
>  Documentation/x86/x86_64/mm.txt         |   4 +
>  arch/x86/Kconfig                        |  15 ++++
>  arch/x86/include/asm/kaslr.h            |  12 +++
>  arch/x86/include/asm/page_64_types.h    |  11 ++-
>  arch/x86/include/asm/pgtable_64.h       |   1 +
>  arch/x86/include/asm/pgtable_64_types.h |  15 +++-
>  arch/x86/kernel/head_64.S               |   2 +-
>  arch/x86/kernel/setup.c                 |   3 +
>  arch/x86/mm/Makefile                    |   1 +
>  arch/x86/mm/dump_pagetables.c           |  11 ++-
>  arch/x86/mm/init.c                      |   4 +
>  arch/x86/mm/kaslr.c                     | 136 ++++++++++++++++++++++++++++++++
>  arch/x86/realmode/init.c                |   4 +
>  13 files changed, 211 insertions(+), 8 deletions(-)
>  create mode 100644 arch/x86/mm/kaslr.c
>
> diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
> index 5aa7383..602a52d 100644
> --- a/Documentation/x86/x86_64/mm.txt
> +++ b/Documentation/x86/x86_64/mm.txt
> @@ -39,4 +39,8 @@ memory window (this size is arbitrary, it can be raised later if needed).
>  The mappings are not part of any other kernel PGD and are only available
>  during EFI runtime calls.
>
> +Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
> +physical memory, vmalloc/ioremap space and virtual memory map are randomized.
> +Their order is preserved but their base will be changed early at boot time.

Maybe instead of "changed", say "offset"?

> +
>  -Andi Kleen, Jul 2004
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 0b128b4..60f33c7 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1988,6 +1988,21 @@ config PHYSICAL_ALIGN
>
>           Don't change this unless you know what you are doing.
>
> +config RANDOMIZE_MEMORY
> +       bool "Randomize the kernel memory sections"
> +       depends on X86_64
> +       depends on RANDOMIZE_BASE

Does this actually _depend_ on RANDOMIZE_BASE? It needs the
lib/kaslr.c code, but this could operate without the kernel base
address having been randomized, correct?

> +       default n

As such, maybe the default should be:

    default RANDOMIZE_BASE

> +       ---help---
> +          Randomizes the virtual address of memory sections (physical memory

How about: Randomizes the base virtual address of kernel memory sections ...

> +          mapping, vmalloc & vmemmap). This security feature mitigates exploits
> +          relying on predictable memory locations.

And "This security feature makes exploits relying on predictable
memory locations less reliable." ?

> +
> +          Base and padding between memory section is randomized. Their order is
> +          not. Entropy is generated in the same way as RANDOMIZE_BASE.

Since base would be mentioned above and padding is separate, I'd change this to:

The order of allocations remains unchanged. Entropy is generated ...

> +
> +          If unsure, say N.
> +
>  config HOTPLUG_CPU
>         bool "Support for hot-pluggable CPUs"
>         depends on SMP
> diff --git a/arch/x86/include/asm/kaslr.h b/arch/x86/include/asm/kaslr.h
> index 2ae1429..12c7742 100644
> --- a/arch/x86/include/asm/kaslr.h
> +++ b/arch/x86/include/asm/kaslr.h
> @@ -3,4 +3,16 @@
>
>  unsigned long kaslr_get_random_boot_long(void);
>
> +#ifdef CONFIG_RANDOMIZE_MEMORY
> +extern unsigned long page_offset_base;
> +extern unsigned long vmalloc_base;
> +extern unsigned long vmemmap_base;
> +
> +void kernel_randomize_memory(void);
> +void kaslr_trampoline_init(void);
> +#else
> +static inline void kernel_randomize_memory(void) { }
> +static inline void kaslr_trampoline_init(void) { }
> +#endif /* CONFIG_RANDOMIZE_MEMORY */
> +
>  #endif
> diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
> index d5c2f8b..9215e05 100644
> --- a/arch/x86/include/asm/page_64_types.h
> +++ b/arch/x86/include/asm/page_64_types.h
> @@ -1,6 +1,10 @@
>  #ifndef _ASM_X86_PAGE_64_DEFS_H
>  #define _ASM_X86_PAGE_64_DEFS_H
>
> +#ifndef __ASSEMBLY__
> +#include <asm/kaslr.h>
> +#endif
> +
>  #ifdef CONFIG_KASAN
>  #define KASAN_STACK_ORDER 1
>  #else
> @@ -32,7 +36,12 @@
>   * hypervisor to fit.  Choosing 16 slots here is arbitrary, but it's
>   * what Xen requires.
>   */
> -#define __PAGE_OFFSET           _AC(0xffff880000000000, UL)
> +#define __PAGE_OFFSET_BASE      _AC(0xffff880000000000, UL)
> +#ifdef CONFIG_RANDOMIZE_MEMORY
> +#define __PAGE_OFFSET           page_offset_base
> +#else
> +#define __PAGE_OFFSET           __PAGE_OFFSET_BASE
> +#endif /* CONFIG_RANDOMIZE_MEMORY */
>
>  #define __START_KERNEL_map     _AC(0xffffffff80000000, UL)
>
> diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
> index 2ee7811..0dfec89 100644
> --- a/arch/x86/include/asm/pgtable_64.h
> +++ b/arch/x86/include/asm/pgtable_64.h
> @@ -21,6 +21,7 @@ extern pmd_t level2_fixmap_pgt[512];
>  extern pmd_t level2_ident_pgt[512];
>  extern pte_t level1_fixmap_pgt[512];
>  extern pgd_t init_level4_pgt[];
> +extern pgd_t trampoline_pgd_entry;
>
>  #define swapper_pg_dir init_level4_pgt
>
> diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
> index e6844df..d388739 100644
> --- a/arch/x86/include/asm/pgtable_64_types.h
> +++ b/arch/x86/include/asm/pgtable_64_types.h
> @@ -5,6 +5,7 @@
>
>  #ifndef __ASSEMBLY__
>  #include <linux/types.h>
> +#include <asm/kaslr.h>
>
>  /*
>   * These are used to make use of C type-checking..
> @@ -54,9 +55,17 @@ typedef struct { pteval_t pte; } pte_t;
>
>  /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
>  #define MAXMEM          _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
> -#define VMALLOC_START    _AC(0xffffc90000000000, UL)
> -#define VMALLOC_END      _AC(0xffffe8ffffffffff, UL)
> -#define VMEMMAP_START   _AC(0xffffea0000000000, UL)
> +#define VMALLOC_SIZE_TB         _AC(32, UL)
> +#define __VMALLOC_BASE  _AC(0xffffc90000000000, UL)
> +#define __VMEMMAP_BASE  _AC(0xffffea0000000000, UL)
> +#ifdef CONFIG_RANDOMIZE_MEMORY
> +#define VMALLOC_START   vmalloc_base
> +#define VMEMMAP_START   vmemmap_base
> +#else
> +#define VMALLOC_START   __VMALLOC_BASE
> +#define VMEMMAP_START   __VMEMMAP_BASE
> +#endif /* CONFIG_RANDOMIZE_MEMORY */
> +#define VMALLOC_END      (VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL))
>  #define MODULES_VADDR    (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
>  #define MODULES_END      _AC(0xffffffffff000000, UL)
>  #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index 5df831e..03a2aa0 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -38,7 +38,7 @@
>
>  #define pud_index(x)   (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
>
> -L4_PAGE_OFFSET = pgd_index(__PAGE_OFFSET)
> +L4_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE)
>  L4_START_KERNEL = pgd_index(__START_KERNEL_map)
>  L3_START_KERNEL = pud_index(__START_KERNEL_map)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index c4e7b39..a261658 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -113,6 +113,7 @@
>  #include <asm/prom.h>
>  #include <asm/microcode.h>
>  #include <asm/mmu_context.h>
> +#include <asm/kaslr.h>
>
>  /*
>   * max_low_pfn_mapped: highest direct mapped pfn under 4GB
> @@ -942,6 +943,8 @@ void __init setup_arch(char **cmdline_p)
>
>         x86_init.oem.arch_setup();
>
> +       kernel_randomize_memory();
> +
>         iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
>         setup_memory_map();
>         parse_setup_data();
> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
> index 62c0043..96d2b84 100644
> --- a/arch/x86/mm/Makefile
> +++ b/arch/x86/mm/Makefile
> @@ -37,4 +37,5 @@ obj-$(CONFIG_NUMA_EMU)                += numa_emulation.o
>
>  obj-$(CONFIG_X86_INTEL_MPX)    += mpx.o
>  obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
> +obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
>
> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
> index 99bfb19..4a03f60 100644
> --- a/arch/x86/mm/dump_pagetables.c
> +++ b/arch/x86/mm/dump_pagetables.c
> @@ -72,9 +72,9 @@ static struct addr_marker address_markers[] = {
>         { 0, "User Space" },
>  #ifdef CONFIG_X86_64
>         { 0x8000000000000000UL, "Kernel Space" },
> -       { PAGE_OFFSET,          "Low Kernel Mapping" },
> -       { VMALLOC_START,        "vmalloc() Area" },
> -       { VMEMMAP_START,        "Vmemmap" },
> +       { 0/* PAGE_OFFSET */,   "Low Kernel Mapping" },
> +       { 0/* VMALLOC_START */, "vmalloc() Area" },
> +       { 0/* VMEMMAP_START */, "Vmemmap" },
>  # ifdef CONFIG_X86_ESPFIX64
>         { ESPFIX_BASE_ADDR,     "ESPfix Area", 16 },
>  # endif
> @@ -434,6 +434,11 @@ void ptdump_walk_pgd_level_checkwx(void)
>
>  static int __init pt_dump_init(void)
>  {
> +#ifdef CONFIG_X86_64
> +       address_markers[LOW_KERNEL_NR].start_address = PAGE_OFFSET;
> +       address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
> +       address_markers[VMEMMAP_START_NR].start_address = VMEMMAP_START;
> +#endif
>  #ifdef CONFIG_X86_32
>         /* Not a compile-time constant on x86-32 */

I'd move this comment above you new ifdef and generalize it to something like:

    /* Various markers are not compile-time constants, so assign them here. */

>         address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 372aad2..e490624 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -17,6 +17,7 @@
>  #include <asm/proto.h>
>  #include <asm/dma.h>           /* for MAX_DMA_PFN */
>  #include <asm/microcode.h>
> +#include <asm/kaslr.h>
>
>  /*
>   * We need to define the tracepoints somewhere, and tlb.c
> @@ -590,6 +591,9 @@ void __init init_mem_mapping(void)
>         /* the ISA range is always mapped regardless of memory holes */
>         init_memory_mapping(0, ISA_END_ADDRESS);
>
> +       /* Init the trampoline page table if needed for KASLR memory */
> +       kaslr_trampoline_init();
> +
>         /*
>          * If the allocation is in bottom-up direction, we setup direct mapping
>          * in bottom-up, otherwise we setup direct mapping in top-down.
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> new file mode 100644
> index 0000000..3b330a9
> --- /dev/null
> +++ b/arch/x86/mm/kaslr.c
> @@ -0,0 +1,136 @@
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/types.h>
> +#include <linux/mm.h>
> +#include <linux/smp.h>
> +#include <linux/init.h>
> +#include <linux/memory.h>
> +#include <linux/random.h>
> +
> +#include <asm/processor.h>
> +#include <asm/pgtable.h>
> +#include <asm/pgalloc.h>
> +#include <asm/e820.h>
> +#include <asm/init.h>
> +#include <asm/setup.h>
> +#include <asm/kaslr.h>
> +#include <asm/kasan.h>
> +
> +#include "mm_internal.h"
> +
> +/* Hold the pgd entry used on booting additional CPUs */
> +pgd_t trampoline_pgd_entry;
> +
> +#define TB_SHIFT 40
> +
> +/*
> + * Memory base and end randomization is based on different configurations.
> + * We want as much space as possible to increase entropy available.
> + */
> +static const unsigned long memory_rand_start = __PAGE_OFFSET_BASE;
> +
> +#if defined(CONFIG_KASAN)
> +static const unsigned long memory_rand_end = KASAN_SHADOW_START;
> +#elif defined(CONFIG_X86_ESPFIX64)
> +static const unsigned long memory_rand_end = ESPFIX_BASE_ADDR;
> +#elif defined(CONFIG_EFI)
> +static const unsigned long memory_rand_end = EFI_VA_START;
> +#else
> +static const unsigned long memory_rand_end = __START_KERNEL_map;
> +#endif

Is it worth adding BUILD_BUG_ON()s to verify these values stay in
decreasing size?
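
(Rough sketch of what I mean, e.g. at the top of
kernel_randomize_memory(), assuming the compiler can constant-fold the
static consts:)

    BUILD_BUG_ON(memory_rand_start >= memory_rand_end);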

> +
> +/* Default values */
> +unsigned long page_offset_base = __PAGE_OFFSET_BASE;
> +EXPORT_SYMBOL(page_offset_base);
> +unsigned long vmalloc_base = __VMALLOC_BASE;
> +EXPORT_SYMBOL(vmalloc_base);
> +unsigned long vmemmap_base = __VMEMMAP_BASE;
> +EXPORT_SYMBOL(vmemmap_base);
> +
> +/* Describe each randomized memory sections in sequential order */
> +static struct kaslr_memory_region {
> +       unsigned long *base;
> +       unsigned short size_tb;
> +} kaslr_regions[] = {
> +       { &page_offset_base, 64/* Maximum */ },
> +       { &vmalloc_base, VMALLOC_SIZE_TB },
> +       { &vmemmap_base, 1 },
> +};

This seems like it could be __initdata, since it's only used in kernel_randomize_memory()?
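
i.e., something like (sketch):

    static __initdata struct kaslr_memory_region {
            unsigned long *base;
            unsigned short size_tb;
    } kaslr_regions[] = {
            { &page_offset_base, 64 /* Maximum */ },
            { &vmalloc_base, VMALLOC_SIZE_TB },
            { &vmemmap_base, 1 },
    };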

> +
> +/* Size in Terabytes + 1 hole */
> +static inline unsigned long get_padding(struct kaslr_memory_region *region)

I think this can be marked __init also?

> +{
> +       return ((unsigned long)region->size_tb + 1) << TB_SHIFT;
> +}
> +
> +/* Initialize base and padding for each memory section randomized with KASLR */
> +void __init kernel_randomize_memory(void)
> +{
> +       size_t i;
> +       unsigned long addr = memory_rand_start;
> +       unsigned long padding, rand, mem_tb;
> +       struct rnd_state rnd_st;
> +       unsigned long remain_padding = memory_rand_end - memory_rand_start;
> +
> +       if (!kaslr_enabled())
> +               return;
> +
> +       BUG_ON(kaslr_regions[0].base != &page_offset_base);

This is statically assigned above; is this BUG_ON useful?

> +       mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);
> +
> +       if (mem_tb < kaslr_regions[0].size_tb)
> +               kaslr_regions[0].size_tb = mem_tb;

Can you add a comment for this? IIUC, this is just recalculating the
max memory size available for padding based on the page shift? Under
what situations would this be changing?
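
(If I'm reading it right, maybe something along these lines:)

    /*
     * The physical mapping only needs to cover the detected memory
     * (max_pfn), so shrink the region from its 64TB maximum to the
     * actual size in whole terabytes; the extra 1TB hole added by
     * get_padding() absorbs the truncation.
     */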

> +
> +       for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
> +               remain_padding -= get_padding(&kaslr_regions[i]);
> +
> +       prandom_seed_state(&rnd_st, kaslr_get_random_boot_long());
> +
> +       /* Position each section randomly with minimum 1 terabyte between */
> +       for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++) {
> +               padding = remain_padding / (ARRAY_SIZE(kaslr_regions) - i);
> +               prandom_bytes_state(&rnd_st, &rand, sizeof(rand));
> +               padding = (rand % (padding + 1)) & PUD_MASK;
> +               addr += padding;
> +               *kaslr_regions[i].base = addr;
> +               addr += get_padding(&kaslr_regions[i]);
> +               remain_padding -= padding;
> +       }

What happens if we run out of padding here, and doesn't this loop mean
earlier regions will have, on average, more padding? Should each
instead randomize within a one-time calculation of remaining_padding /
ARRAY_SIZE(kaslr_regions) ?
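
Roughly this (untested sketch, reusing the existing variables):

    /* Each region draws at most an equal share of the slack */
    padding = remain_padding / ARRAY_SIZE(kaslr_regions);
    for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++) {
            prandom_bytes_state(&rnd_st, &rand, sizeof(rand));
            addr += (rand % (padding + 1)) & PUD_MASK;
            *kaslr_regions[i].base = addr;
            addr += get_padding(&kaslr_regions[i]);
    }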

Also, to get added to the Kconfig, what is the available entropy here?
How far can each of the base addresses get offset?

> +}
> +
> +/*
> + * Create PGD aligned trampoline table to allow real mode initialization
> + * of additional CPUs. Consume only 1 additonal low memory page.

Typo "additional".

> + */
> +void __meminit kaslr_trampoline_init(void)
> +{
> +       unsigned long addr, next;
> +       pgd_t *pgd;
> +       pud_t *pud_page, *tr_pud_page;
> +       int i;
> +
> +       /* If KASLR is disabled, default to the existing page table entry */
> +       if (!kaslr_enabled()) {
> +               trampoline_pgd_entry = init_level4_pgt[pgd_index(PAGE_OFFSET)];
> +               return;
> +       }
> +
> +       tr_pud_page = alloc_low_page();
> +       set_pgd(&trampoline_pgd_entry, __pgd(_PAGE_TABLE | __pa(tr_pud_page)));
> +
> +       addr = 0;
> +       pgd = pgd_offset_k((unsigned long)__va(addr));
> +       pud_page = (pud_t *) pgd_page_vaddr(*pgd);
> +
> +       for (i = pud_index(addr); i < PTRS_PER_PUD; i++, addr = next) {
> +               pud_t *pud, *tr_pud;
> +
> +               tr_pud = tr_pud_page + pud_index(addr);
> +               pud = pud_page + pud_index((unsigned long)__va(addr));
> +               next = (addr & PUD_MASK) + PUD_SIZE;
> +
> +               /* Needed to copy pte or pud alike */
> +               BUILD_BUG_ON(sizeof(pud_t) != sizeof(pte_t));
> +               *tr_pud = *pud;
> +       }
> +}
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 0b7a63d..6518314 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -84,7 +84,11 @@ void __init setup_real_mode(void)
>         *trampoline_cr4_features = __read_cr4();
>
>         trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
> +#ifdef CONFIG_RANDOMIZE_MEMORY
> +       trampoline_pgd[0] = trampoline_pgd_entry.pgd;
> +#else
>         trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
> +#endif

To avoid these ifdefs, could trampoline_pgd_entry instead be defined
outside of mm/kaslr.c and have .pgd assigned as
init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd via a static inline of
kaslr_trampoline_init() instead?
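
Something like this in asm/kaslr.h for the !CONFIG_RANDOMIZE_MEMORY case
(untested sketch, mirroring what the KASLR-disabled path already does):

    static inline void kaslr_trampoline_init(void)
    {
            trampoline_pgd_entry = init_level4_pgt[pgd_index(PAGE_OFFSET)];
    }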

>         trampoline_pgd[511] = init_level4_pgt[511].pgd;
>  #endif
>  }
> --
> 2.8.0.rc3.226.g39d4020
>

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [kernel-hardening] Re: [PATCH v3 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)
@ 2016-05-10 18:53     ` Kees Cook
  0 siblings, 0 replies; 22+ messages in thread
From: Kees Cook @ 2016-05-10 18:53 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: H . Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, Dmitry Vyukov, Paolo Bonzini, Dan Williams,
	Stephen Smalley, Kefeng Wang, Jonathan Corbet, Matt Fleming,
	Toshi Kani, Alexander Kuleshov, Alexander Popov, Joerg Roedel,
	Dave Young, Baoquan He, Dave Hansen, Mark Salter,
	Boris Ostrovsky, x86, LKML, linux-doc, Greg Thelen,
	kernel-hardening

On Tue, May 3, 2016 at 12:31 PM, Thomas Garnier <thgarnie@google.com> wrote:
> Randomizes the virtual address space of kernel memory sections (physical
> memory mapping, vmalloc & vmemmap) for x86_64. This security feature
> mitigates exploits relying on predictable kernel addresses. These
> addresses can be used to disclose the kernel modules base addresses or
> corrupt specific structures to elevate privileges bypassing the current
> implementation of KASLR. This feature can be enabled with the
> CONFIG_RANDOMIZE_MEMORY option.

I'm struggling to come up with a more accurate name for this, since
it's a base randomization of the kernel memory sections. Everything
else seems needlessly long (CONFIG_RANDOMIZE_BASE_MEMORY). I wonder if
things should be renamed generally to CONFIG_KASLR_BASE,
CONFIG_KASLR_MEMORY, etc, but that doesn't need to be part of this
series. Let's leave this as-is, and just make sure it's clear in the
Kconfig.

> The physical memory mapping holds most allocations from boot and heap
> allocators. Knowning the base address and physical memory size, an
> attacker can deduce the PDE virtual address for the vDSO memory page.
> This attack was demonstrated at CanSecWest 2016, in the "Getting
> Physical Extreme Abuse of Intel Based Paged Systems"
> https://goo.gl/ANpWdV (see second part of the presentation). The
> exploits used against Linux worked successfuly against 4.6+ but fail
> with KASLR memory enabled (https://goo.gl/iTtXMJ). Similar research
> was done at Google leading to this patch proposal. Variants exists to
> overwrite /proc or /sys objects ACLs leading to elevation of privileges.
> These variants were testeda against 4.6+.

Typo "tested".

>
> The vmalloc memory section contains the allocation made through the
> vmalloc api. The allocations are done sequentially to prevent
> fragmentation and each allocation address can easily be deduced
> especially from boot.
>
> The vmemmap section holds a representation of the physical
> memory (through a struct page array). An attacker could use this section
> to disclose the kernel memory layout (walking the page linked list).
>
> The order of each memory section is not changed. The feature looks at
> the available space for the sections based on different configuration
> options and randomizes the base and space between each. The size of the
> physical memory mapping is the available physical memory. No performance
> impact was detected while testing the feature.
>
> Entropy is generated using the KASLR early boot functions now shared in
> the lib directory (originally written by Kees Cook). Randomization is
> done on PGD & PUD page table levels to increase possible addresses. The
> physical memory mapping code was adapted to support PUD level virtual
> addresses. An additional low memory page is used to ensure each CPU can
> start with a PGD aligned virtual address (for realmode).
>
> x86/dump_pagetable was updated to correctly display each section.
>
> Updated documentation on x86_64 memory layout accordingly.
>
> Performance data:
>
> Kernbench shows almost no difference (-+ less than 1%):
>
> Before:
>
> Average Optimal load -j 12 Run (std deviation):
> Elapsed Time 102.63 (1.2695)
> User Time 1034.89 (1.18115)
> System Time 87.056 (0.456416)
> Percent CPU 1092.9 (13.892)
> Context Switches 199805 (3455.33)
> Sleeps 97907.8 (900.636)
>
> After:
>
> Average Optimal load -j 12 Run (std deviation):
> Elapsed Time 102.489 (1.10636)
> User Time 1034.86 (1.36053)
> System Time 87.764 (0.49345)
> Percent CPU 1095 (12.7715)
> Context Switches 199036 (4298.1)
> Sleeps 97681.6 (1031.11)
>
> Hackbench shows 0% difference on average (hackbench 90
> repeated 10 times):
>
> attemp,before,after
> 1,0.076,0.069
> 2,0.072,0.069
> 3,0.066,0.066
> 4,0.066,0.068
> 5,0.066,0.067
> 6,0.066,0.069
> 7,0.067,0.066
> 8,0.063,0.067
> 9,0.067,0.065
> 10,0.068,0.071
> average,0.0677,0.0677
>
> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> ---
> Based on next-20160502
> ---
>  Documentation/x86/x86_64/mm.txt         |   4 +
>  arch/x86/Kconfig                        |  15 ++++
>  arch/x86/include/asm/kaslr.h            |  12 +++
>  arch/x86/include/asm/page_64_types.h    |  11 ++-
>  arch/x86/include/asm/pgtable_64.h       |   1 +
>  arch/x86/include/asm/pgtable_64_types.h |  15 +++-
>  arch/x86/kernel/head_64.S               |   2 +-
>  arch/x86/kernel/setup.c                 |   3 +
>  arch/x86/mm/Makefile                    |   1 +
>  arch/x86/mm/dump_pagetables.c           |  11 ++-
>  arch/x86/mm/init.c                      |   4 +
>  arch/x86/mm/kaslr.c                     | 136 ++++++++++++++++++++++++++++++++
>  arch/x86/realmode/init.c                |   4 +
>  13 files changed, 211 insertions(+), 8 deletions(-)
>  create mode 100644 arch/x86/mm/kaslr.c
>
> diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
> index 5aa7383..602a52d 100644
> --- a/Documentation/x86/x86_64/mm.txt
> +++ b/Documentation/x86/x86_64/mm.txt
> @@ -39,4 +39,8 @@ memory window (this size is arbitrary, it can be raised later if needed).
>  The mappings are not part of any other kernel PGD and are only available
>  during EFI runtime calls.
>
> +Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
> +physical memory, vmalloc/ioremap space and virtual memory map are randomized.
> +Their order is preserved but their base will be changed early at boot time.

Maybe instead of "changed", say "offset"?

> +
>  -Andi Kleen, Jul 2004
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 0b128b4..60f33c7 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1988,6 +1988,21 @@ config PHYSICAL_ALIGN
>
>           Don't change this unless you know what you are doing.
>
> +config RANDOMIZE_MEMORY
> +       bool "Randomize the kernel memory sections"
> +       depends on X86_64
> +       depends on RANDOMIZE_BASE

Does this actually _depend_ on RANDOMIZE_BASE? It needs the
lib/kaslr.c code, but this could operate without the kernel base
address having been randomized, correct?

> +       default n

As such, maybe the default should be:

    default RANDOMIZE_BASE

> +       ---help---
> +          Randomizes the virtual address of memory sections (physical memory

How about: Randomizes the base virtual address of kernel memory sections ...

> +          mapping, vmalloc & vmemmap). This security feature mitigates exploits
> +          relying on predictable memory locations.

And "This security feature makes exploits relying on predictable
memory locations less reliable." ?

> +
> +          Base and padding between memory section is randomized. Their order is
> +          not. Entropy is generated in the same way as RANDOMIZE_BASE.

Since base would be mentioned above and padding is separate, I'd change this to:

The order of allocations remains unchanged. Entropy is generated ...

> +
> +          If unsure, say N.
> +
>  config HOTPLUG_CPU
>         bool "Support for hot-pluggable CPUs"
>         depends on SMP
> diff --git a/arch/x86/include/asm/kaslr.h b/arch/x86/include/asm/kaslr.h
> index 2ae1429..12c7742 100644
> --- a/arch/x86/include/asm/kaslr.h
> +++ b/arch/x86/include/asm/kaslr.h
> @@ -3,4 +3,16 @@
>
>  unsigned long kaslr_get_random_boot_long(void);
>
> +#ifdef CONFIG_RANDOMIZE_MEMORY
> +extern unsigned long page_offset_base;
> +extern unsigned long vmalloc_base;
> +extern unsigned long vmemmap_base;
> +
> +void kernel_randomize_memory(void);
> +void kaslr_trampoline_init(void);
> +#else
> +static inline void kernel_randomize_memory(void) { }
> +static inline void kaslr_trampoline_init(void) { }
> +#endif /* CONFIG_RANDOMIZE_MEMORY */
> +
>  #endif
> diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
> index d5c2f8b..9215e05 100644
> --- a/arch/x86/include/asm/page_64_types.h
> +++ b/arch/x86/include/asm/page_64_types.h
> @@ -1,6 +1,10 @@
>  #ifndef _ASM_X86_PAGE_64_DEFS_H
>  #define _ASM_X86_PAGE_64_DEFS_H
>
> +#ifndef __ASSEMBLY__
> +#include <asm/kaslr.h>
> +#endif
> +
>  #ifdef CONFIG_KASAN
>  #define KASAN_STACK_ORDER 1
>  #else
> @@ -32,7 +36,12 @@
>   * hypervisor to fit.  Choosing 16 slots here is arbitrary, but it's
>   * what Xen requires.
>   */
> -#define __PAGE_OFFSET           _AC(0xffff880000000000, UL)
> +#define __PAGE_OFFSET_BASE      _AC(0xffff880000000000, UL)
> +#ifdef CONFIG_RANDOMIZE_MEMORY
> +#define __PAGE_OFFSET           page_offset_base
> +#else
> +#define __PAGE_OFFSET           __PAGE_OFFSET_BASE
> +#endif /* CONFIG_RANDOMIZE_MEMORY */
>
>  #define __START_KERNEL_map     _AC(0xffffffff80000000, UL)
>
> diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
> index 2ee7811..0dfec89 100644
> --- a/arch/x86/include/asm/pgtable_64.h
> +++ b/arch/x86/include/asm/pgtable_64.h
> @@ -21,6 +21,7 @@ extern pmd_t level2_fixmap_pgt[512];
>  extern pmd_t level2_ident_pgt[512];
>  extern pte_t level1_fixmap_pgt[512];
>  extern pgd_t init_level4_pgt[];
> +extern pgd_t trampoline_pgd_entry;
>
>  #define swapper_pg_dir init_level4_pgt
>
> diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
> index e6844df..d388739 100644
> --- a/arch/x86/include/asm/pgtable_64_types.h
> +++ b/arch/x86/include/asm/pgtable_64_types.h
> @@ -5,6 +5,7 @@
>
>  #ifndef __ASSEMBLY__
>  #include <linux/types.h>
> +#include <asm/kaslr.h>
>
>  /*
>   * These are used to make use of C type-checking..
> @@ -54,9 +55,17 @@ typedef struct { pteval_t pte; } pte_t;
>
>  /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
>  #define MAXMEM          _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
> -#define VMALLOC_START    _AC(0xffffc90000000000, UL)
> -#define VMALLOC_END      _AC(0xffffe8ffffffffff, UL)
> -#define VMEMMAP_START   _AC(0xffffea0000000000, UL)
> +#define VMALLOC_SIZE_TB         _AC(32, UL)
> +#define __VMALLOC_BASE  _AC(0xffffc90000000000, UL)
> +#define __VMEMMAP_BASE  _AC(0xffffea0000000000, UL)
> +#ifdef CONFIG_RANDOMIZE_MEMORY
> +#define VMALLOC_START   vmalloc_base
> +#define VMEMMAP_START   vmemmap_base
> +#else
> +#define VMALLOC_START   __VMALLOC_BASE
> +#define VMEMMAP_START   __VMEMMAP_BASE
> +#endif /* CONFIG_RANDOMIZE_MEMORY */
> +#define VMALLOC_END      (VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL))
>  #define MODULES_VADDR    (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
>  #define MODULES_END      _AC(0xffffffffff000000, UL)
>  #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index 5df831e..03a2aa0 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -38,7 +38,7 @@
>
>  #define pud_index(x)   (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
>
> -L4_PAGE_OFFSET = pgd_index(__PAGE_OFFSET)
> +L4_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE)
>  L4_START_KERNEL = pgd_index(__START_KERNEL_map)
>  L3_START_KERNEL = pud_index(__START_KERNEL_map)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index c4e7b39..a261658 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -113,6 +113,7 @@
>  #include <asm/prom.h>
>  #include <asm/microcode.h>
>  #include <asm/mmu_context.h>
> +#include <asm/kaslr.h>
>
>  /*
>   * max_low_pfn_mapped: highest direct mapped pfn under 4GB
> @@ -942,6 +943,8 @@ void __init setup_arch(char **cmdline_p)
>
>         x86_init.oem.arch_setup();
>
> +       kernel_randomize_memory();
> +
>         iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
>         setup_memory_map();
>         parse_setup_data();
> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
> index 62c0043..96d2b84 100644
> --- a/arch/x86/mm/Makefile
> +++ b/arch/x86/mm/Makefile
> @@ -37,4 +37,5 @@ obj-$(CONFIG_NUMA_EMU)                += numa_emulation.o
>
>  obj-$(CONFIG_X86_INTEL_MPX)    += mpx.o
>  obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
> +obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
>
> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
> index 99bfb19..4a03f60 100644
> --- a/arch/x86/mm/dump_pagetables.c
> +++ b/arch/x86/mm/dump_pagetables.c
> @@ -72,9 +72,9 @@ static struct addr_marker address_markers[] = {
>         { 0, "User Space" },
>  #ifdef CONFIG_X86_64
>         { 0x8000000000000000UL, "Kernel Space" },
> -       { PAGE_OFFSET,          "Low Kernel Mapping" },
> -       { VMALLOC_START,        "vmalloc() Area" },
> -       { VMEMMAP_START,        "Vmemmap" },
> +       { 0/* PAGE_OFFSET */,   "Low Kernel Mapping" },
> +       { 0/* VMALLOC_START */, "vmalloc() Area" },
> +       { 0/* VMEMMAP_START */, "Vmemmap" },
>  # ifdef CONFIG_X86_ESPFIX64
>         { ESPFIX_BASE_ADDR,     "ESPfix Area", 16 },
>  # endif
> @@ -434,6 +434,11 @@ void ptdump_walk_pgd_level_checkwx(void)
>
>  static int __init pt_dump_init(void)
>  {
> +#ifdef CONFIG_X86_64
> +       address_markers[LOW_KERNEL_NR].start_address = PAGE_OFFSET;
> +       address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
> +       address_markers[VMEMMAP_START_NR].start_address = VMEMMAP_START;
> +#endif
>  #ifdef CONFIG_X86_32
>         /* Not a compile-time constant on x86-32 */

I'd move this comment above your new ifdef and generalize it to something like:

    /* Various markers are not compile-time constants, so assign them here. */

>         address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 372aad2..e490624 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -17,6 +17,7 @@
>  #include <asm/proto.h>
>  #include <asm/dma.h>           /* for MAX_DMA_PFN */
>  #include <asm/microcode.h>
> +#include <asm/kaslr.h>
>
>  /*
>   * We need to define the tracepoints somewhere, and tlb.c
> @@ -590,6 +591,9 @@ void __init init_mem_mapping(void)
>         /* the ISA range is always mapped regardless of memory holes */
>         init_memory_mapping(0, ISA_END_ADDRESS);
>
> +       /* Init the trampoline page table if needed for KASLR memory */
> +       kaslr_trampoline_init();
> +
>         /*
>          * If the allocation is in bottom-up direction, we setup direct mapping
>          * in bottom-up, otherwise we setup direct mapping in top-down.
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> new file mode 100644
> index 0000000..3b330a9
> --- /dev/null
> +++ b/arch/x86/mm/kaslr.c
> @@ -0,0 +1,136 @@
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/types.h>
> +#include <linux/mm.h>
> +#include <linux/smp.h>
> +#include <linux/init.h>
> +#include <linux/memory.h>
> +#include <linux/random.h>
> +
> +#include <asm/processor.h>
> +#include <asm/pgtable.h>
> +#include <asm/pgalloc.h>
> +#include <asm/e820.h>
> +#include <asm/init.h>
> +#include <asm/setup.h>
> +#include <asm/kaslr.h>
> +#include <asm/kasan.h>
> +
> +#include "mm_internal.h"
> +
> +/* Hold the pgd entry used on booting additional CPUs */
> +pgd_t trampoline_pgd_entry;
> +
> +#define TB_SHIFT 40
> +
> +/*
> + * Memory base and end randomization is based on different configurations.
> + * We want as much space as possible to increase entropy available.
> + */
> +static const unsigned long memory_rand_start = __PAGE_OFFSET_BASE;
> +
> +#if defined(CONFIG_KASAN)
> +static const unsigned long memory_rand_end = KASAN_SHADOW_START;
> +#elif defined(CONFIG_X86_ESPFIX64)
> +static const unsigned long memory_rand_end = ESPFIX_BASE_ADDR;
> +#elif defined(CONFIG_EFI)
> +static const unsigned long memory_rand_end = EFI_VA_START;
> +#else
> +static const unsigned long memory_rand_end = __START_KERNEL_map;
> +#endif

Is it worth adding BUILD_BUG_ON()s to verify these possible window ends
keep their expected order?
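
Something like this near the top of kernel_randomize_memory(), perhaps
(untested sketch; it assumes the headers providing these constants are
already visible there):

	/* Catch config/layout changes that would reorder the window ends. */
#if defined(CONFIG_KASAN) && defined(CONFIG_X86_ESPFIX64)
	BUILD_BUG_ON(KASAN_SHADOW_START > ESPFIX_BASE_ADDR);
#endif
#if defined(CONFIG_X86_ESPFIX64) && defined(CONFIG_EFI)
	BUILD_BUG_ON(ESPFIX_BASE_ADDR > EFI_VA_START);
#endif
#ifdef CONFIG_EFI
	BUILD_BUG_ON(EFI_VA_START > __START_KERNEL_map);
#endif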

> +
> +/* Default values */
> +unsigned long page_offset_base = __PAGE_OFFSET_BASE;
> +EXPORT_SYMBOL(page_offset_base);
> +unsigned long vmalloc_base = __VMALLOC_BASE;
> +EXPORT_SYMBOL(vmalloc_base);
> +unsigned long vmemmap_base = __VMEMMAP_BASE;
> +EXPORT_SYMBOL(vmemmap_base);
> +
> +/* Describe each randomized memory sections in sequential order */
> +static struct kaslr_memory_region {
> +       unsigned long *base;
> +       unsigned short size_tb;
> +} kaslr_regions[] = {
> +       { &page_offset_base, 64/* Maximum */ },
> +       { &vmalloc_base, VMALLOC_SIZE_TB },
> +       { &vmemmap_base, 1 },
> +};

This seems like it could be __initdata, since it's only used in kernel_randomize_memory()?
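
i.e. something like (sketch, same fields and initializers as the hunk above):

	static struct kaslr_memory_region {
		unsigned long *base;
		unsigned short size_tb;
	} kaslr_regions[] __initdata = {
		{ &page_offset_base, 64/* Maximum */ },
		{ &vmalloc_base, VMALLOC_SIZE_TB },
		{ &vmemmap_base, 1 },
	};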

> +
> +/* Size in Terabytes + 1 hole */
> +static inline unsigned long get_padding(struct kaslr_memory_region *region)

I think this can be marked __init also?

> +{
> +       return ((unsigned long)region->size_tb + 1) << TB_SHIFT;
> +}
> +
> +/* Initialize base and padding for each memory section randomized with KASLR */
> +void __init kernel_randomize_memory(void)
> +{
> +       size_t i;
> +       unsigned long addr = memory_rand_start;
> +       unsigned long padding, rand, mem_tb;
> +       struct rnd_state rnd_st;
> +       unsigned long remain_padding = memory_rand_end - memory_rand_start;
> +
> +       if (!kaslr_enabled())
> +               return;
> +
> +       BUG_ON(kaslr_regions[0].base != &page_offset_base);

This is statically assigned above; is this BUG_ON useful?

> +       mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);
> +
> +       if (mem_tb < kaslr_regions[0].size_tb)
> +               kaslr_regions[0].size_tb = mem_tb;

Can you add a comment for this? IIUC, this is just recalculating the
max memory size available for padding based on the page shift? Under
what situations would this be changing?
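
If my reading is right, maybe something like:

	/*
	 * Clamp the size of the physical mapping region to the actual
	 * amount of RAM; any unused part of the 64TB maximum becomes
	 * extra slack for the randomization below.
	 */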

> +
> +       for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
> +               remain_padding -= get_padding(&kaslr_regions[i]);
> +
> +       prandom_seed_state(&rnd_st, kaslr_get_random_boot_long());
> +
> +       /* Position each section randomly with minimum 1 terabyte between */
> +       for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++) {
> +               padding = remain_padding / (ARRAY_SIZE(kaslr_regions) - i);
> +               prandom_bytes_state(&rnd_st, &rand, sizeof(rand));
> +               padding = (rand % (padding + 1)) & PUD_MASK;
> +               addr += padding;
> +               *kaslr_regions[i].base = addr;
> +               addr += get_padding(&kaslr_regions[i]);
> +               remain_padding -= padding;
> +       }

What happens if we run out of padding here, and doesn't this loop mean
earlier regions will have, on average, more padding? Should each
instead randomize within a one-time calculation of remaining_padding /
ARRAY_SIZE(kaslr_regions) ?
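
Rough, untested sketch of the alternative, reusing the names from the hunk
above (max_pad would be a new unsigned long local):

	/* One-time, equal share of the slack for every region. */
	max_pad = remain_padding / ARRAY_SIZE(kaslr_regions);

	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++) {
		prandom_bytes_state(&rnd_st, &rand, sizeof(rand));
		padding = (rand % (max_pad + 1)) & PUD_MASK;
		addr += padding;
		*kaslr_regions[i].base = addr;
		addr += get_padding(&kaslr_regions[i]);
	}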

Also, to get added to the Kconfig, what is the available entropy here?
How far can each of the base addresses get offset?

> +}
> +
> +/*
> + * Create PGD aligned trampoline table to allow real mode initialization
> + * of additional CPUs. Consume only 1 additonal low memory page.

Typo "additional".

> + */
> +void __meminit kaslr_trampoline_init(void)
> +{
> +       unsigned long addr, next;
> +       pgd_t *pgd;
> +       pud_t *pud_page, *tr_pud_page;
> +       int i;
> +
> +       /* If KASLR is disabled, default to the existing page table entry */
> +       if (!kaslr_enabled()) {
> +               trampoline_pgd_entry = init_level4_pgt[pgd_index(PAGE_OFFSET)];
> +               return;
> +       }
> +
> +       tr_pud_page = alloc_low_page();
> +       set_pgd(&trampoline_pgd_entry, __pgd(_PAGE_TABLE | __pa(tr_pud_page)));
> +
> +       addr = 0;
> +       pgd = pgd_offset_k((unsigned long)__va(addr));
> +       pud_page = (pud_t *) pgd_page_vaddr(*pgd);
> +
> +       for (i = pud_index(addr); i < PTRS_PER_PUD; i++, addr = next) {
> +               pud_t *pud, *tr_pud;
> +
> +               tr_pud = tr_pud_page + pud_index(addr);
> +               pud = pud_page + pud_index((unsigned long)__va(addr));
> +               next = (addr & PUD_MASK) + PUD_SIZE;
> +
> +               /* Needed to copy pte or pud alike */
> +               BUILD_BUG_ON(sizeof(pud_t) != sizeof(pte_t));
> +               *tr_pud = *pud;
> +       }
> +}
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 0b7a63d..6518314 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -84,7 +84,11 @@ void __init setup_real_mode(void)
>         *trampoline_cr4_features = __read_cr4();
>
>         trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
> +#ifdef CONFIG_RANDOMIZE_MEMORY
> +       trampoline_pgd[0] = trampoline_pgd_entry.pgd;
> +#else
>         trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
> +#endif

To avoid these ifdefs, could trampoline_pgd_entry instead be defined
outside of mm/kaslr.c and have .pgd assigned as
init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd via a static inline of
kaslr_trampoline_init()?
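
Untested sketch of what I have in mind for the !CONFIG_RANDOMIZE_MEMORY
stub in asm/kaslr.h (it assumes init_level4_pgt and pgd_index() are
visible there, which may need an extra include or moving the stub):

#else
static inline void kernel_randomize_memory(void) { }
static inline void kaslr_trampoline_init(void)
{
	trampoline_pgd_entry = init_level4_pgt[pgd_index(__PAGE_OFFSET)];
}
#endif /* CONFIG_RANDOMIZE_MEMORY */

and then setup_real_mode() could unconditionally do:

	trampoline_pgd[0] = trampoline_pgd_entry.pgd;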

>         trampoline_pgd[511] = init_level4_pgt[511].pgd;
>  #endif
>  }
> --
> 2.8.0.rc3.226.g39d4020
>

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 1/4] x86, boot: Refactor KASLR entropy functions
  2016-05-03 19:31   ` [kernel-hardening] " Thomas Garnier
@ 2016-05-10 19:05     ` Kees Cook
  -1 siblings, 0 replies; 22+ messages in thread
From: Kees Cook @ 2016-05-10 19:05 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: H . Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, Dmitry Vyukov, Paolo Bonzini, Dan Williams,
	Stephen Smalley, Kefeng Wang, Jonathan Corbet, Matt Fleming,
	Toshi Kani, Alexander Kuleshov, Alexander Popov, Joerg Roedel,
	Dave Young, Baoquan He, Dave Hansen, Mark Salter,
	Boris Ostrovsky, x86, LKML, linux-doc, Greg Thelen,
	kernel-hardening

On Tue, May 3, 2016 at 12:31 PM, Thomas Garnier <thgarnie@google.com> wrote:
> Move the KASLR entropy functions in x86/libray to be used in early
> kernel boot for KASLR memory randomization.
>
> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> ---
> Based on next-20160502
> ---
>  arch/x86/boot/compressed/kaslr.c | 76 +++-----------------------------------
>  arch/x86/include/asm/kaslr.h     |  6 +++
>  arch/x86/lib/Makefile            |  1 +
>  arch/x86/lib/kaslr.c             | 79 ++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 91 insertions(+), 71 deletions(-)
>  create mode 100644 arch/x86/include/asm/kaslr.h
>  create mode 100644 arch/x86/lib/kaslr.c
>
> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> index 8741a6d..0bdee23 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -11,10 +11,6 @@
>   */
>  #include "misc.h"
>
> -#include <asm/msr.h>
> -#include <asm/archrandom.h>
> -#include <asm/e820.h>
> -
>  #include <generated/compile.h>
>  #include <linux/module.h>
>  #include <linux/uts.h>
> @@ -25,26 +21,6 @@
>  static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
>                 LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
>
> -#define I8254_PORT_CONTROL     0x43
> -#define I8254_PORT_COUNTER0    0x40
> -#define I8254_CMD_READBACK     0xC0
> -#define I8254_SELECT_COUNTER0  0x02
> -#define I8254_STATUS_NOTREADY  0x40
> -static inline u16 i8254(void)
> -{
> -       u16 status, timer;
> -
> -       do {
> -               outb(I8254_PORT_CONTROL,
> -                    I8254_CMD_READBACK | I8254_SELECT_COUNTER0);
> -               status = inb(I8254_PORT_COUNTER0);
> -               timer  = inb(I8254_PORT_COUNTER0);
> -               timer |= inb(I8254_PORT_COUNTER0) << 8;
> -       } while (status & I8254_STATUS_NOTREADY);
> -
> -       return timer;
> -}
> -
>  static unsigned long rotate_xor(unsigned long hash, const void *area,
>                                 size_t size)
>  {
> @@ -61,7 +37,7 @@ static unsigned long rotate_xor(unsigned long hash, const void *area,
>  }
>
>  /* Attempt to create a simple but unpredictable starting entropy. */
> -static unsigned long get_random_boot(void)
> +static unsigned long get_boot_seed(void)
>  {
>         unsigned long hash = 0;
>
> @@ -71,50 +47,6 @@ static unsigned long get_random_boot(void)
>         return hash;
>  }
>
> -static unsigned long get_random_long(void)
> -{
> -#ifdef CONFIG_X86_64
> -       const unsigned long mix_const = 0x5d6008cbf3848dd3UL;
> -#else
> -       const unsigned long mix_const = 0x3f39e593UL;
> -#endif
> -       unsigned long raw, random = get_random_boot();
> -       bool use_i8254 = true;
> -
> -       debug_putstr("KASLR using");
> -
> -       if (has_cpuflag(X86_FEATURE_RDRAND)) {
> -               debug_putstr(" RDRAND");
> -               if (rdrand_long(&raw)) {
> -                       random ^= raw;
> -                       use_i8254 = false;
> -               }
> -       }
> -
> -       if (has_cpuflag(X86_FEATURE_TSC)) {
> -               debug_putstr(" RDTSC");
> -               raw = rdtsc();
> -
> -               random ^= raw;
> -               use_i8254 = false;
> -       }
> -
> -       if (use_i8254) {
> -               debug_putstr(" i8254");
> -               random ^= i8254();
> -       }
> -
> -       /* Circular multiply for better bit diffusion */
> -       asm("mul %3"
> -           : "=a" (random), "=d" (raw)
> -           : "a" (random), "rm" (mix_const));
> -       random += raw;
> -
> -       debug_putstr("...\n");
> -
> -       return random;
> -}
> -
>  struct mem_vector {
>         unsigned long start;
>         unsigned long size;
> @@ -122,7 +54,6 @@ struct mem_vector {
>
>  #define MEM_AVOID_MAX 5
>  static struct mem_vector mem_avoid[MEM_AVOID_MAX];
> -
>  static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
>  {
>         /* Item at least partially before region. */
> @@ -229,13 +160,16 @@ static void slots_append(unsigned long addr)
>         slots[slot_max++] = addr;
>  }
>
> +#define KASLR_COMPRESSED_BOOT
> +#include "../../lib/kaslr.c"
> +
>  static unsigned long slots_fetch_random(void)
>  {
>         /* Handle case of no slots stored. */
>         if (slot_max == 0)
>                 return 0;
>
> -       return slots[get_random_long() % slot_max];
> +       return slots[kaslr_get_random_boot_long() % slot_max];
>  }
>
>  static void process_e820_entry(struct e820entry *entry,
> diff --git a/arch/x86/include/asm/kaslr.h b/arch/x86/include/asm/kaslr.h
> new file mode 100644
> index 0000000..2ae1429
> --- /dev/null
> +++ b/arch/x86/include/asm/kaslr.h
> @@ -0,0 +1,6 @@
> +#ifndef _ASM_KASLR_H_
> +#define _ASM_KASLR_H_
> +
> +unsigned long kaslr_get_random_boot_long(void);
> +
> +#endif
> diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
> index 72a5767..cfa6d07 100644
> --- a/arch/x86/lib/Makefile
> +++ b/arch/x86/lib/Makefile
> @@ -24,6 +24,7 @@ lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
>  lib-y += memcpy_$(BITS).o
>  lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
>  lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
> +lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
>
>  obj-y += msr.o msr-reg.o msr-reg-export.o
>
> diff --git a/arch/x86/lib/kaslr.c b/arch/x86/lib/kaslr.c
> new file mode 100644
> index 0000000..ffb22ba
> --- /dev/null
> +++ b/arch/x86/lib/kaslr.c
> @@ -0,0 +1,79 @@

Please add a file header comment here to describe what's contained and
that it is used in both regular and compressed kernels.

> +#include <asm/kaslr.h>
> +#include <asm/msr.h>
> +#include <asm/archrandom.h>
> +#include <asm/e820.h>
> +#include <asm/io.h>
> +
> +/* Replace boot functions on library build */

I'd expand this comment a bit, something like:

/*
 * When built for the regular kernel, several functions need to be stubbed out
 * or changed to their regular kernel equivalent.
 */

> +#ifndef KASLR_COMPRESSED_BOOT
> +#include <asm/cpufeature.h>
> +#include <asm/setup.h>
> +
> +#define debug_putstr(v)

Hmmm, I don't think this should be removed. Using these routines
should be uncommon, and it'd be nice to retain the debugging output
from them. Can this be refactored into an early printk, or is that
stuff not available yet? If it's not available, I can live with this
going silent, but it'd be nice to not lose it for the memory
randomization.

> +#define has_cpuflag(f) boot_cpu_has(f)
> +#define get_boot_seed() kaslr_offset()

Hmmm... this replacement seed feels like it has much less entropy.
Also, if RANDOMIZE_MEMORY is decoupled from RANDOMIZE_BASE, this will
not be cool. :) But I don't feel too strongly about the config coupling
-- I just wanted to see RANDOMIZE_MEMORY on by default if
RANDOMIZE_BASE is too.

> +#endif
> +
> +#define I8254_PORT_CONTROL     0x43
> +#define I8254_PORT_COUNTER0    0x40
> +#define I8254_CMD_READBACK     0xC0
> +#define I8254_SELECT_COUNTER0  0x02
> +#define I8254_STATUS_NOTREADY  0x40
> +static inline u16 i8254(void)
> +{
> +       u16 status, timer;
> +
> +       do {
> +               outb(I8254_PORT_CONTROL,
> +                    I8254_CMD_READBACK | I8254_SELECT_COUNTER0);
> +               status = inb(I8254_PORT_COUNTER0);
> +               timer  = inb(I8254_PORT_COUNTER0);
> +               timer |= inb(I8254_PORT_COUNTER0) << 8;
> +       } while (status & I8254_STATUS_NOTREADY);
> +
> +       return timer;
> +}
> +
> +unsigned long kaslr_get_random_boot_long(void)

Sorry again for the refactoring in this area: -tip (and soon -next)
will make yet another change to this function to carry a const char *
for the debug_putstr() calls.

> +{
> +#ifdef CONFIG_X86_64
> +       const unsigned long mix_const = 0x5d6008cbf3848dd3UL;
> +#else
> +       const unsigned long mix_const = 0x3f39e593UL;
> +#endif
> +       unsigned long raw, random = get_boot_seed();
> +       bool use_i8254 = true;
> +
> +       debug_putstr("KASLR using");
> +
> +       if (has_cpuflag(X86_FEATURE_RDRAND)) {
> +               debug_putstr(" RDRAND");
> +               if (rdrand_long(&raw)) {
> +                       random ^= raw;
> +                       use_i8254 = false;
> +               }
> +       }
> +
> +       if (has_cpuflag(X86_FEATURE_TSC)) {
> +               debug_putstr(" RDTSC");
> +               raw = rdtsc();
> +
> +               random ^= raw;
> +               use_i8254 = false;
> +       }
> +
> +       if (use_i8254) {
> +               debug_putstr(" i8254");
> +               random ^= i8254();
> +       }
> +
> +       /* Circular multiply for better bit diffusion */
> +       asm("mul %3"
> +           : "=a" (random), "=d" (raw)
> +           : "a" (random), "rm" (mix_const));
> +       random += raw;
> +
> +       debug_putstr("...\n");
> +
> +       return random;
> +}
> --
> 2.8.0.rc3.226.g39d4020
>

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 1/4] x86, boot: Refactor KASLR entropy functions
  2016-05-10 19:05     ` [kernel-hardening] " Kees Cook
@ 2016-05-10 20:10       ` Thomas Garnier
  -1 siblings, 0 replies; 22+ messages in thread
From: Thomas Garnier @ 2016-05-10 20:10 UTC (permalink / raw)
  To: Kees Cook
  Cc: H . Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, Dmitry Vyukov, Paolo Bonzini, Dan Williams,
	Stephen Smalley, Kefeng Wang, Jonathan Corbet, Matt Fleming,
	Toshi Kani, Alexander Kuleshov, Alexander Popov, Joerg Roedel,
	Dave Young, Baoquan He, Dave Hansen, Mark Salter,
	Boris Ostrovsky, x86, LKML, linux-doc, Greg Thelen,
	kernel-hardening

On Tue, May 10, 2016 at 12:05 PM, Kees Cook <keescook@chromium.org> wrote:
> On Tue, May 3, 2016 at 12:31 PM, Thomas Garnier <thgarnie@google.com> wrote:
>> Move the KASLR entropy functions in x86/libray to be used in early
>> kernel boot for KASLR memory randomization.
>>
>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>> ---
>> Based on next-20160502
>> ---
>>  arch/x86/boot/compressed/kaslr.c | 76 +++-----------------------------------
>>  arch/x86/include/asm/kaslr.h     |  6 +++
>>  arch/x86/lib/Makefile            |  1 +
>>  arch/x86/lib/kaslr.c             | 79 ++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 91 insertions(+), 71 deletions(-)
>>  create mode 100644 arch/x86/include/asm/kaslr.h
>>  create mode 100644 arch/x86/lib/kaslr.c
>>
>> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
>> index 8741a6d..0bdee23 100644
>> --- a/arch/x86/boot/compressed/kaslr.c
>> +++ b/arch/x86/boot/compressed/kaslr.c
>> @@ -11,10 +11,6 @@
>>   */
>>  #include "misc.h"
>>
>> -#include <asm/msr.h>
>> -#include <asm/archrandom.h>
>> -#include <asm/e820.h>
>> -
>>  #include <generated/compile.h>
>>  #include <linux/module.h>
>>  #include <linux/uts.h>
>> @@ -25,26 +21,6 @@
>>  static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
>>                 LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
>>
>> -#define I8254_PORT_CONTROL     0x43
>> -#define I8254_PORT_COUNTER0    0x40
>> -#define I8254_CMD_READBACK     0xC0
>> -#define I8254_SELECT_COUNTER0  0x02
>> -#define I8254_STATUS_NOTREADY  0x40
>> -static inline u16 i8254(void)
>> -{
>> -       u16 status, timer;
>> -
>> -       do {
>> -               outb(I8254_PORT_CONTROL,
>> -                    I8254_CMD_READBACK | I8254_SELECT_COUNTER0);
>> -               status = inb(I8254_PORT_COUNTER0);
>> -               timer  = inb(I8254_PORT_COUNTER0);
>> -               timer |= inb(I8254_PORT_COUNTER0) << 8;
>> -       } while (status & I8254_STATUS_NOTREADY);
>> -
>> -       return timer;
>> -}
>> -
>>  static unsigned long rotate_xor(unsigned long hash, const void *area,
>>                                 size_t size)
>>  {
>> @@ -61,7 +37,7 @@ static unsigned long rotate_xor(unsigned long hash, const void *area,
>>  }
>>
>>  /* Attempt to create a simple but unpredictable starting entropy. */
>> -static unsigned long get_random_boot(void)
>> +static unsigned long get_boot_seed(void)
>>  {
>>         unsigned long hash = 0;
>>
>> @@ -71,50 +47,6 @@ static unsigned long get_random_boot(void)
>>         return hash;
>>  }
>>
>> -static unsigned long get_random_long(void)
>> -{
>> -#ifdef CONFIG_X86_64
>> -       const unsigned long mix_const = 0x5d6008cbf3848dd3UL;
>> -#else
>> -       const unsigned long mix_const = 0x3f39e593UL;
>> -#endif
>> -       unsigned long raw, random = get_random_boot();
>> -       bool use_i8254 = true;
>> -
>> -       debug_putstr("KASLR using");
>> -
>> -       if (has_cpuflag(X86_FEATURE_RDRAND)) {
>> -               debug_putstr(" RDRAND");
>> -               if (rdrand_long(&raw)) {
>> -                       random ^= raw;
>> -                       use_i8254 = false;
>> -               }
>> -       }
>> -
>> -       if (has_cpuflag(X86_FEATURE_TSC)) {
>> -               debug_putstr(" RDTSC");
>> -               raw = rdtsc();
>> -
>> -               random ^= raw;
>> -               use_i8254 = false;
>> -       }
>> -
>> -       if (use_i8254) {
>> -               debug_putstr(" i8254");
>> -               random ^= i8254();
>> -       }
>> -
>> -       /* Circular multiply for better bit diffusion */
>> -       asm("mul %3"
>> -           : "=a" (random), "=d" (raw)
>> -           : "a" (random), "rm" (mix_const));
>> -       random += raw;
>> -
>> -       debug_putstr("...\n");
>> -
>> -       return random;
>> -}
>> -
>>  struct mem_vector {
>>         unsigned long start;
>>         unsigned long size;
>> @@ -122,7 +54,6 @@ struct mem_vector {
>>
>>  #define MEM_AVOID_MAX 5
>>  static struct mem_vector mem_avoid[MEM_AVOID_MAX];
>> -
>>  static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
>>  {
>>         /* Item at least partially before region. */
>> @@ -229,13 +160,16 @@ static void slots_append(unsigned long addr)
>>         slots[slot_max++] = addr;
>>  }
>>
>> +#define KASLR_COMPRESSED_BOOT
>> +#include "../../lib/kaslr.c"
>> +
>>  static unsigned long slots_fetch_random(void)
>>  {
>>         /* Handle case of no slots stored. */
>>         if (slot_max == 0)
>>                 return 0;
>>
>> -       return slots[get_random_long() % slot_max];
>> +       return slots[kaslr_get_random_boot_long() % slot_max];
>>  }
>>
>>  static void process_e820_entry(struct e820entry *entry,
>> diff --git a/arch/x86/include/asm/kaslr.h b/arch/x86/include/asm/kaslr.h
>> new file mode 100644
>> index 0000000..2ae1429
>> --- /dev/null
>> +++ b/arch/x86/include/asm/kaslr.h
>> @@ -0,0 +1,6 @@
>> +#ifndef _ASM_KASLR_H_
>> +#define _ASM_KASLR_H_
>> +
>> +unsigned long kaslr_get_random_boot_long(void);
>> +
>> +#endif
>> diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
>> index 72a5767..cfa6d07 100644
>> --- a/arch/x86/lib/Makefile
>> +++ b/arch/x86/lib/Makefile
>> @@ -24,6 +24,7 @@ lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
>>  lib-y += memcpy_$(BITS).o
>>  lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
>>  lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
>> +lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
>>
>>  obj-y += msr.o msr-reg.o msr-reg-export.o
>>
>> diff --git a/arch/x86/lib/kaslr.c b/arch/x86/lib/kaslr.c
>> new file mode 100644
>> index 0000000..ffb22ba
>> --- /dev/null
>> +++ b/arch/x86/lib/kaslr.c
>> @@ -0,0 +1,79 @@
>
> Please add a file header comment here to describe what's contained and
> that it is used in both regular and compressed kernels.
>

Will do.

>> +#include <asm/kaslr.h>
>> +#include <asm/msr.h>
>> +#include <asm/archrandom.h>
>> +#include <asm/e820.h>
>> +#include <asm/io.h>
>> +
>> +/* Replace boot functions on library build */
>
> I'd expand this comment a bit, something like:
>
> /*
>  * When built for the regular kernel, several functions need to be stubbed out
>  * or changed to their regular kernel equivalent.
>  */
>

Will do.

>> +#ifndef KASLR_COMPRESSED_BOOT
>> +#include <asm/cpufeature.h>
>> +#include <asm/setup.h>
>> +
>> +#define debug_putstr(v)
>
> Hmmm, I don't think this should be removed. Using these routines
> should be uncommon, and it'd be nice to retain the debugging output
> from them. Can this be refactored into an early printk, or is that
> stuff not available yet? If it's not available, I can live with this
> going silent, but it'd be nice to not lose it for the memory
> randomization.
>
>> +#define has_cpuflag(f) boot_cpu_has(f)
>> +#define get_boot_seed() kaslr_offset()
>
> Hmmm... this replacement seed feels like it has much less entropy.
> Also, if RANDOMIZE_MEMORY is decoupled from RANDOMIZE_BASE, this will
> not be cool. :) But I don't feel to strongly about the config coupling
> -- I just wanted to see RANDOMIZE_MEMORY on by default if
> RANDOMIZE_BASE is too.
>

Currently, RANDOMIZE_MEMORY is coupled with RANDOMIZE_BASE and I think it
makes sense that they remain together. There is also not much additional
entropy available for the seed here: the table used on the KASLR compressed
boot is gone.

>> +#endif
>> +
>> +#define I8254_PORT_CONTROL     0x43
>> +#define I8254_PORT_COUNTER0    0x40
>> +#define I8254_CMD_READBACK     0xC0
>> +#define I8254_SELECT_COUNTER0  0x02
>> +#define I8254_STATUS_NOTREADY  0x40
>> +static inline u16 i8254(void)
>> +{
>> +       u16 status, timer;
>> +
>> +       do {
>> +               outb(I8254_PORT_CONTROL,
>> +                    I8254_CMD_READBACK | I8254_SELECT_COUNTER0);
>> +               status = inb(I8254_PORT_COUNTER0);
>> +               timer  = inb(I8254_PORT_COUNTER0);
>> +               timer |= inb(I8254_PORT_COUNTER0) << 8;
>> +       } while (status & I8254_STATUS_NOTREADY);
>> +
>> +       return timer;
>> +}
>> +
>> +unsigned long kaslr_get_random_boot_long(void)
>
> Sorry again for the refactoring in this area: -tip (and soon -next)
> will make yet another change to this function to carry a const char *
> for the debug_putstr() calls.
>

I will keep an eye on it and send the next iteration when the change arrives
in -next.

Thanks for the comments,
Thomas

>> +{
>> +#ifdef CONFIG_X86_64
>> +       const unsigned long mix_const = 0x5d6008cbf3848dd3UL;
>> +#else
>> +       const unsigned long mix_const = 0x3f39e593UL;
>> +#endif
>> +       unsigned long raw, random = get_boot_seed();
>> +       bool use_i8254 = true;
>> +
>> +       debug_putstr("KASLR using");
>> +
>> +       if (has_cpuflag(X86_FEATURE_RDRAND)) {
>> +               debug_putstr(" RDRAND");
>> +               if (rdrand_long(&raw)) {
>> +                       random ^= raw;
>> +                       use_i8254 = false;
>> +               }
>> +       }
>> +
>> +       if (has_cpuflag(X86_FEATURE_TSC)) {
>> +               debug_putstr(" RDTSC");
>> +               raw = rdtsc();
>> +
>> +               random ^= raw;
>> +               use_i8254 = false;
>> +       }
>> +
>> +       if (use_i8254) {
>> +               debug_putstr(" i8254");
>> +               random ^= i8254();
>> +       }
>> +
>> +       /* Circular multiply for better bit diffusion */
>> +       asm("mul %3"
>> +           : "=a" (random), "=d" (raw)
>> +           : "a" (random), "rm" (mix_const));
>> +       random += raw;
>> +
>> +       debug_putstr("...\n");
>> +
>> +       return random;
>> +}
>> --
>> 2.8.0.rc3.226.g39d4020
>>
>
> -Kees
>
> --
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)
  2016-05-10 18:53     ` [kernel-hardening] " Kees Cook
@ 2016-05-10 21:28       ` Thomas Garnier
  -1 siblings, 0 replies; 22+ messages in thread
From: Thomas Garnier @ 2016-05-10 21:28 UTC (permalink / raw)
  To: Kees Cook
  Cc: H . Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, Dmitry Vyukov, Paolo Bonzini, Dan Williams,
	Stephen Smalley, Kefeng Wang, Jonathan Corbet, Matt Fleming,
	Toshi Kani, Alexander Kuleshov, Alexander Popov, Joerg Roedel,
	Dave Young, Baoquan He, Dave Hansen, Mark Salter,
	Boris Ostrovsky, x86, LKML, linux-doc, Greg Thelen,
	kernel-hardening

On Tue, May 10, 2016 at 11:53 AM, Kees Cook <keescook@chromium.org> wrote:
> On Tue, May 3, 2016 at 12:31 PM, Thomas Garnier <thgarnie@google.com> wrote:
>> Randomizes the virtual address space of kernel memory sections (physical
>> memory mapping, vmalloc & vmemmap) for x86_64. This security feature
>> mitigates exploits relying on predictable kernel addresses. These
>> addresses can be used to disclose the kernel modules base addresses or
>> corrupt specific structures to elevate privileges bypassing the current
>> implementation of KASLR. This feature can be enabled with the
>> CONFIG_RANDOMIZE_MEMORY option.
>
> I'm struggling to come up with a more accurate name for this, since
> it's a base randomization of the kernel memory sections. Everything
> else seems needlessly long (CONFIG_RANDOMIZE_BASE_MEMORY). I wonder if
> things should be renamed generally to CONFIG_KASLR_BASE,
> CONFIG_KASLR_MEMORY, etc, but that doesn't need to be part of this
> series. Let's leave this as-is, and just make sure it's clear in the
> Kconfig.
>

I agree, leaving it like this for now.

>> The physical memory mapping holds most allocations from boot and heap
>> allocators. Knowning the base address and physical memory size, an
>> attacker can deduce the PDE virtual address for the vDSO memory page.
>> This attack was demonstrated at CanSecWest 2016, in the "Getting
>> Physical Extreme Abuse of Intel Based Paged Systems"
>> https://goo.gl/ANpWdV (see second part of the presentation). The
>> exploits used against Linux worked successfuly against 4.6+ but fail
>> with KASLR memory enabled (https://goo.gl/iTtXMJ). Similar research
>> was done at Google leading to this patch proposal. Variants exists to
>> overwrite /proc or /sys objects ACLs leading to elevation of privileges.
>> These variants were testeda against 4.6+.
>
> Typo "tested".
>

Corrected for next iteration.

>>
>> The vmalloc memory section contains the allocation made through the
>> vmalloc api. The allocations are done sequentially to prevent
>> fragmentation and each allocation address can easily be deduced
>> especially from boot.
>>
>> The vmemmap section holds a representation of the physical
>> memory (through a struct page array). An attacker could use this section
>> to disclose the kernel memory layout (walking the page linked list).
>>
>> The order of each memory section is not changed. The feature looks at
>> the available space for the sections based on different configuration
>> options and randomizes the base and space between each. The size of the
>> physical memory mapping is the available physical memory. No performance
>> impact was detected while testing the feature.
>>
>> Entropy is generated using the KASLR early boot functions now shared in
>> the lib directory (originally written by Kees Cook). Randomization is
>> done on PGD & PUD page table levels to increase possible addresses. The
>> physical memory mapping code was adapted to support PUD level virtual
>> addresses. An additional low memory page is used to ensure each CPU can
>> start with a PGD aligned virtual address (for realmode).
>>
>> x86/dump_pagetable was updated to correctly display each section.
>>
>> Updated documentation on x86_64 memory layout accordingly.
>>
>> Performance data:
>>
>> Kernbench shows almost no difference (-+ less than 1%):
>>
>> Before:
>>
>> Average Optimal load -j 12 Run (std deviation):
>> Elapsed Time 102.63 (1.2695)
>> User Time 1034.89 (1.18115)
>> System Time 87.056 (0.456416)
>> Percent CPU 1092.9 (13.892)
>> Context Switches 199805 (3455.33)
>> Sleeps 97907.8 (900.636)
>>
>> After:
>>
>> Average Optimal load -j 12 Run (std deviation):
>> Elapsed Time 102.489 (1.10636)
>> User Time 1034.86 (1.36053)
>> System Time 87.764 (0.49345)
>> Percent CPU 1095 (12.7715)
>> Context Switches 199036 (4298.1)
>> Sleeps 97681.6 (1031.11)
>>
>> Hackbench shows 0% difference on average (hackbench 90
>> repeated 10 times):
>>
>> attemp,before,after
>> 1,0.076,0.069
>> 2,0.072,0.069
>> 3,0.066,0.066
>> 4,0.066,0.068
>> 5,0.066,0.067
>> 6,0.066,0.069
>> 7,0.067,0.066
>> 8,0.063,0.067
>> 9,0.067,0.065
>> 10,0.068,0.071
>> average,0.0677,0.0677
>>
>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>> ---
>> Based on next-20160502
>> ---
>>  Documentation/x86/x86_64/mm.txt         |   4 +
>>  arch/x86/Kconfig                        |  15 ++++
>>  arch/x86/include/asm/kaslr.h            |  12 +++
>>  arch/x86/include/asm/page_64_types.h    |  11 ++-
>>  arch/x86/include/asm/pgtable_64.h       |   1 +
>>  arch/x86/include/asm/pgtable_64_types.h |  15 +++-
>>  arch/x86/kernel/head_64.S               |   2 +-
>>  arch/x86/kernel/setup.c                 |   3 +
>>  arch/x86/mm/Makefile                    |   1 +
>>  arch/x86/mm/dump_pagetables.c           |  11 ++-
>>  arch/x86/mm/init.c                      |   4 +
>>  arch/x86/mm/kaslr.c                     | 136 ++++++++++++++++++++++++++++++++
>>  arch/x86/realmode/init.c                |   4 +
>>  13 files changed, 211 insertions(+), 8 deletions(-)
>>  create mode 100644 arch/x86/mm/kaslr.c
>>
>> diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
>> index 5aa7383..602a52d 100644
>> --- a/Documentation/x86/x86_64/mm.txt
>> +++ b/Documentation/x86/x86_64/mm.txt
>> @@ -39,4 +39,8 @@ memory window (this size is arbitrary, it can be raised later if needed).
>>  The mappings are not part of any other kernel PGD and are only available
>>  during EFI runtime calls.
>>
>> +Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
>> +physical memory, vmalloc/ioremap space and virtual memory map are randomized.
>> +Their order is preserved but their base will be changed early at boot time.
>
> Maybe instead of "changed", say "offset"?
>

Corrected for next iteration.

>> +
>>  -Andi Kleen, Jul 2004
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 0b128b4..60f33c7 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1988,6 +1988,21 @@ config PHYSICAL_ALIGN
>>
>>           Don't change this unless you know what you are doing.
>>
>> +config RANDOMIZE_MEMORY
>> +       bool "Randomize the kernel memory sections"
>> +       depends on X86_64
>> +       depends on RANDOMIZE_BASE
>
> Does this actually _depend_ on RANDOMIZE_BASE? It needs the
> lib/kaslr.c code, but this could operate without the kernel base
> address having been randomized, correct?
>

It could, but I would rather group them. There is no value in having one
without the other.

>> +       default n
>
> As such, maybe the default should be:
>
>     default RANDOMIZE_BASE
>

Changed for next iteration.

>> +       ---help---
>> +          Randomizes the virtual address of memory sections (physical memory
>
> How about: Randomizes the base virtual address of kernel memory sections ...
>

Changed for next iteration.

>> +          mapping, vmalloc & vmemmap). This security feature mitigates exploits
>> +          relying on predictable memory locations.
>
> And "This security feature makes exploits relying on predictable
> memory locations less reliable." ?
>

Changed for next iteration.

>> +
>> +          Base and padding between memory section is randomized. Their order is
>> +          not. Entropy is generated in the same way as RANDOMIZE_BASE.
>
> Since base would be mentioned above and padding is separate, I'd change this to:
>
> The order of allocations remains unchanged. Entropy is generated ...
>

Changed for next iteration.

>> +
>> +          If unsure, say N.
>> +
>>  config HOTPLUG_CPU
>>         bool "Support for hot-pluggable CPUs"
>>         depends on SMP
>> diff --git a/arch/x86/include/asm/kaslr.h b/arch/x86/include/asm/kaslr.h
>> index 2ae1429..12c7742 100644
>> --- a/arch/x86/include/asm/kaslr.h
>> +++ b/arch/x86/include/asm/kaslr.h
>> @@ -3,4 +3,16 @@
>>
>>  unsigned long kaslr_get_random_boot_long(void);
>>
>> +#ifdef CONFIG_RANDOMIZE_MEMORY
>> +extern unsigned long page_offset_base;
>> +extern unsigned long vmalloc_base;
>> +extern unsigned long vmemmap_base;
>> +
>> +void kernel_randomize_memory(void);
>> +void kaslr_trampoline_init(void);
>> +#else
>> +static inline void kernel_randomize_memory(void) { }
>> +static inline void kaslr_trampoline_init(void) { }
>> +#endif /* CONFIG_RANDOMIZE_MEMORY */
>> +
>>  #endif
>> diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
>> index d5c2f8b..9215e05 100644
>> --- a/arch/x86/include/asm/page_64_types.h
>> +++ b/arch/x86/include/asm/page_64_types.h
>> @@ -1,6 +1,10 @@
>>  #ifndef _ASM_X86_PAGE_64_DEFS_H
>>  #define _ASM_X86_PAGE_64_DEFS_H
>>
>> +#ifndef __ASSEMBLY__
>> +#include <asm/kaslr.h>
>> +#endif
>> +
>>  #ifdef CONFIG_KASAN
>>  #define KASAN_STACK_ORDER 1
>>  #else
>> @@ -32,7 +36,12 @@
>>   * hypervisor to fit.  Choosing 16 slots here is arbitrary, but it's
>>   * what Xen requires.
>>   */
>> -#define __PAGE_OFFSET           _AC(0xffff880000000000, UL)
>> +#define __PAGE_OFFSET_BASE      _AC(0xffff880000000000, UL)
>> +#ifdef CONFIG_RANDOMIZE_MEMORY
>> +#define __PAGE_OFFSET           page_offset_base
>> +#else
>> +#define __PAGE_OFFSET           __PAGE_OFFSET_BASE
>> +#endif /* CONFIG_RANDOMIZE_MEMORY */
>>
>>  #define __START_KERNEL_map     _AC(0xffffffff80000000, UL)
>>
>> diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
>> index 2ee7811..0dfec89 100644
>> --- a/arch/x86/include/asm/pgtable_64.h
>> +++ b/arch/x86/include/asm/pgtable_64.h
>> @@ -21,6 +21,7 @@ extern pmd_t level2_fixmap_pgt[512];
>>  extern pmd_t level2_ident_pgt[512];
>>  extern pte_t level1_fixmap_pgt[512];
>>  extern pgd_t init_level4_pgt[];
>> +extern pgd_t trampoline_pgd_entry;
>>
>>  #define swapper_pg_dir init_level4_pgt
>>
>> diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
>> index e6844df..d388739 100644
>> --- a/arch/x86/include/asm/pgtable_64_types.h
>> +++ b/arch/x86/include/asm/pgtable_64_types.h
>> @@ -5,6 +5,7 @@
>>
>>  #ifndef __ASSEMBLY__
>>  #include <linux/types.h>
>> +#include <asm/kaslr.h>
>>
>>  /*
>>   * These are used to make use of C type-checking..
>> @@ -54,9 +55,17 @@ typedef struct { pteval_t pte; } pte_t;
>>
>>  /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
>>  #define MAXMEM          _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
>> -#define VMALLOC_START    _AC(0xffffc90000000000, UL)
>> -#define VMALLOC_END      _AC(0xffffe8ffffffffff, UL)
>> -#define VMEMMAP_START   _AC(0xffffea0000000000, UL)
>> +#define VMALLOC_SIZE_TB         _AC(32, UL)
>> +#define __VMALLOC_BASE  _AC(0xffffc90000000000, UL)
>> +#define __VMEMMAP_BASE  _AC(0xffffea0000000000, UL)
>> +#ifdef CONFIG_RANDOMIZE_MEMORY
>> +#define VMALLOC_START   vmalloc_base
>> +#define VMEMMAP_START   vmemmap_base
>> +#else
>> +#define VMALLOC_START   __VMALLOC_BASE
>> +#define VMEMMAP_START   __VMEMMAP_BASE
>> +#endif /* CONFIG_RANDOMIZE_MEMORY */
>> +#define VMALLOC_END      (VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL))
>>  #define MODULES_VADDR    (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
>>  #define MODULES_END      _AC(0xffffffffff000000, UL)
>>  #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
>> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
>> index 5df831e..03a2aa0 100644
>> --- a/arch/x86/kernel/head_64.S
>> +++ b/arch/x86/kernel/head_64.S
>> @@ -38,7 +38,7 @@
>>
>>  #define pud_index(x)   (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
>>
>> -L4_PAGE_OFFSET = pgd_index(__PAGE_OFFSET)
>> +L4_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE)
>>  L4_START_KERNEL = pgd_index(__START_KERNEL_map)
>>  L3_START_KERNEL = pud_index(__START_KERNEL_map)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index c4e7b39..a261658 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -113,6 +113,7 @@
>>  #include <asm/prom.h>
>>  #include <asm/microcode.h>
>>  #include <asm/mmu_context.h>
>> +#include <asm/kaslr.h>
>>
>>  /*
>>   * max_low_pfn_mapped: highest direct mapped pfn under 4GB
>> @@ -942,6 +943,8 @@ void __init setup_arch(char **cmdline_p)
>>
>>         x86_init.oem.arch_setup();
>>
>> +       kernel_randomize_memory();
>> +
>>         iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
>>         setup_memory_map();
>>         parse_setup_data();
>> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
>> index 62c0043..96d2b84 100644
>> --- a/arch/x86/mm/Makefile
>> +++ b/arch/x86/mm/Makefile
>> @@ -37,4 +37,5 @@ obj-$(CONFIG_NUMA_EMU)                += numa_emulation.o
>>
>>  obj-$(CONFIG_X86_INTEL_MPX)    += mpx.o
>>  obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
>> +obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
>>
>> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
>> index 99bfb19..4a03f60 100644
>> --- a/arch/x86/mm/dump_pagetables.c
>> +++ b/arch/x86/mm/dump_pagetables.c
>> @@ -72,9 +72,9 @@ static struct addr_marker address_markers[] = {
>>         { 0, "User Space" },
>>  #ifdef CONFIG_X86_64
>>         { 0x8000000000000000UL, "Kernel Space" },
>> -       { PAGE_OFFSET,          "Low Kernel Mapping" },
>> -       { VMALLOC_START,        "vmalloc() Area" },
>> -       { VMEMMAP_START,        "Vmemmap" },
>> +       { 0/* PAGE_OFFSET */,   "Low Kernel Mapping" },
>> +       { 0/* VMALLOC_START */, "vmalloc() Area" },
>> +       { 0/* VMEMMAP_START */, "Vmemmap" },
>>  # ifdef CONFIG_X86_ESPFIX64
>>         { ESPFIX_BASE_ADDR,     "ESPfix Area", 16 },
>>  # endif
>> @@ -434,6 +434,11 @@ void ptdump_walk_pgd_level_checkwx(void)
>>
>>  static int __init pt_dump_init(void)
>>  {
>> +#ifdef CONFIG_X86_64
>> +       address_markers[LOW_KERNEL_NR].start_address = PAGE_OFFSET;
>> +       address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
>> +       address_markers[VMEMMAP_START_NR].start_address = VMEMMAP_START;
>> +#endif
>>  #ifdef CONFIG_X86_32
>>         /* Not a compile-time constant on x86-32 */
>
> I'd move this comment above your new ifdef and generalize it to something like:
>
>     /* Various markers are not compile-time constants, so assign them here. */
>

Done for next iteration.
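
Applying that suggestion, the pt_dump_init() hunk would end up looking roughly like this (a sketch of the suggested shape, not the actual next revision):

            /* Various markers are not compile-time constants, so assign them here. */
    #ifdef CONFIG_X86_64
            address_markers[LOW_KERNEL_NR].start_address = PAGE_OFFSET;
            address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
            address_markers[VMEMMAP_START_NR].start_address = VMEMMAP_START;
    #endif
            /* ...followed by the existing CONFIG_X86_32 assignments, unchanged. */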

>>         address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
>> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
>> index 372aad2..e490624 100644
>> --- a/arch/x86/mm/init.c
>> +++ b/arch/x86/mm/init.c
>> @@ -17,6 +17,7 @@
>>  #include <asm/proto.h>
>>  #include <asm/dma.h>           /* for MAX_DMA_PFN */
>>  #include <asm/microcode.h>
>> +#include <asm/kaslr.h>
>>
>>  /*
>>   * We need to define the tracepoints somewhere, and tlb.c
>> @@ -590,6 +591,9 @@ void __init init_mem_mapping(void)
>>         /* the ISA range is always mapped regardless of memory holes */
>>         init_memory_mapping(0, ISA_END_ADDRESS);
>>
>> +       /* Init the trampoline page table if needed for KASLR memory */
>> +       kaslr_trampoline_init();
>> +
>>         /*
>>          * If the allocation is in bottom-up direction, we setup direct mapping
>>          * in bottom-up, otherwise we setup direct mapping in top-down.
>> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
>> new file mode 100644
>> index 0000000..3b330a9
>> --- /dev/null
>> +++ b/arch/x86/mm/kaslr.c
>> @@ -0,0 +1,136 @@
>> +#include <linux/kernel.h>
>> +#include <linux/errno.h>
>> +#include <linux/types.h>
>> +#include <linux/mm.h>
>> +#include <linux/smp.h>
>> +#include <linux/init.h>
>> +#include <linux/memory.h>
>> +#include <linux/random.h>
>> +
>> +#include <asm/processor.h>
>> +#include <asm/pgtable.h>
>> +#include <asm/pgalloc.h>
>> +#include <asm/e820.h>
>> +#include <asm/init.h>
>> +#include <asm/setup.h>
>> +#include <asm/kaslr.h>
>> +#include <asm/kasan.h>
>> +
>> +#include "mm_internal.h"
>> +
>> +/* Hold the pgd entry used on booting additional CPUs */
>> +pgd_t trampoline_pgd_entry;
>> +
>> +#define TB_SHIFT 40
>> +
>> +/*
>> + * Memory base and end randomization is based on different configurations.
>> + * We want as much space as possible to increase entropy available.
>> + */
>> +static const unsigned long memory_rand_start = __PAGE_OFFSET_BASE;
>> +
>> +#if defined(CONFIG_KASAN)
>> +static const unsigned long memory_rand_end = KASAN_SHADOW_START;
>> +#elif defined(CONFIG_X86_ESPFIX64)
>> +static const unsigned long memory_rand_end = ESPFIX_BASE_ADDR;
>> +#elif defined(CONFIG_EFI)
>> +static const unsigned long memory_rand_end = EFI_VA_START;
>> +#else
>> +static const unsigned long memory_rand_end = __START_KERNEL_map;
>> +#endif
>
> Is it worth adding BUILD_BUG_ON()s to verify these values stay in
> decreasing size?
>

Will add that for the next iteration.
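
One possible shape for those checks at the top of kernel_randomize_memory(), sketched here only to illustrate the idea (the exact conditions are an assumption, not the posted code):

            /* The randomization window must be sane and must end below the
             * kernel text mapping; the per-config candidates for
             * memory_rand_end can be checked the same way with IS_ENABLED(). */
            BUILD_BUG_ON(memory_rand_start >= memory_rand_end);
            BUILD_BUG_ON(memory_rand_end > __START_KERNEL_map);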

>> +
>> +/* Default values */
>> +unsigned long page_offset_base = __PAGE_OFFSET_BASE;
>> +EXPORT_SYMBOL(page_offset_base);
>> +unsigned long vmalloc_base = __VMALLOC_BASE;
>> +EXPORT_SYMBOL(vmalloc_base);
>> +unsigned long vmemmap_base = __VMEMMAP_BASE;
>> +EXPORT_SYMBOL(vmemmap_base);
>> +
>> +/* Describe each randomized memory sections in sequential order */
>> +static struct kaslr_memory_region {
>> +       unsigned long *base;
>> +       unsigned short size_tb;
>> +} kaslr_regions[] = {
>> +       { &page_offset_base, 64/* Maximum */ },
>> +       { &vmalloc_base, VMALLOC_SIZE_TB },
>> +       { &vmemmap_base, 1 },
>> +};
>
> This seems to be __init_data, since it's only used in kernel_randomize_memory()?
>

Done for next iteration.

>> +
>> +/* Size in Terabytes + 1 hole */
>> +static inline unsigned long get_padding(struct kaslr_memory_region *region)
>
> I think this can be marked __init also?
>

Done for next iteration.
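
Put together, the two annotations discussed above could look like this (a sketch, assuming both stay private to kernel_randomize_memory()):

    static __initdata struct kaslr_memory_region {
            unsigned long *base;
            unsigned short size_tb;
    } kaslr_regions[] = {
            { &page_offset_base, 64/* Maximum */ },
            { &vmalloc_base, VMALLOC_SIZE_TB },
            { &vmemmap_base, 1 },
    };

    /* Size in Terabytes + 1 hole */
    static inline unsigned long __init get_padding(struct kaslr_memory_region *region)
    {
            return ((unsigned long)region->size_tb + 1) << TB_SHIFT;
    }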

>> +{
>> +       return ((unsigned long)region->size_tb + 1) << TB_SHIFT;
>> +}
>> +
>> +/* Initialize base and padding for each memory section randomized with KASLR */
>> +void __init kernel_randomize_memory(void)
>> +{
>> +       size_t i;
>> +       unsigned long addr = memory_rand_start;
>> +       unsigned long padding, rand, mem_tb;
>> +       struct rnd_state rnd_st;
>> +       unsigned long remain_padding = memory_rand_end - memory_rand_start;
>> +
>> +       if (!kaslr_enabled())
>> +               return;
>> +
>> +       BUG_ON(kaslr_regions[0].base != &page_offset_base);
>
> This is statically assigned above, is this BUG_ON useful?
>
>> +       mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);
>> +
>> +       if (mem_tb < kaslr_regions[0].size_tb)
>> +               kaslr_regions[0].size_tb = mem_tb;
>
> Can you add a comment for this? IIUC, this is just recalculating the
> max memory size available for padding based on the page shift? Under
> what situations would this be changing?
>

It ensures the maximum memory taken for the physical mapping is 64TB, so it
is guaranteed to fit alongside the other sections.
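
A rough worked example of the clamp, with illustrative numbers and assuming 4 KiB pages:

     16 GiB of RAM: mem_tb = (max_pfn << PAGE_SHIFT) >> 40 = 0,
                    so kaslr_regions[0].size_tb is clamped from 64 to 0 and
                    get_padding() still reserves (0 + 1) << 40 = 1 TiB
    128 TiB of RAM: mem_tb = 128, the "mem_tb < 64" test fails and
                    size_tb stays at its 64 TiB maximum

Either way the physical mapping region cannot claim more than 64 TiB of the window, which is what leaves room for the vmalloc and vmemmap regions after it.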

>> +
>> +       for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
>> +               remain_padding -= get_padding(&kaslr_regions[i]);
>> +
>> +       prandom_seed_state(&rnd_st, kaslr_get_random_boot_long());
>> +
>> +       /* Position each section randomly with minimum 1 terabyte between */
>> +       for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++) {
>> +               padding = remain_padding / (ARRAY_SIZE(kaslr_regions) - i);
>> +               prandom_bytes_state(&rnd_st, &rand, sizeof(rand));
>> +               padding = (rand % (padding + 1)) & PUD_MASK;
>> +               addr += padding;
>> +               *kaslr_regions[i].base = addr;
>> +               addr += get_padding(&kaslr_regions[i]);
>> +               remain_padding -= padding;
>> +       }
>
> What happens if we run out of padding here, and doesn't this loop mean
> earlier regions will have, on average, more padding? Should each
> instead randomize within a one-time calculation of remaining_padding /
> ARRAY_SIZE(kaslr_regions) ?
>

Yes, padding is more likely to be bigger on the first section. If we made it
uniform across sections, guessing the base of one section would give you all
the others. That's why each padding is randomized separately.
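
To make that concrete, here is a small user-space sketch of the same loop with stand-in sizes and libc random() instead of the boot-time PRNG (numbers are illustrative only):

    #include <stdio.h>
    #include <stdlib.h>

    #define TB              (1UL << 40)
    #define PUD_MASK        (~((1UL << 30) - 1))   /* 1 GiB alignment, as on x86_64 */

    int main(void)
    {
            /* stand-ins for the three regions: direct map, vmalloc, vmemmap */
            unsigned long sizes[3] = { 64 * TB, 32 * TB, 1 * TB };
            unsigned long addr = 0xffff880000000000UL;
            unsigned long remain = 0xffffff0000000000UL - addr;    /* up to ESPFIX */
            int i;

            for (i = 0; i < 3; i++)
                    remain -= sizes[i] + TB;        /* region size + 1 TiB hole */

            srandom(1);
            for (i = 0; i < 3; i++) {
                    unsigned long budget = remain / (3 - i);
                    unsigned long rnd = ((unsigned long)random() << 32) | random();
                    unsigned long pad = (rnd % (budget + 1)) & PUD_MASK;

                    addr += pad;
                    printf("region %d base 0x%016lx (padding %lu GiB)\n",
                           i, addr, pad >> 30);
                    addr += sizes[i] + TB;
                    remain -= pad;
            }
            return 0;
    }

Each region draws its own PUD-aligned padding from whatever budget is left, so recovering one base narrows the budget but does not pin down the other bases.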

> Also, to get added to the Kconfig, what is the available entropy here?
> How far can each of the base addresses get offset?
>

Yes, it depends on the configuration of course, but each memory section has
about 30,000 possible virtual addresses on average (best case scenario). I
will edit the Kconfig.
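
A rough back-of-envelope consistent with that figure, assuming ESPFIX is the limiting upper boundary and a machine with little RAM (so the first region is clamped as discussed above):

    window       ~ 0xffffff0000000000 - 0xffff880000000000 ~ 119 TiB
    reserved     ~ (1 + 1) + (32 + 1) + (1 + 1)            =  37 TiB
    padding left ~ 119 - 37                                =  82 TiB
    per section  ~ 82 / 3                                  ~  27 TiB
    positions    ~ 27 TiB at 1 GiB (PUD) granularity       ~ 28,000 (~14.8 bits)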

>> +}
>> +
>> +/*
>> + * Create PGD aligned trampoline table to allow real mode initialization
>> + * of additional CPUs. Consume only 1 additonal low memory page.
>
> Typo "additional".
>

Fixed for next iteration.

>> + */
>> +void __meminit kaslr_trampoline_init(void)
>> +{
>> +       unsigned long addr, next;
>> +       pgd_t *pgd;
>> +       pud_t *pud_page, *tr_pud_page;
>> +       int i;
>> +
>> +       /* If KASLR is disabled, default to the existing page table entry */
>> +       if (!kaslr_enabled()) {
>> +               trampoline_pgd_entry = init_level4_pgt[pgd_index(PAGE_OFFSET)];
>> +               return;
>> +       }
>> +
>> +       tr_pud_page = alloc_low_page();
>> +       set_pgd(&trampoline_pgd_entry, __pgd(_PAGE_TABLE | __pa(tr_pud_page)));
>> +
>> +       addr = 0;
>> +       pgd = pgd_offset_k((unsigned long)__va(addr));
>> +       pud_page = (pud_t *) pgd_page_vaddr(*pgd);
>> +
>> +       for (i = pud_index(addr); i < PTRS_PER_PUD; i++, addr = next) {
>> +               pud_t *pud, *tr_pud;
>> +
>> +               tr_pud = tr_pud_page + pud_index(addr);
>> +               pud = pud_page + pud_index((unsigned long)__va(addr));
>> +               next = (addr & PUD_MASK) + PUD_SIZE;
>> +
>> +               /* Needed to copy pte or pud alike */
>> +               BUILD_BUG_ON(sizeof(pud_t) != sizeof(pte_t));
>> +               *tr_pud = *pud;
>> +       }
>> +}
>> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
>> index 0b7a63d..6518314 100644
>> --- a/arch/x86/realmode/init.c
>> +++ b/arch/x86/realmode/init.c
>> @@ -84,7 +84,11 @@ void __init setup_real_mode(void)
>>         *trampoline_cr4_features = __read_cr4();
>>
>>         trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
>> +#ifdef CONFIG_RANDOMIZE_MEMORY
>> +       trampoline_pgd[0] = trampoline_pgd_entry.pgd;
>> +#else
>>         trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
>> +#endif
>
> To avoid this ifdefs, could trampoline_pgd_entry instead be defined
> outside of mm/kaslr.c and have .pgd assigned as
> init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd via a static inline of
> kaslr_trampoline_init() instead?
>

Yes, I will change that.
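
A sketch of one way that could look (an assumed shape, not the next revision): define trampoline_pgd_entry unconditionally and let the !CONFIG_RANDOMIZE_MEMORY stub fill it from the default page table, so setup_real_mode() can always use trampoline_pgd_entry.pgd without an #ifdef:

    #ifdef CONFIG_RANDOMIZE_MEMORY
    void kaslr_trampoline_init(void);
    #else
    static inline void kaslr_trampoline_init(void)
    {
            trampoline_pgd_entry = init_level4_pgt[pgd_index(PAGE_OFFSET)];
    }
    #endif

(modulo header dependencies on init_level4_pgt and pgd_index; the stub could instead live next to setup_real_mode()).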

>>         trampoline_pgd[511] = init_level4_pgt[511].pgd;
>>  #endif
>>  }
>> --
>> 2.8.0.rc3.226.g39d4020
>>
>
> -Kees
>
> --
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-05-10 21:28 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-03 19:31 [PATCH v3 0/4] x86, boot: KASLR memory randomization Thomas Garnier
2016-05-03 19:31 ` [kernel-hardening] " Thomas Garnier
2016-05-03 19:31 ` [PATCH v3 1/4] x86, boot: Refactor KASLR entropy functions Thomas Garnier
2016-05-03 19:31   ` [kernel-hardening] " Thomas Garnier
2016-05-10 19:05   ` Kees Cook
2016-05-10 19:05     ` [kernel-hardening] " Kees Cook
2016-05-10 20:10     ` Thomas Garnier
2016-05-10 20:10       ` [kernel-hardening] " Thomas Garnier
2016-05-03 19:31 ` [PATCH v3 2/4] x86, boot: PUD VA support for physical mapping (x86_64) Thomas Garnier
2016-05-03 19:31   ` [kernel-hardening] " Thomas Garnier
2016-05-03 19:31 ` [PATCH v3 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64) Thomas Garnier
2016-05-03 19:31   ` [kernel-hardening] " Thomas Garnier
2016-05-10 18:53   ` Kees Cook
2016-05-10 18:53     ` [kernel-hardening] " Kees Cook
2016-05-10 21:28     ` Thomas Garnier
2016-05-10 21:28       ` [kernel-hardening] " Thomas Garnier
2016-05-03 19:31 ` [PATCH v3 4/4] x86, boot: Memory hotplug support for KASLR memory randomization Thomas Garnier
2016-05-03 19:31   ` [kernel-hardening] " Thomas Garnier
2016-05-10 18:24   ` Kees Cook
2016-05-10 18:24     ` [kernel-hardening] " Kees Cook
2016-05-10 18:49     ` Thomas Garnier
2016-05-10 18:49       ` [kernel-hardening] " Thomas Garnier
