* [PATCH v6 0/3] x86/mm: memory area address KASLR
@ 2016-05-25 22:57 Kees Cook
  2016-05-25 22:57 ` [PATCH v6 1/3] x86/mm: PUD VA support for physical mapping (x86_64) Kees Cook
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Kees Cook @ 2016-05-25 22:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Borislav Petkov, Juergen Gross, Thomas Garnier, Matt Fleming,
	Toshi Kani, Baoquan He, Andrew Morton, Dan Williams, Dave Hansen,
	Aneesh Kumar K.V, Kirill A. Shutemov, Martin Schwidefsky,
	Andy Lutomirski, Alexander Kuleshov, Alexander Popov,
	Joerg Roedel, Dave Young, Lv Zheng, Mark Salter, Stephen Smalley,
	Dmitry Vyukov, Boris Ostrovsky, David Rientjes,
	Christian Borntraeger, Jan Beulich, Kefeng Wang, Seth Jennings,
	Yinghai Lu, linux-kernel

This is v6 of Thomas Garnier's KASLR for memory areas (physical memory
mapping, vmalloc, vmemmap). It expects to be applied on top of the last
set of text base address KASLR changes. Minor changes were made to
reorganize some defines, add some file comments, etc.

***Background:
The current implementation of KASLR randomizes only the base address of
the kernel and its modules. Research was published showing that static
memory can be overwritten to elevate privileges, bypassing KASLR.

In more detail:

   The physical memory mapping holds most allocations from boot and heap
   allocators. Knowing the base address and physical memory size, an
   attacker can deduce the PDE virtual address for the vDSO memory page.
   This attack was demonstrated at CanSecWest 2016, in the "Getting
   Physical: Extreme Abuse of Intel Based Paging Systems"
   https://goo.gl/ANpWdV (see second part of the presentation). The
   exploits used against Linux worked successfully against 4.6+ but fail
   with KASLR memory enabled (https://goo.gl/iTtXMJ). Similar research
   was done at Google, leading to this patch proposal. Variants exist that
   overwrite /proc or /sys object ACLs, leading to elevation of privileges.
   These variants were tested against 4.6+.
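
As a rough illustration (not taken from the presentation or from this
series; the constant below is simply the unrandomized x86_64 direct-map
base), the core of the problem is that a static direct mapping turns a
known physical address into a known virtual address with fixed
arithmetic:

/*
 * Sketch only: with a static direct map, any known physical address
 * has a precomputable kernel virtual alias.
 */
#define PAGE_OFFSET_STATIC	0xffff880000000000UL

static inline unsigned long direct_map_address(unsigned long paddr)
{
	/* Equivalent to __va(paddr) when nothing is randomized. */
	return PAGE_OFFSET_STATIC + paddr;
}

With CONFIG_RANDOMIZE_MEMORY, __PAGE_OFFSET becomes the runtime variable
page_offset_base, so this arithmetic no longer yields a predictable
address.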

This set of patches randomizes the base addresses and padding of the three
major memory sections (physical memory mapping, vmalloc & vmemmap).
It mitigates exploits relying on predictable kernel addresses. This
feature can be enabled with the CONFIG_RANDOMIZE_MEMORY option.

Padding for the memory hotplug support is managed by
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING. The default value is 10
terabytes.

The patches were tested on qemu & physical machines. Xen compatibility was
also verified. Multiple reboots were used to verify entropy for each
memory section.

***Problems that needed solving:
 - The three target memory sections should never end up at the same place
   across boots.
 - The physical memory mapping can start at a virtual address that is not
   aligned on a PGD boundary.
 - Good entropy is needed early at boot, before get_random_bytes() is
   available.
 - Optional padding is needed for memory hotplug compatibility.

***Parts:
 - The first part prepares for the KASLR memory randomization by adding
   support for PUD-level virtual addresses in the physical mapping code
   (the entropy helpers it relies on were already refactored as part of
   the prerequisite text base address KASLR series).
   (Patch 01)
 - The second part implements the KASLR memory randomization for all
   sections mentioned.
   (Patch 02)
 - The third part adds support for memory hotplug by adding an option
   that defines the padding used between the physical memory mapping
   section and the others.
   (Patch 03)

Performance data:

Kernbench shows almost no difference (+/- less than 1%):

Before:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.63 (1.2695)
User Time 1034.89 (1.18115)
System Time 87.056 (0.456416)
Percent CPU 1092.9 (13.892)
Context Switches 199805 (3455.33)
Sleeps 97907.8 (900.636)

After:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.489 (1.10636)
User Time 1034.86 (1.36053)
System Time 87.764 (0.49345)
Percent CPU 1095 (12.7715)
Context Switches 199036 (4298.1)
Sleeps 97681.6 (1031.11)

Hackbench shows 0% difference on average (hackbench 90
repeated 10 times):

attempt,before,after
1,0.076,0.069
2,0.072,0.069
3,0.066,0.066
4,0.066,0.068
5,0.066,0.067
6,0.066,0.069
7,0.067,0.066
8,0.063,0.067
9,0.067,0.065
10,0.068,0.071
average,0.0677,0.0677


* [PATCH v6 1/3] x86/mm: PUD VA support for physical mapping (x86_64)
  2016-05-25 22:57 [PATCH v6 0/3] x86/mm: memory area address KASLR Kees Cook
@ 2016-05-25 22:57 ` Kees Cook
  2016-06-17  9:02   ` Ingo Molnar
  2016-05-25 22:57 ` [PATCH v6 2/3] x86/mm: Implement ASLR for kernel memory sections (x86_64) Kees Cook
  2016-05-25 22:57 ` [PATCH v6 3/3] x86/mm: Memory hotplug support for KASLR memory randomization Kees Cook
  2 siblings, 1 reply; 10+ messages in thread
From: Kees Cook @ 2016-05-25 22:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Thomas Garnier, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Borislav Petkov, Juergen Gross,
	Matt Fleming, Toshi Kani, Baoquan He, Andrew Morton,
	Dan Williams, Dave Hansen, Aneesh Kumar K.V, Kirill A. Shutemov,
	Martin Schwidefsky, Andy Lutomirski, Alexander Kuleshov,
	Alexander Popov, Joerg Roedel, Dave Young, Lv Zheng, Mark Salter,
	Stephen Smalley, Dmitry Vyukov, Boris Ostrovsky, David Rientjes,
	Christian Borntraeger, Jan Beulich, Kefeng Wang, Seth Jennings,
	Yinghai Lu, linux-kernel

From: Thomas Garnier <thgarnie@google.com>

Minor change that allows early boot physical mapping of PUD level virtual
addresses. The current implementation expects the virtual address to be
PUD aligned. For KASLR memory randomization, we need to be able to
randomize the offset used on the PUD table.

It has no impact on current usage.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/mm/init_64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index bce2e5d9edd4..f205f39bd808 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -454,10 +454,10 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
 {
 	unsigned long pages = 0, next;
 	unsigned long last_map_addr = end;
-	int i = pud_index(addr);
+	int i = pud_index((unsigned long)__va(addr));
 
 	for (; i < PTRS_PER_PUD; i++, addr = next) {
-		pud_t *pud = pud_page + pud_index(addr);
+		pud_t *pud = pud_page + pud_index((unsigned long)__va(addr));
 		pmd_t *pmd;
 		pgprot_t prot = PAGE_KERNEL;
 
-- 
2.6.3


* [PATCH v6 2/3] x86/mm: Implement ASLR for kernel memory sections (x86_64)
  2016-05-25 22:57 [PATCH v6 0/3] x86/mm: memory area address KASLR Kees Cook
  2016-05-25 22:57 ` [PATCH v6 1/3] x86/mm: PUD VA support for physical mapping (x86_64) Kees Cook
@ 2016-05-25 22:57 ` Kees Cook
  2016-06-17 10:26   ` Ingo Molnar
  2016-05-25 22:57 ` [PATCH v6 3/3] x86/mm: Memory hotplug support for KASLR memory randomization Kees Cook
  2 siblings, 1 reply; 10+ messages in thread
From: Kees Cook @ 2016-05-25 22:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Thomas Garnier, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Borislav Petkov, Juergen Gross,
	Matt Fleming, Toshi Kani, Baoquan He, Andrew Morton,
	Dan Williams, Dave Hansen, Aneesh Kumar K.V, Kirill A. Shutemov,
	Martin Schwidefsky, Andy Lutomirski, Alexander Kuleshov,
	Alexander Popov, Joerg Roedel, Dave Young, Lv Zheng, Mark Salter,
	Stephen Smalley, Dmitry Vyukov, Boris Ostrovsky, David Rientjes,
	Christian Borntraeger, Jan Beulich, Kefeng Wang, Seth Jennings,
	Yinghai Lu, linux-kernel

From: Thomas Garnier <thgarnie@google.com>

Randomizes the virtual address space of kernel memory sections (physical
memory mapping, vmalloc & vmemmap) for x86_64. This security feature
mitigates exploits relying on predictable kernel addresses. These
addresses can be used to disclose kernel module base addresses,
or to corrupt specific structures to elevate privileges, bypassing the
current implementation of KASLR. This feature can be enabled with the
CONFIG_RANDOMIZE_MEMORY option.

The physical memory mapping holds most allocations from boot and heap
allocators. Knowing the base address and physical memory size, an
attacker can deduce the PDE virtual address for the vDSO memory page.
This attack was demonstrated at CanSecWest 2016, in the "Getting
Physical: Extreme Abuse of Intel Based Paging Systems"
https://goo.gl/ANpWdV (see second part of the presentation). The
exploits used against Linux worked successfully against 4.6+ but fail
with KASLR memory enabled (https://goo.gl/iTtXMJ). Similar research
was done at Google, leading to this patch proposal. Variants exist that
overwrite /proc or /sys object ACLs, leading to elevation of privileges.
These variants were tested against 4.6+.

The vmalloc memory section contains the allocations made through the
vmalloc API. These allocations are done sequentially to prevent
fragmentation, so each allocation address can easily be deduced,
especially early after boot.
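
As a minimal, hypothetical out-of-tree module (illustration only, not part
of this series), the predictability is easy to observe: on an unrandomized
kernel, the first allocations sit at small, guessable offsets from the
build-time constant VMALLOC_START.

#include <linux/module.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

/*
 * Two back-to-back allocations: each mapping is followed by a guard
 * page, so on a freshly booted, unrandomized kernel they land at
 * consecutive, predictable slots just above VMALLOC_START.
 */
static int __init vmalloc_layout_demo_init(void)
{
	void *a = vmalloc(PAGE_SIZE);
	void *b = vmalloc(PAGE_SIZE);

	pr_info("vmalloc demo: a=%lx b=%lx start=%lx\n",
		(unsigned long)a, (unsigned long)b,
		(unsigned long)VMALLOC_START);

	vfree(b);
	vfree(a);
	return 0;
}
module_init(vmalloc_layout_demo_init);
MODULE_LICENSE("GPL");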

The vmemmap section holds a representation of the physical
memory (through a struct page array). An attacker could use this section
to disclose the kernel memory layout (walking the page linked list).
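
For illustration only (the constant is the unrandomized vmemmap base and
64 bytes is the typical sizeof(struct page) on x86_64): with a fixed
vmemmap base, the struct page describing any page frame sits at an
address computable directly from the page frame number, mirroring
pfn_to_page() in the sparsemem-vmemmap model.

#define VMEMMAP_BASE_STATIC	0xffffea0000000000UL
#define STRUCT_PAGE_SIZE	64UL	/* typical sizeof(struct page) */

static inline unsigned long vmemmap_page_address(unsigned long pfn)
{
	/* Sketch of &vmemmap[pfn] when the base is not randomized. */
	return VMEMMAP_BASE_STATIC + pfn * STRUCT_PAGE_SIZE;
}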

The order of the memory sections is not changed. The feature computes
the space available for the sections based on the relevant configuration
options and randomizes each section's base address and the padding
between sections. The size of the physical memory mapping is the
available physical memory. No performance impact was detected while
testing the feature.

Entropy is generated using the KASLR early boot functions now shared in
the lib directory (originally written by Kees Cook). Randomization is
done at the PGD & PUD page table levels to increase the number of
possible addresses. The physical memory mapping code was adapted to
support PUD-level virtual addresses. In the best configuration, this
implementation provides on average 30,000 possible virtual addresses for
each memory section. An additional low memory page is used to ensure
each CPU can start with a PGD-aligned virtual address (for realmode).

x86/dump_pagetables was updated to correctly display each section.

The page offset used by the compressed kernel was changed to the static
value, since it is not yet randomized during this boot stage.

Updated documentation on x86_64 memory layout accordingly.

Performance data:

Kernbench shows almost no difference (+/- less than 1%):

Before:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.63 (1.2695)
User Time 1034.89 (1.18115)
System Time 87.056 (0.456416)
Percent CPU 1092.9 (13.892)
Context Switches 199805 (3455.33)
Sleeps 97907.8 (900.636)

After:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.489 (1.10636)
User Time 1034.86 (1.36053)
System Time 87.764 (0.49345)
Percent CPU 1095 (12.7715)
Context Switches 199036 (4298.1)
Sleeps 97681.6 (1031.11)

Hackbench shows 0% difference on average (hackbench 90
repeated 10 times):

attempt,before,after
1,0.076,0.069
2,0.072,0.069
3,0.066,0.066
4,0.066,0.068
5,0.066,0.067
6,0.066,0.069
7,0.067,0.066
8,0.063,0.067
9,0.067,0.065
10,0.068,0.071
average,0.0677,0.0677

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 Documentation/x86/x86_64/mm.txt         |   4 +
 arch/x86/Kconfig                        |  17 ++++
 arch/x86/boot/compressed/pagetable.c    |   3 +
 arch/x86/include/asm/kaslr.h            |  10 ++
 arch/x86/include/asm/page_64_types.h    |  11 ++-
 arch/x86/include/asm/pgtable.h          |  14 +++
 arch/x86/include/asm/pgtable_64_types.h |  15 ++-
 arch/x86/kernel/head_64.S               |   2 +-
 arch/x86/kernel/setup.c                 |   3 +
 arch/x86/mm/Makefile                    |   1 +
 arch/x86/mm/dump_pagetables.c           |  16 ++-
 arch/x86/mm/init.c                      |   4 +
 arch/x86/mm/kaslr.c                     | 167 ++++++++++++++++++++++++++++++++
 arch/x86/realmode/init.c                |   5 +-
 14 files changed, 262 insertions(+), 10 deletions(-)
 create mode 100644 arch/x86/mm/kaslr.c

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 5aa738346062..8c7dd5957ae1 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -39,4 +39,8 @@ memory window (this size is arbitrary, it can be raised later if needed).
 The mappings are not part of any other kernel PGD and are only available
 during EFI runtime calls.
 
+Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
+physical memory, vmalloc/ioremap space and virtual memory map are randomized.
+Their order is preserved but their base will be offset early at boot time.
+
 -Andi Kleen, Jul 2004
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 770ae5259dff..adab3fef3bb4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1993,6 +1993,23 @@ config PHYSICAL_ALIGN
 
 	  Don't change this unless you know what you are doing.
 
+config RANDOMIZE_MEMORY
+	bool "Randomize the kernel memory sections"
+	depends on X86_64
+	depends on RANDOMIZE_BASE
+	default RANDOMIZE_BASE
+	---help---
+	   Randomizes the base virtual address of kernel memory sections
+	   (physical memory mapping, vmalloc & vmemmap). This security feature
+	   makes exploits relying on predictable memory locations less reliable.
+
+	   The order of allocations remains unchanged. Entropy is generated in
+	   the same way as RANDOMIZE_BASE. Current implementation in the optimal
+	   configuration have in average 30,000 different possible virtual
+	   addresses for each memory section.
+
+	   If unsure, say N.
+
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP
diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
index 6e31a6aac4d3..56589d0a804b 100644
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -20,6 +20,9 @@
 /* These actually do the work of building the kernel identity maps. */
 #include <asm/init.h>
 #include <asm/pgtable.h>
+/* Use the static base for this part of the boot process */
+#undef __PAGE_OFFSET
+#define __PAGE_OFFSET __PAGE_OFFSET_BASE
 #include "../../mm/ident_map.c"
 
 /* Used by pgtable.h asm code to force instruction serialization. */
diff --git a/arch/x86/include/asm/kaslr.h b/arch/x86/include/asm/kaslr.h
index 5547438db5ea..1052a797d71d 100644
--- a/arch/x86/include/asm/kaslr.h
+++ b/arch/x86/include/asm/kaslr.h
@@ -3,4 +3,14 @@
 
 unsigned long kaslr_get_random_long(const char *purpose);
 
+#ifdef CONFIG_RANDOMIZE_MEMORY
+extern unsigned long page_offset_base;
+extern unsigned long vmalloc_base;
+extern unsigned long vmemmap_base;
+
+void kernel_randomize_memory(void);
+#else
+static inline void kernel_randomize_memory(void) { }
+#endif /* CONFIG_RANDOMIZE_MEMORY */
+
 #endif
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index d5c2f8b40faa..9215e0527647 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -1,6 +1,10 @@
 #ifndef _ASM_X86_PAGE_64_DEFS_H
 #define _ASM_X86_PAGE_64_DEFS_H
 
+#ifndef __ASSEMBLY__
+#include <asm/kaslr.h>
+#endif
+
 #ifdef CONFIG_KASAN
 #define KASAN_STACK_ORDER 1
 #else
@@ -32,7 +36,12 @@
  * hypervisor to fit.  Choosing 16 slots here is arbitrary, but it's
  * what Xen requires.
  */
-#define __PAGE_OFFSET           _AC(0xffff880000000000, UL)
+#define __PAGE_OFFSET_BASE      _AC(0xffff880000000000, UL)
+#ifdef CONFIG_RANDOMIZE_MEMORY
+#define __PAGE_OFFSET           page_offset_base
+#else
+#define __PAGE_OFFSET           __PAGE_OFFSET_BASE
+#endif /* CONFIG_RANDOMIZE_MEMORY */
 
 #define __START_KERNEL_map	_AC(0xffffffff80000000, UL)
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 1a27396b6ea0..4cf2ce6dc088 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -729,6 +729,20 @@ extern int direct_gbpages;
 void init_mem_mapping(void);
 void early_alloc_pgt_buf(void);
 
+#ifdef CONFIG_X86_64
+/* Realmode trampoline initialization. */
+extern pgd_t trampoline_pgd_entry;
+# ifdef CONFIG_RANDOMIZE_MEMORY
+void __meminit init_trampoline(void);
+# else
+static inline void __meminit init_trampoline(void)
+{
+	/* Default trampoline pgd value */
+	trampoline_pgd_entry = init_level4_pgt[pgd_index(__PAGE_OFFSET)];
+}
+# endif
+#endif
+
 /* local pte updates need not use xchg for locking */
 static inline pte_t native_local_ptep_get_and_clear(pte_t *ptep)
 {
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index e6844dfb4471..d38873900499 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -5,6 +5,7 @@
 
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
+#include <asm/kaslr.h>
 
 /*
  * These are used to make use of C type-checking..
@@ -54,9 +55,17 @@ typedef struct { pteval_t pte; } pte_t;
 
 /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
 #define MAXMEM		 _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
-#define VMALLOC_START    _AC(0xffffc90000000000, UL)
-#define VMALLOC_END      _AC(0xffffe8ffffffffff, UL)
-#define VMEMMAP_START	 _AC(0xffffea0000000000, UL)
+#define VMALLOC_SIZE_TB	 _AC(32, UL)
+#define __VMALLOC_BASE	 _AC(0xffffc90000000000, UL)
+#define __VMEMMAP_BASE	 _AC(0xffffea0000000000, UL)
+#ifdef CONFIG_RANDOMIZE_MEMORY
+#define VMALLOC_START	 vmalloc_base
+#define VMEMMAP_START	 vmemmap_base
+#else
+#define VMALLOC_START	 __VMALLOC_BASE
+#define VMEMMAP_START	 __VMEMMAP_BASE
+#endif /* CONFIG_RANDOMIZE_MEMORY */
+#define VMALLOC_END      (VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL))
 #define MODULES_VADDR    (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
 #define MODULES_END      _AC(0xffffffffff000000, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 5df831ef1442..03a2aa067ff3 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -38,7 +38,7 @@
 
 #define pud_index(x)	(((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
 
-L4_PAGE_OFFSET = pgd_index(__PAGE_OFFSET)
+L4_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE)
 L4_START_KERNEL = pgd_index(__START_KERNEL_map)
 L3_START_KERNEL = pud_index(__START_KERNEL_map)
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c4e7b3991b60..a2616584b6e9 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -113,6 +113,7 @@
 #include <asm/prom.h>
 #include <asm/microcode.h>
 #include <asm/mmu_context.h>
+#include <asm/kaslr.h>
 
 /*
  * max_low_pfn_mapped: highest direct mapped pfn under 4GB
@@ -942,6 +943,8 @@ void __init setup_arch(char **cmdline_p)
 
 	x86_init.oem.arch_setup();
 
+	kernel_randomize_memory();
+
 	iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
 	setup_memory_map();
 	parse_setup_data();
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 62c0043a5fd5..96d2b847e09e 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -37,4 +37,5 @@ obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 
 obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
+obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
 
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 99bfb192803f..9a17250bcbe0 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -72,9 +72,9 @@ static struct addr_marker address_markers[] = {
 	{ 0, "User Space" },
 #ifdef CONFIG_X86_64
 	{ 0x8000000000000000UL, "Kernel Space" },
-	{ PAGE_OFFSET,		"Low Kernel Mapping" },
-	{ VMALLOC_START,        "vmalloc() Area" },
-	{ VMEMMAP_START,        "Vmemmap" },
+	{ 0/* PAGE_OFFSET */,   "Low Kernel Mapping" },
+	{ 0/* VMALLOC_START */, "vmalloc() Area" },
+	{ 0/* VMEMMAP_START */, "Vmemmap" },
 # ifdef CONFIG_X86_ESPFIX64
 	{ ESPFIX_BASE_ADDR,	"ESPfix Area", 16 },
 # endif
@@ -434,8 +434,16 @@ void ptdump_walk_pgd_level_checkwx(void)
 
 static int __init pt_dump_init(void)
 {
+	/*
+	 * Various markers are not compile-time constants, so assign them
+	 * here.
+	 */
+#ifdef CONFIG_X86_64
+	address_markers[LOW_KERNEL_NR].start_address = PAGE_OFFSET;
+	address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
+	address_markers[VMEMMAP_START_NR].start_address = VMEMMAP_START;
+#endif
 #ifdef CONFIG_X86_32
-	/* Not a compile-time constant on x86-32 */
 	address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
 	address_markers[VMALLOC_END_NR].start_address = VMALLOC_END;
 # ifdef CONFIG_HIGHMEM
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 372aad2b3291..cc82830bc8c4 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -17,6 +17,7 @@
 #include <asm/proto.h>
 #include <asm/dma.h>		/* for MAX_DMA_PFN */
 #include <asm/microcode.h>
+#include <asm/kaslr.h>
 
 /*
  * We need to define the tracepoints somewhere, and tlb.c
@@ -590,6 +591,9 @@ void __init init_mem_mapping(void)
 	/* the ISA range is always mapped regardless of memory holes */
 	init_memory_mapping(0, ISA_END_ADDRESS);
 
+	/* Init the trampoline, possibly with KASLR memory offset */
+	init_trampoline();
+
 	/*
 	 * If the allocation is in bottom-up direction, we setup direct mapping
 	 * in bottom-up, otherwise we setup direct mapping in top-down.
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
new file mode 100644
index 000000000000..7a1aa44cff1b
--- /dev/null
+++ b/arch/x86/mm/kaslr.c
@@ -0,0 +1,167 @@
+/*
+ * This file implements KASLR memory randomization for x86_64. It randomizes
+ * the virtual address space of kernel memory sections (physical memory
+ * mapping, vmalloc & vmemmap) for x86_64. This security feature mitigates
+ * exploits relying on predictable kernel addresses.
+ *
+ * Entropy is generated using the KASLR early boot functions now shared in
+ * the lib directory (originally written by Kees Cook). Randomization is
+ * done on PGD & PUD page table levels to increase possible addresses. The
+ * physical memory mapping code was adapted to support PUD level virtual
+ * addresses. This implementation on the best configuration provides 30,000
+ * possible virtual addresses in average for each memory section.  An
+ * additional low memory page is used to ensure each CPU can start with a
+ * PGD aligned virtual address (for realmode).
+ *
+ * The order of each memory section is not changed. The feature looks at
+ * the available space for the sections based on different configuration
+ * options and randomizes the base and space between each. The size of the
+ * physical memory mapping is the available physical memory.
+ *
+ */
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/smp.h>
+#include <linux/init.h>
+#include <linux/memory.h>
+#include <linux/random.h>
+
+#include <asm/processor.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/e820.h>
+#include <asm/init.h>
+#include <asm/setup.h>
+#include <asm/kaslr.h>
+#include <asm/kasan.h>
+
+#include "mm_internal.h"
+
+#define TB_SHIFT 40
+
+/*
+ * Memory base and end randomization is based on different configurations.
+ * We want as much space as possible to increase entropy available.
+ */
+static const unsigned long memory_rand_start = __PAGE_OFFSET_BASE;
+
+#if defined(CONFIG_KASAN)
+static const unsigned long memory_rand_end = KASAN_SHADOW_START;
+#elif defined(CONFIG_X86_ESPFIX64)
+static const unsigned long memory_rand_end = ESPFIX_BASE_ADDR;
+#elif defined(CONFIG_EFI)
+static const unsigned long memory_rand_end = EFI_VA_START;
+#else
+static const unsigned long memory_rand_end = __START_KERNEL_map;
+#endif
+
+/* Default values */
+unsigned long page_offset_base = __PAGE_OFFSET_BASE;
+EXPORT_SYMBOL(page_offset_base);
+unsigned long vmalloc_base = __VMALLOC_BASE;
+EXPORT_SYMBOL(vmalloc_base);
+unsigned long vmemmap_base = __VMEMMAP_BASE;
+EXPORT_SYMBOL(vmemmap_base);
+
+/* Describe each randomized memory sections in sequential order */
+static __initdata struct kaslr_memory_region {
+	unsigned long *base;
+	unsigned short size_tb;
+} kaslr_regions[] = {
+	{ &page_offset_base, 64/* Maximum */ },
+	{ &vmalloc_base, VMALLOC_SIZE_TB },
+	{ &vmemmap_base, 1 },
+};
+
+/* Size in Terabytes + 1 hole */
+static __init unsigned long get_padding(struct kaslr_memory_region *region)
+{
+	return ((unsigned long)region->size_tb + 1) << TB_SHIFT;
+}
+
+/* Initialize base and padding for each memory section randomized with KASLR */
+void __init kernel_randomize_memory(void)
+{
+	size_t i;
+	unsigned long addr = memory_rand_start;
+	unsigned long padding, rand, mem_tb;
+	struct rnd_state rnd_st;
+	unsigned long remain_padding = memory_rand_end - memory_rand_start;
+
+	/*
+	 * All these BUILD_BUG_ON checks ensures the memory layout is
+	 * consistent with the current KASLR design.
+	 */
+	BUILD_BUG_ON(memory_rand_start >= memory_rand_end);
+	BUILD_BUG_ON(config_enabled(CONFIG_KASAN) &&
+		memory_rand_end >= ESPFIX_BASE_ADDR);
+	BUILD_BUG_ON((config_enabled(CONFIG_KASAN) ||
+			config_enabled(CONFIG_X86_ESPFIX64)) &&
+		memory_rand_end >= EFI_VA_START);
+	BUILD_BUG_ON((config_enabled(CONFIG_KASAN) ||
+			config_enabled(CONFIG_X86_ESPFIX64) ||
+			config_enabled(CONFIG_EFI)) &&
+		memory_rand_end >= __START_KERNEL_map);
+	BUILD_BUG_ON(memory_rand_end > __START_KERNEL_map);
+
+	if (!kaslr_enabled())
+		return;
+
+	BUG_ON(kaslr_regions[0].base != &page_offset_base);
+	mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);
+
+	if (mem_tb < kaslr_regions[0].size_tb)
+		kaslr_regions[0].size_tb = mem_tb;
+
+	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
+		remain_padding -= get_padding(&kaslr_regions[i]);
+
+	prandom_seed_state(&rnd_st, kaslr_get_random_long("Memory"));
+
+	/* Position each section randomly with minimum 1 terabyte between */
+	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++) {
+		padding = remain_padding / (ARRAY_SIZE(kaslr_regions) - i);
+		prandom_bytes_state(&rnd_st, &rand, sizeof(rand));
+		padding = (rand % (padding + 1)) & PUD_MASK;
+		addr += padding;
+		*kaslr_regions[i].base = addr;
+		addr += get_padding(&kaslr_regions[i]);
+		remain_padding -= padding;
+	}
+}
+
+/*
+ * Create PGD aligned trampoline table to allow real mode initialization
+ * of additional CPUs. Consume only 1 low memory page.
+ */
+void __meminit init_trampoline(void)
+{
+	unsigned long addr, next;
+	pgd_t *pgd;
+	pud_t *pud_page, *tr_pud_page;
+	int i;
+
+	if (!kaslr_enabled())
+		return;
+
+	tr_pud_page = alloc_low_page();
+	set_pgd(&trampoline_pgd_entry, __pgd(_PAGE_TABLE | __pa(tr_pud_page)));
+
+	addr = 0;
+	pgd = pgd_offset_k((unsigned long)__va(addr));
+	pud_page = (pud_t *) pgd_page_vaddr(*pgd);
+
+	for (i = pud_index(addr); i < PTRS_PER_PUD; i++, addr = next) {
+		pud_t *pud, *tr_pud;
+
+		tr_pud = tr_pud_page + pud_index(addr);
+		pud = pud_page + pud_index((unsigned long)__va(addr));
+		next = (addr & PUD_MASK) + PUD_SIZE;
+
+		/* Needed to copy pte or pud alike */
+		BUILD_BUG_ON(sizeof(pud_t) != sizeof(pte_t));
+		*tr_pud = *pud;
+	}
+}
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 0b7a63d98440..705e3fffb4a1 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -8,6 +8,9 @@
 struct real_mode_header *real_mode_header;
 u32 *trampoline_cr4_features;
 
+/* Hold the pgd entry used on booting additional CPUs */
+pgd_t trampoline_pgd_entry;
+
 void __init reserve_real_mode(void)
 {
 	phys_addr_t mem;
@@ -84,7 +87,7 @@ void __init setup_real_mode(void)
 	*trampoline_cr4_features = __read_cr4();
 
 	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
-	trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
+	trampoline_pgd[0] = trampoline_pgd_entry.pgd;
 	trampoline_pgd[511] = init_level4_pgt[511].pgd;
 #endif
 }
-- 
2.6.3


* [PATCH v6 3/3] x86/mm: Memory hotplug support for KASLR memory randomization
  2016-05-25 22:57 [PATCH v6 0/3] x86/mm: memory area address KASLR Kees Cook
  2016-05-25 22:57 ` [PATCH v6 1/3] x86/mm: PUD VA support for physical mapping (x86_64) Kees Cook
  2016-05-25 22:57 ` [PATCH v6 2/3] x86/mm: Implement ASLR for kernel memory sections (x86_64) Kees Cook
@ 2016-05-25 22:57 ` Kees Cook
  2 siblings, 0 replies; 10+ messages in thread
From: Kees Cook @ 2016-05-25 22:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Thomas Garnier, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Borislav Petkov, Juergen Gross,
	Matt Fleming, Toshi Kani, Baoquan He, Andrew Morton,
	Dan Williams, Dave Hansen, Aneesh Kumar K.V, Kirill A. Shutemov,
	Martin Schwidefsky, Andy Lutomirski, Alexander Kuleshov,
	Alexander Popov, Joerg Roedel, Dave Young, Lv Zheng, Mark Salter,
	Stephen Smalley, Dmitry Vyukov, Boris Ostrovsky, David Rientjes,
	Christian Borntraeger, Jan Beulich, Kefeng Wang, Seth Jennings,
	Yinghai Lu, linux-kernel

From: Thomas Garnier <thgarnie@google.com>

Add a new option (CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING) to define
the padding used for the physical memory mapping section when KASLR
memory randomization is enabled. It ensures there is enough virtual
address space when CONFIG_MEMORY_HOTPLUG is used. The default value is
10 terabytes. If CONFIG_MEMORY_HOTPLUG is not used, no space is reserved,
which increases the available entropy.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/Kconfig    | 15 +++++++++++++++
 arch/x86/mm/kaslr.c |  7 ++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index adab3fef3bb4..214b3fadbc11 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2010,6 +2010,21 @@ config RANDOMIZE_MEMORY
 
 	   If unsure, say N.
 
+config RANDOMIZE_MEMORY_PHYSICAL_PADDING
+	hex "Physical memory mapping padding" if EXPERT
+	depends on RANDOMIZE_MEMORY
+	default "0xa" if MEMORY_HOTPLUG
+	default "0x0"
+	range 0x1 0x40 if MEMORY_HOTPLUG
+	range 0x0 0x40
+	---help---
+	   Define the padding in terabytes added to the existing physical
+	   memory size during kernel memory randomization. It is useful
+	   for memory hotplug support but reduces the entropy available for
+	   address randomization.
+
+	   If unsure, leave at the default value.
+
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 7a1aa44cff1b..0c9264ed9357 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -109,8 +109,13 @@ void __init kernel_randomize_memory(void)
 	if (!kaslr_enabled())
 		return;
 
+	/*
+	 * Update Physical memory mapping to available and
+	 * add padding if needed (especially for memory hotplug support).
+	 */
 	BUG_ON(kaslr_regions[0].base != &page_offset_base);
-	mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);
+	mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT) +
+		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
 
 	if (mem_tb < kaslr_regions[0].size_tb)
 		kaslr_regions[0].size_tb = mem_tb;
-- 
2.6.3


* Re: [PATCH v6 1/3] x86/mm: PUD VA support for physical mapping (x86_64)
  2016-05-25 22:57 ` [PATCH v6 1/3] x86/mm: PUD VA support for physical mapping (x86_64) Kees Cook
@ 2016-06-17  9:02   ` Ingo Molnar
  2016-06-20 16:17     ` Thomas Garnier
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2016-06-17  9:02 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Garnier, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, Borislav Petkov, Juergen Gross, Matt Fleming, Toshi Kani,
	Baoquan He, Andrew Morton, Dan Williams, Dave Hansen,
	Aneesh Kumar K.V, Kirill A. Shutemov, Martin Schwidefsky,
	Andy Lutomirski, Alexander Kuleshov, Alexander Popov,
	Joerg Roedel, Dave Young, Lv Zheng, Mark Salter, Stephen Smalley,
	Dmitry Vyukov, Boris Ostrovsky, David Rientjes,
	Christian Borntraeger, Jan Beulich, Kefeng Wang, Seth Jennings,
	Yinghai Lu, linux-kernel


* Kees Cook <keescook@chromium.org> wrote:

> From: Thomas Garnier <thgarnie@google.com>
> 
> Minor change that allows early boot physical mapping of PUD level virtual
> addresses. The current implementation expects the virtual address to be
> PUD aligned. For KASLR memory randomization, we need to be able to
> randomize the offset used on the PUD table.
> 
> It has no impact on current usage.
> 
> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  arch/x86/mm/init_64.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index bce2e5d9edd4..f205f39bd808 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -454,10 +454,10 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
>  {
>  	unsigned long pages = 0, next;
>  	unsigned long last_map_addr = end;
> -	int i = pud_index(addr);
> +	int i = pud_index((unsigned long)__va(addr));
>
>  
>  	for (; i < PTRS_PER_PUD; i++, addr = next) {
> -		pud_t *pud = pud_page + pud_index(addr);
> +		pud_t *pud = pud_page + pud_index((unsigned long)__va(addr));
>  		pmd_t *pmd;
>  		pgprot_t prot = PAGE_KERNEL;

So I really dislike two things about this code.

Firstly a pre-existing problem is that the parameter names to phys_pud_init() 
suck:

static unsigned long __meminit
phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
                         unsigned long page_size_mask)

so 'unsigned long addr' is usually the signature of a virtual address - but that's 
not true here: it's a physical address.

Same goes for 'unsigned long end'. Plus it's unclear what the connection between 
'addr' and 'end' is - it's not at all obvious 'at a glance' that they are the start 
and end addresses of a physical memory range.

All of these problems can be solved by renaming them to 'paddr_start' and 
'paddr_end'.

Btw., I believe this misnomer and confusing code resulted in the buggy 
'pud_index(addr)' not being noticed to begin with ...

Secondly, and that's a new problem introduced by this patch:

> +	int i = pud_index((unsigned long)__va(addr));
> +		pud_t *pud = pud_page + pud_index((unsigned long)__va(addr));

... beyond the repetition, using type casts is fragile. Type casts should be a red 
flag to anyone involved in low level, security relevant code! So I'm pretty 
unhappy about seeing such a problem in such a patch.

This code should be doing something like:

	unsigned long vaddr_start = __va(paddr_start);

... which gets rid of the type cast, the repetition and documents the code much 
better as well. Also see how easily the connection between the variables is 
self-documented just by picking names carefully:

	paddr_start
	paddr_end
	vaddr_start
	vaddr_end

Also, _please_ add a comment to phys_pud_init() that explains what the function 
does.

Thanks,

	Ingo


* Re: [PATCH v6 2/3] x86/mm: Implement ASLR for kernel memory sections (x86_64)
  2016-05-25 22:57 ` [PATCH v6 2/3] x86/mm: Implement ASLR for kernel memory sections (x86_64) Kees Cook
@ 2016-06-17 10:26   ` Ingo Molnar
  2016-06-17 17:29     ` Kees Cook
  2016-06-21 16:46     ` Thomas Garnier
  0 siblings, 2 replies; 10+ messages in thread
From: Ingo Molnar @ 2016-06-17 10:26 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Garnier, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, Borislav Petkov, Juergen Gross, Matt Fleming, Toshi Kani,
	Baoquan He, Andrew Morton, Dan Williams, Dave Hansen,
	Aneesh Kumar K.V, Kirill A. Shutemov, Martin Schwidefsky,
	Andy Lutomirski, Alexander Kuleshov, Alexander Popov,
	Joerg Roedel, Dave Young, Lv Zheng, Mark Salter, Stephen Smalley,
	Dmitry Vyukov, Boris Ostrovsky, David Rientjes,
	Christian Borntraeger, Jan Beulich, Kefeng Wang, Seth Jennings,
	Yinghai Lu, linux-kernel


* Kees Cook <keescook@chromium.org> wrote:

> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1993,6 +1993,23 @@ config PHYSICAL_ALIGN
>  
>  	  Don't change this unless you know what you are doing.
>  
> +config RANDOMIZE_MEMORY
> +	bool "Randomize the kernel memory sections"
> +	depends on X86_64
> +	depends on RANDOMIZE_BASE
> +	default RANDOMIZE_BASE
> +	---help---
> +	   Randomizes the base virtual address of kernel memory sections
> +	   (physical memory mapping, vmalloc & vmemmap). This security feature
> +	   makes exploits relying on predictable memory locations less reliable.
> +
> +	   The order of allocations remains unchanged. Entropy is generated in
> +	   the same way as RANDOMIZE_BASE. Current implementation in the optimal
> +	   configuration have in average 30,000 different possible virtual
> +	   addresses for each memory section.
> +
> +	   If unsure, say N.

So this is really poor naming!

The user, in first approximation, will see something like this:

  Randomize the kernel memory sections (RANDOMIZE_MEMORY) [Y/n/?] (NEW) 

... and could naively conclude that this feature will randomize memory contents.

Furthermore, there's no apparent connection to another, related kernel feature:

  Randomize the address of the kernel image (KASLR) (RANDOMIZE_BASE) [Y/n/?]

A better naming would be something like:

     Randomize other static kernel memory addresses (RANDOMIZE_ALL) [Y/n/?] (NEW) 

(assuming it's truly 'all')

But ... I don't see it explained anywhere why a user, once he expressed interest 
in randomization would even want to _not_ randomize as much as possible. I.e. why 
does this new option exist at all, shouldn't we just gradually extend 
randomization to more memory areas and control it via CONFIG_RANDOMIZE_BASE?

Btw., CONFIG_RANDOMIZE_BASE is probably a misnomer, and after these patches it 
will be more of a misnomer - so we should rename it to something better. For 
example CONFIG_KASLR would have the advantage of being obviously named at a 
glance, to anyone who knows what 'ASLR' means. To those who don't know the short 
description will tell it:

  Kernel address space layout randomization (KASLR) [Y/n/?]

This would also fit the kernel internal naming of kaslr.c/etc.

What do you think?

> +++ b/arch/x86/mm/kaslr.c
> @@ -0,0 +1,167 @@
> +/*
> + * This file implements KASLR memory randomization for x86_64. It randomizes
> + * the virtual address space of kernel memory sections (physical memory
> + * mapping, vmalloc & vmemmap) for x86_64. This security feature mitigates
> + * exploits relying on predictable kernel addresses.
> + *
> + * Entropy is generated using the KASLR early boot functions now shared in
> + * the lib directory (originally written by Kees Cook). Randomization is
> + * done on PGD & PUD page table levels to increase possible addresses. The
> + * physical memory mapping code was adapted to support PUD level virtual
> + * addresses. This implementation on the best configuration provides 30,000
> + * possible virtual addresses in average for each memory section.  An
> + * additional low memory page is used to ensure each CPU can start with a
> + * PGD aligned virtual address (for realmode).
> + *
> + * The order of each memory section is not changed. The feature looks at
> + * the available space for the sections based on different configuration
> + * options and randomizes the base and space between each. The size of the
> + * physical memory mapping is the available physical memory.
> + *
> + */

(Stray newline.)

> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/types.h>
> +#include <linux/mm.h>
> +#include <linux/smp.h>
> +#include <linux/init.h>
> +#include <linux/memory.h>
> +#include <linux/random.h>
> +
> +#include <asm/processor.h>
> +#include <asm/pgtable.h>
> +#include <asm/pgalloc.h>
> +#include <asm/e820.h>
> +#include <asm/init.h>
> +#include <asm/setup.h>
> +#include <asm/kaslr.h>
> +#include <asm/kasan.h>
> +
> +#include "mm_internal.h"

How many of those include lines are truly unnecessary, or did this just get copied 
from another file?

> +/*
> + * Memory base and end randomization is based on different configurations.
> + * We want as much space as possible to increase entropy available.
> + */
> +static const unsigned long memory_rand_start = __PAGE_OFFSET_BASE;
> +
> +#if defined(CONFIG_KASAN)
> +static const unsigned long memory_rand_end = KASAN_SHADOW_START;
> +#elif defined(CONFIG_X86_ESPFIX64)
> +static const unsigned long memory_rand_end = ESPFIX_BASE_ADDR;
> +#elif defined(CONFIG_EFI)
> +static const unsigned long memory_rand_end = EFI_VA_START;
> +#else
> +static const unsigned long memory_rand_end = __START_KERNEL_map;
> +#endif

It's unclear to me from the naming and the description whether these are virtual 
or physical memory ranges. Nor is it explained why this laundry list of kernel 
options influences the randomization range - and it's not explained how new kernel 
features modifying the virtual memory layout should plug into this mechanism.

Could we please get a bit more verbose introduction into why all of this is done?

> +/* Default values */
> +unsigned long page_offset_base = __PAGE_OFFSET_BASE;
> +EXPORT_SYMBOL(page_offset_base);
> +unsigned long vmalloc_base = __VMALLOC_BASE;
> +EXPORT_SYMBOL(vmalloc_base);
> +unsigned long vmemmap_base = __VMEMMAP_BASE;
> +EXPORT_SYMBOL(vmemmap_base);
> +
> +/* Describe each randomized memory sections in sequential order */
> +static __initdata struct kaslr_memory_region {
> +	unsigned long *base;
> +	unsigned short size_tb;
> +} kaslr_regions[] = {
> +	{ &page_offset_base, 64/* Maximum */ },
> +	{ &vmalloc_base, VMALLOC_SIZE_TB },
> +	{ &vmemmap_base, 1 },
> +};

So this comment is totally misleading. This area does not 'describe' each memory 
section, it actually puts live pointers to virtual memory address offset control 
variables used by other subsystems into this array - allowing us to randomize them 
programmatically.

This patch should be split up into 4 components:

 - core patch adding all the infrastructure but not actually randomizing anything,
   the kaslr_regions[] array is empty.

 - a patch adding page_offset_base randomization.

 - a patch adding vmalloc_base randomization.

 - a patch adding vmemmap_base randomization.

This way if any breakage is introduced by this randomization it can be bisected
in a fine-grained fashion. It would also have saved 5 minutes of my life trying 
to interpret the code here :-/

In fact I'd suggest a series of 7 patches, with each randomization patch split 
into two further patches:

 - a patch making address offset variables dynamic.

 - a patch actually enabling the randomization of that virtual address range.

This structure allows us to carefully evaluate the code size and runtime impact of 
randomizing a given range.

> +
> +/* Size in Terabytes + 1 hole */
> +static __init unsigned long get_padding(struct kaslr_memory_region *region)
> +{
> +	return ((unsigned long)region->size_tb + 1) << TB_SHIFT;
> +}

So why is size_tb a short, forcing ugly type casts like this? Also, since this 
appears to be a trivial shift, why isn't this an inline?

> +/* Initialize base and padding for each memory section randomized with KASLR */
> +void __init kernel_randomize_memory(void)
> +{
> +	size_t i;
> +	unsigned long addr = memory_rand_start;
> +	unsigned long padding, rand, mem_tb;
> +	struct rnd_state rnd_st;
> +	unsigned long remain_padding = memory_rand_end - memory_rand_start;

Yeah, so these variable names are awful :-/

Look at how postfix and prefix naming is mixed in the same function: 
'memory_rand_start', but 'remain_padding'. Please use postfixes for all of them.

Also why is one variable named 'rand', another 'rnd'? Did we run out of 'a' 
characters?

Similarly, we have 'memory' and 'mem'. Pick one variant and harmonize!

This patch is an object lesson in how fragile, insecure code is written.

> +
> +	/*
> +	 * All these BUILD_BUG_ON checks ensures the memory layout is
> +	 * consistent with the current KASLR design.
> +	 */
> +	BUILD_BUG_ON(memory_rand_start >= memory_rand_end);
> +	BUILD_BUG_ON(config_enabled(CONFIG_KASAN) &&
> +		memory_rand_end >= ESPFIX_BASE_ADDR);
> +	BUILD_BUG_ON((config_enabled(CONFIG_KASAN) ||
> +			config_enabled(CONFIG_X86_ESPFIX64)) &&
> +		memory_rand_end >= EFI_VA_START);
> +	BUILD_BUG_ON((config_enabled(CONFIG_KASAN) ||
> +			config_enabled(CONFIG_X86_ESPFIX64) ||
> +			config_enabled(CONFIG_EFI)) &&
> +		memory_rand_end >= __START_KERNEL_map);
> +	BUILD_BUG_ON(memory_rand_end > __START_KERNEL_map);
> +
> +	if (!kaslr_enabled())
> +		return;
> +
> +	BUG_ON(kaslr_regions[0].base != &page_offset_base);
> +	mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);

Why is the granularity of the randomization 1TB? I don't see this explained 
anywhere. Some of the randomization may have PUD-granularity limitation - but PUD 
entries have a size of 512 GB.

I think we should extend the kaslr_regions[] table with a minimum required 
alignment field instead of random granularity limitations.
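
One possible shape for that (a sketch only, not code from this series)
would be to carry each region's size and minimum alignment side by side,
so the placement loop could align to region->align instead of hard-coding
TB/PUD_MASK granularity:

static __initdata struct kaslr_memory_region {
	unsigned long *base;
	unsigned long size;	/* bytes covered by the region */
	unsigned long align;	/* minimum alignment of the randomized base */
} kaslr_regions[] = {
	{ &page_offset_base, 64UL << TB_SHIFT,            PUD_SIZE },
	{ &vmalloc_base,     VMALLOC_SIZE_TB << TB_SHIFT, PUD_SIZE },
	{ &vmemmap_base,     1UL << TB_SHIFT,             PUD_SIZE },
};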

> +/*
> + * Create PGD aligned trampoline table to allow real mode initialization
> + * of additional CPUs. Consume only 1 low memory page.
> + */
> +void __meminit init_trampoline(void)

This too should probably be a separate patch, for bisectability and reviewability 
reasons.

> +	unsigned long addr, next;
> +	pgd_t *pgd;
> +	pud_t *pud_page, *tr_pud_page;
> +	int i;

I had to look twice to understand that 'tr' stands for 'trampoline'. Please rename 
to 'pud_page_tramp' or so.

> +	if (!kaslr_enabled())
> +		return;
> +
> +	tr_pud_page = alloc_low_page();
> +	set_pgd(&trampoline_pgd_entry, __pgd(_PAGE_TABLE | __pa(tr_pud_page)));

Shouldn't we set it _after_ we've initialized the tr_pud_page page table?

Also, why _PAGE_TABLE? AFAICS using _PAGE_TABLE adds _PAGE_USER.

> +
> +	addr = 0;
> +	pgd = pgd_offset_k((unsigned long)__va(addr));

Why the type cast?

Also, similar complaint to the first patch: why is a physical address named in an 
ambiguous fashion, as 'addr'?

> +	pud_page = (pud_t *) pgd_page_vaddr(*pgd);

Side note: it appears all users of pgd_page_vaddr() are casting it immediately to 
a pointer type. Could we change the return type of this function to 'void *' or 
'pud_t *'? AFAICS it's mostly used in platform code.

> +
> +	for (i = pud_index(addr); i < PTRS_PER_PUD; i++, addr = next) {
> +		pud_t *pud, *tr_pud;
> +
> +		tr_pud = tr_pud_page + pud_index(addr);
> +		pud = pud_page + pud_index((unsigned long)__va(addr));

Why is this type cast needed?

> +		next = (addr & PUD_MASK) + PUD_SIZE;
> +
> +		/* Needed to copy pte or pud alike */
> +		BUILD_BUG_ON(sizeof(pud_t) != sizeof(pte_t));
> +		*tr_pud = *pud;

So I don't understand what brought the size of 'pte' into this code. We are 
copying pud entries around AFAICS, why should the size of the pte entry matter?

Thanks,

	Ingo


* Re: [PATCH v6 2/3] x86/mm: Implement ASLR for kernel memory sections (x86_64)
  2016-06-17 10:26   ` Ingo Molnar
@ 2016-06-17 17:29     ` Kees Cook
  2016-06-21 16:46     ` Thomas Garnier
  1 sibling, 0 replies; 10+ messages in thread
From: Kees Cook @ 2016-06-17 17:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Garnier, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, Borislav Petkov, Juergen Gross, Matt Fleming, Toshi Kani,
	Baoquan He, Andrew Morton, Dan Williams, Dave Hansen,
	Aneesh Kumar K.V, Kirill A. Shutemov, Martin Schwidefsky,
	Andy Lutomirski, Alexander Kuleshov, Alexander Popov,
	Joerg Roedel, Dave Young, Lv Zheng, Mark Salter, Stephen Smalley,
	Dmitry Vyukov, Boris Ostrovsky, David Rientjes,
	Christian Borntraeger, Jan Beulich, Kefeng Wang, Seth Jennings,
	Yinghai Lu, LKML

Thanks for the review! I'll let Thomas address the feedback, though
I've got some thoughts below on naming.

On Fri, Jun 17, 2016 at 3:26 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Kees Cook <keescook@chromium.org> wrote:
>
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1993,6 +1993,23 @@ config PHYSICAL_ALIGN
>>
>>         Don't change this unless you know what you are doing.
>>
>> +config RANDOMIZE_MEMORY
>> +     bool "Randomize the kernel memory sections"
>> +     depends on X86_64
>> +     depends on RANDOMIZE_BASE
>> +     default RANDOMIZE_BASE
>> +     ---help---
>> +        Randomizes the base virtual address of kernel memory sections
>> +        (physical memory mapping, vmalloc & vmemmap). This security feature
>> +        makes exploits relying on predictable memory locations less reliable.
>> +
>> +        The order of allocations remains unchanged. Entropy is generated in
>> +        the same way as RANDOMIZE_BASE. Current implementation in the optimal
>> +        configuration have in average 30,000 different possible virtual
>> +        addresses for each memory section.
>> +
>> +        If unsure, say N.
>
> So this is really poor naming!

I think "RANDOMIZE" should probably be changed, yes, though I have
some caveats...

> The user, in first approximation, will see something like this:
>
>   Randomize the kernel memory sections (RANDOMIZE_MEMORY) [Y/n/?] (NEW)
>
> ... and could naively conclude that this feature will randomize memory contents.
>
> Furthermore, there's no apparent connection to another, related kernel feature:
>
>   Randomize the address of the kernel image (KASLR) (RANDOMIZE_BASE) [Y/n/?]
>
> A better naming would be something like:
>
>      Randomize other static kernel memory addresses (RANDOMIZE_ALL) [Y/n/?] (NEW)
>
> (assuming it's truly 'all')

There's always more to be identified and added. A few thoughts:

"Address Space Layout Randomization" is a recognizable name, though I
tried to avoid it in the descriptive texts because
CONFIG_RANDOMIZE_BASE does not randomize "everything" and it doesn't
randomize link order, etc, it only provides a randomized base address
to the kernel text. Now, this is in line with all the userspace ASLR:
each one of stack, mmap, brk, and text have their ASLR done via base
address offsets. ASLR is a technique, and is frequently confused with
"coverage", which I have been trying to fix. i.e. Kernel ASLR of text
base address is one piece of coverage, but it leaves other things not
yet ASLRed.

So, now that we have something else besides base text address ASLR,
I'm happy to create configs that carry the name "KASLR".

> But ... I don't see it explained anywhere why a user, once he expressed interest
> in randomization would even want to _not_ randomize as much as possible. I.e. why
> does this new option exist at all, shouldn't we just gradually extend
> randomization to more memory areas and control it via CONFIG_RANDOMIZE_BASE?

I think it's important to maintain a higher granularity of CONFIG
options since the maturity of each given feature will vary from arch
to arch. Also, it becomes hard to compare, for example, an x86 .config
with an arm64 .config to see which features are enabled. If
CONFIG_RANDOMIZE_MEMORY landed, it'd be easy to see that it is missing
on arm64 from looking at .config files.

Now, that said, a lot of people have wanted a "make this build as
secure as possible" CONFIG, so it could be nice to have a CONFIG_KASLR
that selects each of the available sub-configs.

> Btw., CONFIG_RANDOMIZE_BASE is probably a misnomer, and after these patches it
> will be more of a misnomer - so we should rename it to something better. For
> example CONFIG_KASLR would have the advantage of being obviously named at a
> glance, to anyone who knows what 'ASLR' means. To those who don't know the short
> description will tell it:
>
>   Kernel address space layout randomization (KASLR) [Y/n/?]
>
> This would also fit the kernel internal naming of kaslr.c/etc.
>
> What do you think?

How about something like this:

CONFIG_KASLR
    depends on HAVE_ARCH_KASLR_TEXT || HAVE_ARCH_KASLR_MEMORY
CONFIG_KASLR_TEXT
    depends on CONFIG_KASLR && HAVE_ARCH_KASLR_TEXT
    default CONFIG_KASLR
CONFIG_KASLR_MEMORY
    depends on CONFIG_KASLR && HAVE_ARCH_KASLR_MEMORY
    default CONFIG_KASLR

And then we can select HAVE_ARCH_KASLR_TEXT for x86, arm64, and MIPS,
and when this lands, x86 would gain HAVE_ARCH_KASLR_MEMORY.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security


* Re: [PATCH v6 1/3] x86/mm: PUD VA support for physical mapping (x86_64)
  2016-06-17  9:02   ` Ingo Molnar
@ 2016-06-20 16:17     ` Thomas Garnier
  2016-06-21  8:18       ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Garnier @ 2016-06-20 16:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Borislav Petkov, Juergen Gross, Matt Fleming, Toshi Kani,
	Baoquan He, Andrew Morton, Dan Williams, Dave Hansen,
	Aneesh Kumar K.V, Kirill A. Shutemov, Martin Schwidefsky,
	Andy Lutomirski, Alexander Kuleshov, Alexander Popov,
	Joerg Roedel, Dave Young, Lv Zheng, Mark Salter, Stephen Smalley,
	Dmitry Vyukov, Boris Ostrovsky, David Rientjes,
	Christian Borntraeger, Jan Beulich, Kefeng Wang, Seth Jennings,
	Yinghai Lu, LKML

On Fri, Jun 17, 2016 at 2:02 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Kees Cook <keescook@chromium.org> wrote:
>
>> From: Thomas Garnier <thgarnie@google.com>
>>
>> Minor change that allows early boot physical mapping of PUD level virtual
>> addresses. The current implementation expects the virtual address to be
>> PUD aligned. For KASLR memory randomization, we need to be able to
>> randomize the offset used on the PUD table.
>>
>> It has no impact on current usage.
>>
>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> ---
>>  arch/x86/mm/init_64.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index bce2e5d9edd4..f205f39bd808 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -454,10 +454,10 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
>>  {
>>       unsigned long pages = 0, next;
>>       unsigned long last_map_addr = end;
>> -     int i = pud_index(addr);
>> +     int i = pud_index((unsigned long)__va(addr));
>>
>>
>>       for (; i < PTRS_PER_PUD; i++, addr = next) {
>> -             pud_t *pud = pud_page + pud_index(addr);
>> +             pud_t *pud = pud_page + pud_index((unsigned long)__va(addr));
>>               pmd_t *pmd;
>>               pgprot_t prot = PAGE_KERNEL;
>
> So I really dislike two things about this code.
>
> Firstly a pre-existing problem is that the parameter names to phys_pud_init()
> suck:
>
> static unsigned long __meminit
> phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
>                          unsigned long page_size_mask)
>
> so 'unsigned long addr' is usually the signature of a virtual address - but that's
> no true here: it's a physical address.
>
> Same goes for 'unsigned long end'. Plus it's unclear what the connection between
> 'addr' and 'end' - it's not at all obvious 'at a glance' that they are the start
> and end addresses of a physical memory range.
>
> All of these problems can be solved by renaming them to 'paddr_start' and
> 'paddr_end'.
>
> Btw., I believe this misnomer and confusing code resulted in the buggy
> 'pud_index(addr)' not being noticed to begin with ...
>

I will add a new commit that renames the variables as described.

> Secondly, and that's a new problem introduced by this patch:
>
>> +     int i = pud_index((unsigned long)__va(addr));
>> +             pud_t *pud = pud_page + pud_index((unsigned long)__va(addr));
>
> ... beyond the repetition, using type casts is fragile. Type casts should be a red
> flag to anyone involved in low level, security relevant code! So I'm pretty
> unhappy about seeing such a problem in such a patch.
>
> This code should be doing something like:
>
>         unsigned long vaddr_start = __va(paddr_start);
>
> ... which gets rid of the type cast, the repetition and documents the code much
> better as well.

Unfortunately, we can't do that because __va returns a void*. We will
get this warning on compile:

arch/x86/mm/init_64.c:537:8: warning: assignment makes integer from
pointer without a cast [enabled by default]
  vaddr = __va(paddr_start);

If we used void*, we would need to type cast even more places. What do
you think?

> Also see how easily the connection between the variables is
> self-documented just by picking names carefully:
>
>         paddr_start
>         paddr_end
>         vaddr_start
>         vaddr_end
>

Will do, from kernel_physical_mapping_init on down.

> Also, _please_ add a comment to phys_pud_init() that explains what the function
> does.
>

Will do.

> Thanks,
>
>         Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 1/3] x86/mm: PUD VA support for physical mapping (x86_64)
  2016-06-20 16:17     ` Thomas Garnier
@ 2016-06-21  8:18       ` Ingo Molnar
  0 siblings, 0 replies; 10+ messages in thread
From: Ingo Molnar @ 2016-06-21  8:18 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Kees Cook, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Borislav Petkov, Juergen Gross, Matt Fleming, Toshi Kani,
	Baoquan He, Andrew Morton, Dan Williams, Dave Hansen,
	Aneesh Kumar K.V, Kirill A. Shutemov, Martin Schwidefsky,
	Andy Lutomirski, Alexander Kuleshov, Alexander Popov,
	Joerg Roedel, Dave Young, Lv Zheng, Mark Salter, Stephen Smalley,
	Dmitry Vyukov, Boris Ostrovsky, David Rientjes,
	Christian Borntraeger, Jan Beulich, Kefeng Wang, Seth Jennings,
	Yinghai Lu, LKML


* Thomas Garnier <thgarnie@google.com> wrote:

> > Secondly, and that's a new problem introduced by this patch:
> >
> >> +     int i = pud_index((unsigned long)__va(addr));
> >> +             pud_t *pud = pud_page + pud_index((unsigned long)__va(addr));
> >
> > ... beyond the repetition, using type casts is fragile. Type casts should be a red
> > flag to anyone involved in low level, security relevant code! So I'm pretty
> > unhappy about seeing such a problem in such a patch.
> >
> > This code should be doing something like:
> >
> >         unsigned long vaddr_start = __va(paddr_start);
> >
> > ... which gets rid of the type cast, the repetition and documents the code much
> > better as well.
> 
> Unfortunately, we can't do that because __va returns a void *. We will
> get this warning at compile time:
> 
> arch/x86/mm/init_64.c:537:8: warning: assignment makes integer from
> pointer without a cast [enabled by default]
>   vaddr = __va(paddr_start);
> 
> If we used a void *, we would need type casts in even more places. What do
> you think?

Hm, indeed, you are right - so I guess the type cast is OK.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/3] x86/mm: Implement ASLR for kernel memory sections (x86_64)
  2016-06-17 10:26   ` Ingo Molnar
  2016-06-17 17:29     ` Kees Cook
@ 2016-06-21 16:46     ` Thomas Garnier
  1 sibling, 0 replies; 10+ messages in thread
From: Thomas Garnier @ 2016-06-21 16:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Borislav Petkov, Juergen Gross, Matt Fleming, Toshi Kani,
	Baoquan He, Andrew Morton, Dan Williams, Dave Hansen,
	Aneesh Kumar K.V, Kirill A. Shutemov, Martin Schwidefsky,
	Andy Lutomirski, Alexander Kuleshov, Alexander Popov,
	Joerg Roedel, Dave Young, Lv Zheng, Mark Salter, Stephen Smalley,
	Dmitry Vyukov, Boris Ostrovsky, David Rientjes,
	Christian Borntraeger, Jan Beulich, Kefeng Wang, Seth Jennings,
	Yinghai Lu, LKML

On Fri, Jun 17, 2016 at 3:26 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Kees Cook <keescook@chromium.org> wrote:
>
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1993,6 +1993,23 @@ config PHYSICAL_ALIGN
>>
>>         Don't change this unless you know what you are doing.
>>
>> +config RANDOMIZE_MEMORY
>> +     bool "Randomize the kernel memory sections"
>> +     depends on X86_64
>> +     depends on RANDOMIZE_BASE
>> +     default RANDOMIZE_BASE
>> +     ---help---
>> +        Randomizes the base virtual address of kernel memory sections
>> +        (physical memory mapping, vmalloc & vmemmap). This security feature
>> +        makes exploits relying on predictable memory locations less reliable.
>> +
>> +        The order of allocations remains unchanged. Entropy is generated in
>> +        the same way as RANDOMIZE_BASE. Current implementation in the optimal
>> +        configuration have in average 30,000 different possible virtual
>> +        addresses for each memory section.
>> +
>> +        If unsure, say N.
>
> So this is really poor naming!
>
> The user, in first approximation, will see something like this:
>
>   Randomize the kernel memory sections (RANDOMIZE_MEMORY) [Y/n/?] (NEW)
>
> ... and could naively conclude that this feature will randomize memory contents.
>
> Furthermore, there's no apparent connection to another, related kernel feature:
>
>   Randomize the address of the kernel image (KASLR) (RANDOMIZE_BASE) [Y/n/?]
>
> A better naming would be something like:
>
>      Randomize other static kernel memory addresses (RANDOMIZE_ALL) [Y/n/?] (NEW)
>
> (assuming it's truly 'all')
>
> But ... I don't see it explained anywhere why a user, once he expressed interest
> in randomization would even want to _not_ randomize as much as possible. I.e. why
> does this new option exist at all, shouldn't we just gradually extend
> randomization to more memory areas and control it via CONFIG_RANDOMIZE_BASE?
>
> Btw., CONFIG_RANDOMIZE_BASE is probably a misnomer, and after these patches it
> will be more of a misnomer - so we should rename it to something better. For
> example CONFIG_KASLR would have the advantage of being obviously named at a
> glance, to anyone who knows what 'ASLR' means. To those who don't know the short
> description will tell it:
>
>   Kernel address space layout randomization (KASLR) [Y/n/?]
>
> This would also fit the kernel internal naming of kaslr.c/etc.
>
> What do you think?
>
>> +++ b/arch/x86/mm/kaslr.c
>> @@ -0,0 +1,167 @@
>> +/*
>> + * This file implements KASLR memory randomization for x86_64. It randomizes
>> + * the virtual address space of kernel memory sections (physical memory
>> + * mapping, vmalloc & vmemmap) for x86_64. This security feature mitigates
>> + * exploits relying on predictable kernel addresses.
>> + *
>> + * Entropy is generated using the KASLR early boot functions now shared in
>> + * the lib directory (originally written by Kees Cook). Randomization is
>> + * done on PGD & PUD page table levels to increase possible addresses. The
>> + * physical memory mapping code was adapted to support PUD level virtual
>> + * addresses. This implementation on the best configuration provides 30,000
>> + * possible virtual addresses in average for each memory section.  An
>> + * additional low memory page is used to ensure each CPU can start with a
>> + * PGD aligned virtual address (for realmode).
>> + *
>> + * The order of each memory section is not changed. The feature looks at
>> + * the available space for the sections based on different configuration
>> + * options and randomizes the base and space between each. The size of the
>> + * physical memory mapping is the available physical memory.
>> + *
>> + */
>
> (Stray newline.)
>
>> +#include <linux/kernel.h>
>> +#include <linux/errno.h>
>> +#include <linux/types.h>
>> +#include <linux/mm.h>
>> +#include <linux/smp.h>
>> +#include <linux/init.h>
>> +#include <linux/memory.h>
>> +#include <linux/random.h>
>> +
>> +#include <asm/processor.h>
>> +#include <asm/pgtable.h>
>> +#include <asm/pgalloc.h>
>> +#include <asm/e820.h>
>> +#include <asm/init.h>
>> +#include <asm/setup.h>
>> +#include <asm/kaslr.h>
>> +#include <asm/kasan.h>
>> +
>> +#include "mm_internal.h"
>
> How many of those include lines are truly unnecessary, or did this just get copied
> from another file?
>

No, that's correct. I will clean that up.

>> +/*
>> + * Memory base and end randomization is based on different configurations.
>> + * We want as much space as possible to increase entropy available.
>> + */
>> +static const unsigned long memory_rand_start = __PAGE_OFFSET_BASE;
>> +
>> +#if defined(CONFIG_KASAN)
>> +static const unsigned long memory_rand_end = KASAN_SHADOW_START;
>> +#elif defined(CONFIG_X86_ESPFIX64)
>> +static const unsigned long memory_rand_end = ESPFIX_BASE_ADDR;
>> +#elif defined(CONFIG_EFI)
>> +static const unsigned long memory_rand_end = EFI_VA_START;
>> +#else
>> +static const unsigned long memory_rand_end = __START_KERNEL_map;
>> +#endif
>
> It's unclear to me from the naming and the description whether these are virtual
> or physical memory ranges. Nor is it explained why this laundry list of kernel
> options influences the randomization range - and it's not explained how new kernel
> features modifying the virtual memory layout should plug into this mechanism.
>
> Could we please get a bit more verbose introduction into why all of this is done?
>

The comment was meant to say that we want as much space as possible to
increase the available entropy. I will make that clearer and add
information on how to change these variables when a new feature alters
the virtual memory layout.
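
For example, something along these lines as the comment (sketch, wording
to be refined):

/*
 * The base of the direct physical mapping is the start of the range we
 * randomize into.  The end must stay below the first fixed-placement
 * region that follows it in the virtual memory layout: the KASAN
 * shadow if enabled, else the ESPFIX area, else the EFI region, else
 * the kernel text mapping at __START_KERNEL_map (the BUILD_BUG_ON
 * checks in kernel_randomize_memory() enforce this ordering).  A new
 * feature reserving virtual space below __START_KERNEL_map should add
 * its own #elif entry here.
 */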

>> +/* Default values */
>> +unsigned long page_offset_base = __PAGE_OFFSET_BASE;
>> +EXPORT_SYMBOL(page_offset_base);
>> +unsigned long vmalloc_base = __VMALLOC_BASE;
>> +EXPORT_SYMBOL(vmalloc_base);
>> +unsigned long vmemmap_base = __VMEMMAP_BASE;
>> +EXPORT_SYMBOL(vmemmap_base);
>> +
>> +/* Describe each randomized memory sections in sequential order */
>> +static __initdata struct kaslr_memory_region {
>> +     unsigned long *base;
>> +     unsigned short size_tb;
>> +} kaslr_regions[] = {
>> +     { &page_offset_base, 64/* Maximum */ },
>> +     { &vmalloc_base, VMALLOC_SIZE_TB },
>> +     { &vmemmap_base, 1 },
>> +};
>
> So this comment is totally misleading. This area does not 'describe' each memory
> section, it actually puts live pointers to virtual memory address offset control
> variables used by other subsystems into this array - allowing us to randomize them
> programmatically.
>
> This patch should be split up into 4 components:
>
>  - core patch adding all the infrastructure but not actually randomizing anything,
>    the kaslr_regions[] array is empty.
>
>  - a patch adding page_offset_base randomization.
>
>  - a patch adding vmalloc_base randomization.
>
>  - a patch adding vmemmap_base randomization.
>
> This way, if any breakage is introduced by this randomization, it can be bisected
> in a fine-grained fashion. It would also have saved 5 minutes of my life trying
> to interpret the code here :-/
>
> In fact I'd suggest a series of 7 patches, with each randomization patch split
> into two further patches:
>
>  - a patch making address offset variables dynamic.
>
>  - a patch actually enabling the randomization of that virtual address range.
>
> This structure allows us to carefully evaluate the code size and runtime impact of
> randomizing a given range.
>

Makes sense. I will split this into the 4 components as described.

>> +
>> +/* Size in Terabytes + 1 hole */
>> +static __init unsigned long get_padding(struct kaslr_memory_region *region)
>> +{
>> +     return ((unsigned long)region->size_tb + 1) << TB_SHIFT;
>> +}
>
> So why is size_tb a short, forcing ugly type casts like this? Also, since this
> appears to be a trivial shift, why isn't this an inline?
>

Changed.
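
For reference, roughly what it becomes (sketch; this assumes size_tb is
widened to unsigned long so the cast can go away entirely):

struct kaslr_memory_region {
	unsigned long *base;
	unsigned long size_tb;	/* was unsigned short */
};

/* Size of the region in terabytes, plus a 1 TB hole after it. */
static inline unsigned long get_padding(struct kaslr_memory_region *region)
{
	return (region->size_tb + 1) << TB_SHIFT;
}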

>> +/* Initialize base and padding for each memory section randomized with KASLR */
>> +void __init kernel_randomize_memory(void)
>> +{
>> +     size_t i;
>> +     unsigned long addr = memory_rand_start;
>> +     unsigned long padding, rand, mem_tb;
>> +     struct rnd_state rnd_st;
>> +     unsigned long remain_padding = memory_rand_end - memory_rand_start;
>
> Yeah, so these variable names are awful :-/
>
> Look at how postfix and prefix naming is mixed in the same function:
> 'memory_rand_start', but 'remain_padding'. Please use postfixes for all of them.
>
> Also why is one variable named 'rand', another 'rnd'? Did we run out of 'a'
> characters?
>
> Similarly, we have 'memory' and 'mem'. Pick one variant and harmonize!
>
> This patch is an object lesson in how fragile, insecure code is written.
>

Will change.

>> +
>> +     /*
>> +      * All these BUILD_BUG_ON checks ensures the memory layout is
>> +      * consistent with the current KASLR design.
>> +      */
>> +     BUILD_BUG_ON(memory_rand_start >= memory_rand_end);
>> +     BUILD_BUG_ON(config_enabled(CONFIG_KASAN) &&
>> +             memory_rand_end >= ESPFIX_BASE_ADDR);
>> +     BUILD_BUG_ON((config_enabled(CONFIG_KASAN) ||
>> +                     config_enabled(CONFIG_X86_ESPFIX64)) &&
>> +             memory_rand_end >= EFI_VA_START);
>> +     BUILD_BUG_ON((config_enabled(CONFIG_KASAN) ||
>> +                     config_enabled(CONFIG_X86_ESPFIX64) ||
>> +                     config_enabled(CONFIG_EFI)) &&
>> +             memory_rand_end >= __START_KERNEL_map);
>> +     BUILD_BUG_ON(memory_rand_end > __START_KERNEL_map);
>> +
>> +     if (!kaslr_enabled())
>> +             return;
>> +
>> +     BUG_ON(kaslr_regions[0].base != &page_offset_base);
>> +     mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);
>
> Why is the granularity of the randomization 1TB? I don't see this explained
> anywhere. Some of the randomization may have PUD-granularity limitation - but PUD
> entries have a size of 512 GB.
>
> I think we should extend the kaslr_regions[] table with a minimum required
> alignment field instead of random granularity limitations.
>

I will change it so that the next region can start on the next PUD
entry with minimum padding.
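
Roughly, inside the loop over kaslr_regions[] (sketch; "entropy" stands
for the random offset already computed for this region, and the rest of
the loop is unchanged):

	vaddr += entropy;
	*kaslr_regions[i].base = vaddr;

	/* Start the next region on the following PUD boundary, so only
	 * minimal padding is lost between regions. */
	vaddr += get_padding(&kaslr_regions[i]);
	vaddr = ALIGN(vaddr, PUD_SIZE);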

>> +/*
>> + * Create PGD aligned trampoline table to allow real mode initialization
>> + * of additional CPUs. Consume only 1 low memory page.
>> + */
>> +void __meminit init_trampoline(void)
>
> This too should probably be a separate patch, for bisectability and reviewability
> reasons.
>

Will do.

>> +     unsigned long addr, next;
>> +     pgd_t *pgd;
>> +     pud_t *pud_page, *tr_pud_page;
>> +     int i;
>
> I had to look twice to understand that 'tr' stands for 'trampoline'. Please rename
> to 'pud_page_tramp' or so.
>

Will do.

>> +     if (!kaslr_enabled())
>> +             return;
>> +
>> +     tr_pud_page = alloc_low_page();
>> +     set_pgd(&trampoline_pgd_entry, __pgd(_PAGE_TABLE | __pa(tr_pud_page)));
>
> Shouldn't we set it _after_ we've initialized the tr_pud_page page table?
>
> Also, why _PAGE_TABLE? AFAICS using _PAGE_TABLE adds _PAGE_USER.
>

It is used by pgd_populate in kernel_physical_mapping_init. I will
update it to use _KERNPG_TABLE.
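
For illustration, roughly (untested sketch; this also moves the
set_pgd() until after the trampoline table has been filled, per your
other comment):

	pud_page_tramp = alloc_low_page();

	/* ... copy the direct-mapping pud entries into pud_page_tramp ... */

	/* Install the entry only once the table is populated, and
	 * without _PAGE_USER: */
	set_pgd(&trampoline_pgd_entry,
		__pgd(_KERNPG_TABLE | __pa(pud_page_tramp)));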

>> +
>> +     addr = 0;
>> +     pgd = pgd_offset_k((unsigned long)__va(addr));
>
> Why the type cast?
>
> Also, similar complaint to the first patch: why is a physical address named in an
> ambiguous fashion, as 'addr'?
>

Will adapt it in the same way as the other patch.

>> +     pud_page = (pud_t *) pgd_page_vaddr(*pgd);
>
> Side note: it appears all users of pgd_page_vaddr() are casting it immediately to
> a pointer type. Could we change the return type of this function to 'void *' or
> 'pud_t *'? AFAICS it's mostly used in platform code.
>

I agree but I think that should be a separate change.

>> +
>> +     for (i = pud_index(addr); i < PTRS_PER_PUD; i++, addr = next) {
>> +             pud_t *pud, *tr_pud;
>> +
>> +             tr_pud = tr_pud_page + pud_index(addr);
>> +             pud = pud_page + pud_index((unsigned long)__va(addr));
>
> Why is this type cast needed?
>

Because __va returns a pointer and pud_index expects an unsigned long.
I will edit it in a similar way to the other patch.

>> +             next = (addr & PUD_MASK) + PUD_SIZE;
>> +
>> +             /* Needed to copy pte or pud alike */
>> +             BUILD_BUG_ON(sizeof(pud_t) != sizeof(pte_t));
>> +             *tr_pud = *pud;
>
> So I don't understand what brought the size of 'pte' into this code. We are
> copying pud entries around AFAICS, why should the size of the pte entry matter?
>

To support PSE. Depending on the page_size_mask, this entry can be a
pud entry or a pte. I will remove the BUILD_BUG_ON; it is confusing.
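
In other words the copy stays as it is, just with a comment instead of
the BUILD_BUG_ON (sketch):

	/* With PSE, *pud may be a large-page leaf entry rather than a
	 * pointer to a pmd page; either way it is a single entry, so a
	 * direct copy is sufficient. */
	*tr_pud = *pud;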

> Thanks,
>
>         Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-06-21 16:47 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-25 22:57 [PATCH v6 0/3] x86/mm: memory area address KASLR Kees Cook
2016-05-25 22:57 ` [PATCH v6 1/3] x86/mm: PUD VA support for physical mapping (x86_64) Kees Cook
2016-06-17  9:02   ` Ingo Molnar
2016-06-20 16:17     ` Thomas Garnier
2016-06-21  8:18       ` Ingo Molnar
2016-05-25 22:57 ` [PATCH v6 2/3] x86/mm: Implement ASLR for kernel memory sections (x86_64) Kees Cook
2016-06-17 10:26   ` Ingo Molnar
2016-06-17 17:29     ` Kees Cook
2016-06-21 16:46     ` Thomas Garnier
2016-05-25 22:57 ` [PATCH v6 3/3] x86/mm: Memory hotplug support for KASLR memory randomization Kees Cook
