linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] x86/kaslr for v3.14
@ 2014-01-20 16:47 H. Peter Anvin
  2014-01-20 22:54 ` Linus Torvalds
  2014-01-26 10:16 ` Richard Weinberger
  0 siblings, 2 replies; 70+ messages in thread
From: H. Peter Anvin @ 2014-01-20 16:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Cong Ding, H. Peter Anvin, H. Peter Anvin, Ingo Molnar,
	Ingo Molnar, Kees Cook, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

Hi Linus,

This enables kernel address space randomization for x86.

The following changes since commit d0e639c9e06d44e713170031fe05fb60ebe680af:

  Linux 3.12-rc4 (2013-10-06 14:00:20 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-kaslr-for-linus

The head of this tree is da2b6fb990cf782b18952f534ec7323453bc4fc9.

  x86, kaslr: Clarify RANDOMIZE_BASE_MAX_OFFSET (2014-01-14 10:45:56 -0800)

----------------------------------------------------------------
H. Peter Anvin (2):
      x86, boot: Rename get_flags() and check_flags() to *_cpuflags()
      x86, kaslr: Add a circular multiply for better bit diffusion

Kees Cook (10):
      x86, boot: Move CPU flags out of cpucheck
      x86, kaslr: Return location from decompress_kernel
      x86, kaslr: Provide randomness functions
      x86, kaslr: Select random position from e820 maps
      x86, kaslr: Report kernel offset on panic
      x86, kaslr: Raise the maximum virtual address to -1 GiB on x86_64
      x86/relocs: Add percpu fixup for GNU ld 2.23
      x86, kaslr: Mix entropy sources together as needed
      x86, kaslr: Use char array to gain sizeof sanity
      x86, kaslr: Clarify RANDOMIZE_BASE_MAX_OFFSET

Michael Davidson (1):
      x86, relocs: Add more per-cpu gold special cases

Wei Yongjun (1):
      x86, kaslr: Remove unused including <linux/version.h>

 Documentation/kernel-parameters.txt     |   4 +
 arch/x86/Kconfig                        |  59 +++++-
 arch/x86/boot/Makefile                  |   2 +-
 arch/x86/boot/boot.h                    |  10 +-
 arch/x86/boot/compressed/Makefile       |   2 +-
 arch/x86/boot/compressed/aslr.c         | 316 ++++++++++++++++++++++++++++++++
 arch/x86/boot/compressed/cmdline.c      |   2 +-
 arch/x86/boot/compressed/cpuflags.c     |  12 ++
 arch/x86/boot/compressed/head_32.S      |  10 +-
 arch/x86/boot/compressed/head_64.S      |  16 +-
 arch/x86/boot/compressed/misc.c         |  18 +-
 arch/x86/boot/compressed/misc.h         |  37 +++-
 arch/x86/boot/cpucheck.c                | 100 +---------
 arch/x86/boot/cpuflags.c                | 104 +++++++++++
 arch/x86/boot/cpuflags.h                |  19 ++
 arch/x86/include/asm/archrandom.h       |  21 +++
 arch/x86/include/asm/page_64_types.h    |  15 +-
 arch/x86/include/asm/pgtable_64_types.h |   2 +-
 arch/x86/kernel/cpu/rdrand.c            |  14 --
 arch/x86/kernel/setup.c                 |  26 +++
 arch/x86/mm/init_32.c                   |   3 +
 arch/x86/tools/relocs.c                 |  20 +-
 22 files changed, 654 insertions(+), 158 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index fcbb736d55fe..773fc4c077b4 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1975,6 +1975,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	noapic		[SMP,APIC] Tells the kernel to not make use of any
 			IOAPICs that may be present in the system.
 
+	nokaslr		[X86]
+			Disable kernel base offset ASLR (Address Space
+			Layout Randomization) if built into the kernel.
+
 	noautogroup	Disable scheduler automatic task group creation.
 
 	nobats		[PPC] Do not use BATs for mapping kernel lowmem
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ee2fb9d37745..5c9e19dccf2f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1722,16 +1722,67 @@ config RELOCATABLE
 
 	  Note: If CONFIG_RELOCATABLE=y, then the kernel runs from the address
 	  it has been loaded at and the compile time physical address
-	  (CONFIG_PHYSICAL_START) is ignored.
+	  (CONFIG_PHYSICAL_START) is used as the minimum location.
 
-# Relocation on x86-32 needs some additional build support
+config RANDOMIZE_BASE
+	bool "Randomize the address of the kernel image"
+	depends on RELOCATABLE
+	depends on !HIBERNATION
+	default n
+	---help---
+	   Randomizes the physical and virtual address at which the
+	   kernel image is decompressed, as a security feature that
+	   deters exploit attempts relying on knowledge of the location
+	   of kernel internals.
+
+	   Entropy is generated using the RDRAND instruction if it is
+	   supported. If RDTSC is supported, it is used as well. If
+	   neither RDRAND nor RDTSC are supported, then randomness is
+	   read from the i8254 timer.
+
+	   The kernel will be offset by up to RANDOMIZE_BASE_MAX_OFFSET,
+	   and aligned according to PHYSICAL_ALIGN. Since the kernel is
+	   built using 2GiB addressing, and PHYSICAL_ALGIN must be at a
+	   minimum of 2MiB, only 10 bits of entropy is theoretically
+	   possible. At best, due to page table layouts, 64-bit can use
+	   9 bits of entropy and 32-bit uses 8 bits.
+
+	   If unsure, say N.
+
+config RANDOMIZE_BASE_MAX_OFFSET
+	hex "Maximum kASLR offset allowed" if EXPERT
+	depends on RANDOMIZE_BASE
+	range 0x0 0x20000000 if X86_32
+	default "0x20000000" if X86_32
+	range 0x0 0x40000000 if X86_64
+	default "0x40000000" if X86_64
+	---help---
+	  The lesser of RANDOMIZE_BASE_MAX_OFFSET and available physical
+	  memory is used to determine the maximal offset in bytes that will
+	  be applied to the kernel when kernel Address Space Layout
+	  Randomization (kASLR) is active. This must be a multiple of
+	  PHYSICAL_ALIGN.
+
+	  On 32-bit this is limited to 512MiB by page table layouts. The
+	  default is 512MiB.
+
+	  On 64-bit this is limited by how the kernel fixmap page table is
+	  positioned, so this cannot be larger than 1GiB currently. Without
+	  RANDOMIZE_BASE, there is a 512MiB to 1.5GiB split between kernel
+	  and modules. When RANDOMIZE_BASE_MAX_OFFSET is above 512MiB, the
+	  modules area will shrink to compensate, up to the current maximum
+	  1GiB to 1GiB split. The default is 1GiB.
+
+	  If unsure, leave at the default value.
+
+# Relocation on x86 needs some additional build support
 config X86_NEED_RELOCS
 	def_bool y
-	depends on X86_32 && RELOCATABLE
+	depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE)
 
 config PHYSICAL_ALIGN
 	hex "Alignment value to which kernel should be aligned"
-	default "0x1000000"
+	default "0x200000"
 	range 0x2000 0x1000000 if X86_32
 	range 0x200000 0x1000000 if X86_64
 	---help---
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 379814bc41e3..0da2e37b37c3 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -20,7 +20,7 @@ targets		:= vmlinux.bin setup.bin setup.elf bzImage
 targets		+= fdimage fdimage144 fdimage288 image.iso mtools.conf
 subdir-		:= compressed
 
-setup-y		+= a20.o bioscall.o cmdline.o copy.o cpu.o cpucheck.o
+setup-y		+= a20.o bioscall.o cmdline.o copy.o cpu.o cpuflags.o cpucheck.o
 setup-y		+= early_serial_console.o edd.o header.o main.o mca.o memory.o
 setup-y		+= pm.o pmjump.o printf.o regs.o string.o tty.o video.o
 setup-y		+= video-mode.o version.o
diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index ef72baeff484..50f8c5e0f37e 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -26,9 +26,8 @@
 #include <asm/boot.h>
 #include <asm/setup.h>
 #include "bitops.h"
-#include <asm/cpufeature.h>
-#include <asm/processor-flags.h>
 #include "ctype.h"
+#include "cpuflags.h"
 
 /* Useful macros */
 #define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
@@ -307,14 +306,7 @@ static inline int cmdline_find_option_bool(const char *option)
 	return __cmdline_find_option_bool(cmd_line_ptr, option);
 }
 
-
 /* cpu.c, cpucheck.c */
-struct cpu_features {
-	int level;		/* Family, or 64 for x86-64 */
-	int model;
-	u32 flags[NCAPINTS];
-};
-extern struct cpu_features cpu;
 int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr);
 int validate_cpu(void);
 
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index dcd90df10ab4..ae8b5dbbd8c5 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -27,7 +27,7 @@ HOST_EXTRACFLAGS += -I$(srctree)/tools/include
 
 VMLINUX_OBJS = $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
 	$(obj)/string.o $(obj)/cmdline.o $(obj)/early_serial_console.o \
-	$(obj)/piggy.o
+	$(obj)/piggy.o $(obj)/cpuflags.o $(obj)/aslr.o
 
 $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
 
diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
new file mode 100644
index 000000000000..90a21f430117
--- /dev/null
+++ b/arch/x86/boot/compressed/aslr.c
@@ -0,0 +1,316 @@
+#include "misc.h"
+
+#ifdef CONFIG_RANDOMIZE_BASE
+#include <asm/msr.h>
+#include <asm/archrandom.h>
+#include <asm/e820.h>
+
+#include <generated/compile.h>
+#include <linux/module.h>
+#include <linux/uts.h>
+#include <linux/utsname.h>
+#include <generated/utsrelease.h>
+
+/* Simplified build-specific string for starting entropy. */
+static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
+		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
+
+#define I8254_PORT_CONTROL	0x43
+#define I8254_PORT_COUNTER0	0x40
+#define I8254_CMD_READBACK	0xC0
+#define I8254_SELECT_COUNTER0	0x02
+#define I8254_STATUS_NOTREADY	0x40
+static inline u16 i8254(void)
+{
+	u16 status, timer;
+
+	do {
+		outb(I8254_PORT_CONTROL,
+		     I8254_CMD_READBACK | I8254_SELECT_COUNTER0);
+		status = inb(I8254_PORT_COUNTER0);
+		timer  = inb(I8254_PORT_COUNTER0);
+		timer |= inb(I8254_PORT_COUNTER0) << 8;
+	} while (status & I8254_STATUS_NOTREADY);
+
+	return timer;
+}
+
+static unsigned long rotate_xor(unsigned long hash, const void *area,
+				size_t size)
+{
+	size_t i;
+	unsigned long *ptr = (unsigned long *)area;
+
+	for (i = 0; i < size / sizeof(hash); i++) {
+		/* Rotate by odd number of bits and XOR. */
+		hash = (hash << ((sizeof(hash) * 8) - 7)) | (hash >> 7);
+		hash ^= ptr[i];
+	}
+
+	return hash;
+}
+
+/* Attempt to create a simple but unpredictable starting entropy. */
+static unsigned long get_random_boot(void)
+{
+	unsigned long hash = 0;
+
+	hash = rotate_xor(hash, build_str, sizeof(build_str));
+	hash = rotate_xor(hash, real_mode, sizeof(*real_mode));
+
+	return hash;
+}
+
+static unsigned long get_random_long(void)
+{
+#ifdef CONFIG_X86_64
+	const unsigned long mix_const = 0x5d6008cbf3848dd3UL;
+#else
+	const unsigned long mix_const = 0x3f39e593UL;
+#endif
+	unsigned long raw, random = get_random_boot();
+	bool use_i8254 = true;
+
+	debug_putstr("KASLR using");
+
+	if (has_cpuflag(X86_FEATURE_RDRAND)) {
+		debug_putstr(" RDRAND");
+		if (rdrand_long(&raw)) {
+			random ^= raw;
+			use_i8254 = false;
+		}
+	}
+
+	if (has_cpuflag(X86_FEATURE_TSC)) {
+		debug_putstr(" RDTSC");
+		rdtscll(raw);
+
+		random ^= raw;
+		use_i8254 = false;
+	}
+
+	if (use_i8254) {
+		debug_putstr(" i8254");
+		random ^= i8254();
+	}
+
+	/* Circular multiply for better bit diffusion */
+	asm("mul %3"
+	    : "=a" (random), "=d" (raw)
+	    : "a" (random), "rm" (mix_const));
+	random += raw;
+
+	debug_putstr("...\n");
+
+	return random;
+}
+
+struct mem_vector {
+	unsigned long start;
+	unsigned long size;
+};
+
+#define MEM_AVOID_MAX 5
+struct mem_vector mem_avoid[MEM_AVOID_MAX];
+
+static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
+{
+	/* Item at least partially before region. */
+	if (item->start < region->start)
+		return false;
+	/* Item at least partially after region. */
+	if (item->start + item->size > region->start + region->size)
+		return false;
+	return true;
+}
+
+static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
+{
+	/* Item one is entirely before item two. */
+	if (one->start + one->size <= two->start)
+		return false;
+	/* Item one is entirely after item two. */
+	if (one->start >= two->start + two->size)
+		return false;
+	return true;
+}
+
+static void mem_avoid_init(unsigned long input, unsigned long input_size,
+			   unsigned long output, unsigned long output_size)
+{
+	u64 initrd_start, initrd_size;
+	u64 cmd_line, cmd_line_size;
+	unsigned long unsafe, unsafe_len;
+	char *ptr;
+
+	/*
+	 * Avoid the region that is unsafe to overlap during
+	 * decompression (see calculations at top of misc.c).
+	 */
+	unsafe_len = (output_size >> 12) + 32768 + 18;
+	unsafe = (unsigned long)input + input_size - unsafe_len;
+	mem_avoid[0].start = unsafe;
+	mem_avoid[0].size = unsafe_len;
+
+	/* Avoid initrd. */
+	initrd_start  = (u64)real_mode->ext_ramdisk_image << 32;
+	initrd_start |= real_mode->hdr.ramdisk_image;
+	initrd_size  = (u64)real_mode->ext_ramdisk_size << 32;
+	initrd_size |= real_mode->hdr.ramdisk_size;
+	mem_avoid[1].start = initrd_start;
+	mem_avoid[1].size = initrd_size;
+
+	/* Avoid kernel command line. */
+	cmd_line  = (u64)real_mode->ext_cmd_line_ptr << 32;
+	cmd_line |= real_mode->hdr.cmd_line_ptr;
+	/* Calculate size of cmd_line. */
+	ptr = (char *)(unsigned long)cmd_line;
+	for (cmd_line_size = 0; ptr[cmd_line_size++]; )
+		;
+	mem_avoid[2].start = cmd_line;
+	mem_avoid[2].size = cmd_line_size;
+
+	/* Avoid heap memory. */
+	mem_avoid[3].start = (unsigned long)free_mem_ptr;
+	mem_avoid[3].size = BOOT_HEAP_SIZE;
+
+	/* Avoid stack memory. */
+	mem_avoid[4].start = (unsigned long)free_mem_end_ptr;
+	mem_avoid[4].size = BOOT_STACK_SIZE;
+}
+
+/* Does this memory vector overlap a known avoided area? */
+bool mem_avoid_overlap(struct mem_vector *img)
+{
+	int i;
+
+	for (i = 0; i < MEM_AVOID_MAX; i++) {
+		if (mem_overlaps(img, &mem_avoid[i]))
+			return true;
+	}
+
+	return false;
+}
+
+unsigned long slots[CONFIG_RANDOMIZE_BASE_MAX_OFFSET / CONFIG_PHYSICAL_ALIGN];
+unsigned long slot_max = 0;
+
+static void slots_append(unsigned long addr)
+{
+	/* Overflowing the slots list should be impossible. */
+	if (slot_max >= CONFIG_RANDOMIZE_BASE_MAX_OFFSET /
+			CONFIG_PHYSICAL_ALIGN)
+		return;
+
+	slots[slot_max++] = addr;
+}
+
+static unsigned long slots_fetch_random(void)
+{
+	/* Handle case of no slots stored. */
+	if (slot_max == 0)
+		return 0;
+
+	return slots[get_random_long() % slot_max];
+}
+
+static void process_e820_entry(struct e820entry *entry,
+			       unsigned long minimum,
+			       unsigned long image_size)
+{
+	struct mem_vector region, img;
+
+	/* Skip non-RAM entries. */
+	if (entry->type != E820_RAM)
+		return;
+
+	/* Ignore entries entirely above our maximum. */
+	if (entry->addr >= CONFIG_RANDOMIZE_BASE_MAX_OFFSET)
+		return;
+
+	/* Ignore entries entirely below our minimum. */
+	if (entry->addr + entry->size < minimum)
+		return;
+
+	region.start = entry->addr;
+	region.size = entry->size;
+
+	/* Potentially raise address to minimum location. */
+	if (region.start < minimum)
+		region.start = minimum;
+
+	/* Potentially raise address to meet alignment requirements. */
+	region.start = ALIGN(region.start, CONFIG_PHYSICAL_ALIGN);
+
+	/* Did we raise the address above the bounds of this e820 region? */
+	if (region.start > entry->addr + entry->size)
+		return;
+
+	/* Reduce size by any delta from the original address. */
+	region.size -= region.start - entry->addr;
+
+	/* Reduce maximum size to fit end of image within maximum limit. */
+	if (region.start + region.size > CONFIG_RANDOMIZE_BASE_MAX_OFFSET)
+		region.size = CONFIG_RANDOMIZE_BASE_MAX_OFFSET - region.start;
+
+	/* Walk each aligned slot and check for avoided areas. */
+	for (img.start = region.start, img.size = image_size ;
+	     mem_contains(&region, &img) ;
+	     img.start += CONFIG_PHYSICAL_ALIGN) {
+		if (mem_avoid_overlap(&img))
+			continue;
+		slots_append(img.start);
+	}
+}
+
+static unsigned long find_random_addr(unsigned long minimum,
+				      unsigned long size)
+{
+	int i;
+	unsigned long addr;
+
+	/* Make sure minimum is aligned. */
+	minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);
+
+	/* Verify potential e820 positions, appending to slots list. */
+	for (i = 0; i < real_mode->e820_entries; i++) {
+		process_e820_entry(&real_mode->e820_map[i], minimum, size);
+	}
+
+	return slots_fetch_random();
+}
+
+unsigned char *choose_kernel_location(unsigned char *input,
+				      unsigned long input_size,
+				      unsigned char *output,
+				      unsigned long output_size)
+{
+	unsigned long choice = (unsigned long)output;
+	unsigned long random;
+
+	if (cmdline_find_option_bool("nokaslr")) {
+		debug_putstr("KASLR disabled...\n");
+		goto out;
+	}
+
+	/* Record the various known unsafe memory ranges. */
+	mem_avoid_init((unsigned long)input, input_size,
+		       (unsigned long)output, output_size);
+
+	/* Walk e820 and find a random address. */
+	random = find_random_addr(choice, output_size);
+	if (!random) {
+		debug_putstr("KASLR could not find suitable E820 region...\n");
+		goto out;
+	}
+
+	/* Always enforce the minimum. */
+	if (random < choice)
+		goto out;
+
+	choice = random;
+out:
+	return (unsigned char *)choice;
+}
+
+#endif /* CONFIG_RANDOMIZE_BASE */
diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
index bffd73b45b1f..b68e3033e6b9 100644
--- a/arch/x86/boot/compressed/cmdline.c
+++ b/arch/x86/boot/compressed/cmdline.c
@@ -1,6 +1,6 @@
 #include "misc.h"
 
-#ifdef CONFIG_EARLY_PRINTK
+#if CONFIG_EARLY_PRINTK || CONFIG_RANDOMIZE_BASE
 
 static unsigned long fs;
 static inline void set_fs(unsigned long seg)
diff --git a/arch/x86/boot/compressed/cpuflags.c b/arch/x86/boot/compressed/cpuflags.c
new file mode 100644
index 000000000000..aa313466118b
--- /dev/null
+++ b/arch/x86/boot/compressed/cpuflags.c
@@ -0,0 +1,12 @@
+#ifdef CONFIG_RANDOMIZE_BASE
+
+#include "../cpuflags.c"
+
+bool has_cpuflag(int flag)
+{
+	get_cpuflags();
+
+	return test_bit(flag, cpu.flags);
+}
+
+#endif
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 5d6f6891b188..9116aac232c7 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -117,9 +117,11 @@ preferred_addr:
 	addl    %eax, %ebx
 	notl	%eax
 	andl    %eax, %ebx
-#else
-	movl	$LOAD_PHYSICAL_ADDR, %ebx
+	cmpl	$LOAD_PHYSICAL_ADDR, %ebx
+	jge	1f
 #endif
+	movl	$LOAD_PHYSICAL_ADDR, %ebx
+1:
 
 	/* Target address to relocate to for decompression */
 	addl	$z_extract_offset, %ebx
@@ -191,14 +193,14 @@ relocated:
 	leal	boot_heap(%ebx), %eax
 	pushl	%eax		/* heap area */
 	pushl	%esi		/* real mode pointer */
-	call	decompress_kernel
+	call	decompress_kernel /* returns kernel location in %eax */
 	addl	$24, %esp
 
 /*
  * Jump to the decompressed kernel.
  */
 	xorl	%ebx, %ebx
-	jmp	*%ebp
+	jmp	*%eax
 
 /*
  * Stack and heap for uncompression
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index c337422b575d..c5c1ae0997e7 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -94,9 +94,11 @@ ENTRY(startup_32)
 	addl	%eax, %ebx
 	notl	%eax
 	andl	%eax, %ebx
-#else
-	movl	$LOAD_PHYSICAL_ADDR, %ebx
+	cmpl	$LOAD_PHYSICAL_ADDR, %ebx
+	jge	1f
 #endif
+	movl	$LOAD_PHYSICAL_ADDR, %ebx
+1:
 
 	/* Target address to relocate to for decompression */
 	addl	$z_extract_offset, %ebx
@@ -269,9 +271,11 @@ preferred_addr:
 	addq	%rax, %rbp
 	notq	%rax
 	andq	%rax, %rbp
-#else
-	movq	$LOAD_PHYSICAL_ADDR, %rbp
+	cmpq	$LOAD_PHYSICAL_ADDR, %rbp
+	jge	1f
 #endif
+	movq	$LOAD_PHYSICAL_ADDR, %rbp
+1:
 
 	/* Target address to relocate to for decompression */
 	leaq	z_extract_offset(%rbp), %rbx
@@ -339,13 +343,13 @@ relocated:
 	movl	$z_input_len, %ecx	/* input_len */
 	movq	%rbp, %r8		/* output target address */
 	movq	$z_output_len, %r9	/* decompressed length */
-	call	decompress_kernel
+	call	decompress_kernel	/* returns kernel location in %rax */
 	popq	%rsi
 
 /*
  * Jump to the decompressed kernel.
  */
-	jmp	*%rbp
+	jmp	*%rax
 
 	.code32
 no_longmode:
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 434f077d2c4d..196eaf373a06 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -112,14 +112,8 @@ struct boot_params *real_mode;		/* Pointer to real-mode data */
 void *memset(void *s, int c, size_t n);
 void *memcpy(void *dest, const void *src, size_t n);
 
-#ifdef CONFIG_X86_64
-#define memptr long
-#else
-#define memptr unsigned
-#endif
-
-static memptr free_mem_ptr;
-static memptr free_mem_end_ptr;
+memptr free_mem_ptr;
+memptr free_mem_end_ptr;
 
 static char *vidmem;
 static int vidport;
@@ -395,7 +389,7 @@ static void parse_elf(void *output)
 	free(phdrs);
 }
 
-asmlinkage void decompress_kernel(void *rmode, memptr heap,
+asmlinkage void *decompress_kernel(void *rmode, memptr heap,
 				  unsigned char *input_data,
 				  unsigned long input_len,
 				  unsigned char *output,
@@ -422,6 +416,10 @@ asmlinkage void decompress_kernel(void *rmode, memptr heap,
 	free_mem_ptr     = heap;	/* Heap */
 	free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
 
+	output = choose_kernel_location(input_data, input_len,
+					output, output_len);
+
+	/* Validate memory location choices. */
 	if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
 		error("Destination address inappropriately aligned");
 #ifdef CONFIG_X86_64
@@ -441,5 +439,5 @@ asmlinkage void decompress_kernel(void *rmode, memptr heap,
 	parse_elf(output);
 	handle_relocations(output, output_len);
 	debug_putstr("done.\nBooting the kernel.\n");
-	return;
+	return output;
 }
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 674019d8e235..24e3e569a13c 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -23,7 +23,15 @@
 #define BOOT_BOOT_H
 #include "../ctype.h"
 
+#ifdef CONFIG_X86_64
+#define memptr long
+#else
+#define memptr unsigned
+#endif
+
 /* misc.c */
+extern memptr free_mem_ptr;
+extern memptr free_mem_end_ptr;
 extern struct boot_params *real_mode;		/* Pointer to real-mode data */
 void __putstr(const char *s);
 #define error_putstr(__x)  __putstr(__x)
@@ -39,23 +47,40 @@ static inline void debug_putstr(const char *s)
 
 #endif
 
-#ifdef CONFIG_EARLY_PRINTK
-
+#if CONFIG_EARLY_PRINTK || CONFIG_RANDOMIZE_BASE
 /* cmdline.c */
 int cmdline_find_option(const char *option, char *buffer, int bufsize);
 int cmdline_find_option_bool(const char *option);
+#endif
 
-/* early_serial_console.c */
-extern int early_serial_base;
-void console_init(void);
 
+#if CONFIG_RANDOMIZE_BASE
+/* aslr.c */
+unsigned char *choose_kernel_location(unsigned char *input,
+				      unsigned long input_size,
+				      unsigned char *output,
+				      unsigned long output_size);
+/* cpuflags.c */
+bool has_cpuflag(int flag);
 #else
+static inline
+unsigned char *choose_kernel_location(unsigned char *input,
+				      unsigned long input_size,
+				      unsigned char *output,
+				      unsigned long output_size)
+{
+	return output;
+}
+#endif
 
+#ifdef CONFIG_EARLY_PRINTK
 /* early_serial_console.c */
+extern int early_serial_base;
+void console_init(void);
+#else
 static const int early_serial_base;
 static inline void console_init(void)
 { }
-
 #endif
 
 #endif
diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c
index 4d3ff037201f..100a9a10076a 100644
--- a/arch/x86/boot/cpucheck.c
+++ b/arch/x86/boot/cpucheck.c
@@ -28,8 +28,6 @@
 #include <asm/required-features.h>
 #include <asm/msr-index.h>
 
-struct cpu_features cpu;
-static u32 cpu_vendor[3];
 static u32 err_flags[NCAPINTS];
 
 static const int req_level = CONFIG_X86_MINIMUM_CPU_FAMILY;
@@ -69,92 +67,8 @@ static int is_transmeta(void)
 	       cpu_vendor[2] == A32('M', 'x', '8', '6');
 }
 
-static int has_fpu(void)
-{
-	u16 fcw = -1, fsw = -1;
-	u32 cr0;
-
-	asm("movl %%cr0,%0" : "=r" (cr0));
-	if (cr0 & (X86_CR0_EM|X86_CR0_TS)) {
-		cr0 &= ~(X86_CR0_EM|X86_CR0_TS);
-		asm volatile("movl %0,%%cr0" : : "r" (cr0));
-	}
-
-	asm volatile("fninit ; fnstsw %0 ; fnstcw %1"
-		     : "+m" (fsw), "+m" (fcw));
-
-	return fsw == 0 && (fcw & 0x103f) == 0x003f;
-}
-
-static int has_eflag(u32 mask)
-{
-	u32 f0, f1;
-
-	asm("pushfl ; "
-	    "pushfl ; "
-	    "popl %0 ; "
-	    "movl %0,%1 ; "
-	    "xorl %2,%1 ; "
-	    "pushl %1 ; "
-	    "popfl ; "
-	    "pushfl ; "
-	    "popl %1 ; "
-	    "popfl"
-	    : "=&r" (f0), "=&r" (f1)
-	    : "ri" (mask));
-
-	return !!((f0^f1) & mask);
-}
-
-static void get_flags(void)
-{
-	u32 max_intel_level, max_amd_level;
-	u32 tfms;
-
-	if (has_fpu())
-		set_bit(X86_FEATURE_FPU, cpu.flags);
-
-	if (has_eflag(X86_EFLAGS_ID)) {
-		asm("cpuid"
-		    : "=a" (max_intel_level),
-		      "=b" (cpu_vendor[0]),
-		      "=d" (cpu_vendor[1]),
-		      "=c" (cpu_vendor[2])
-		    : "a" (0));
-
-		if (max_intel_level >= 0x00000001 &&
-		    max_intel_level <= 0x0000ffff) {
-			asm("cpuid"
-			    : "=a" (tfms),
-			      "=c" (cpu.flags[4]),
-			      "=d" (cpu.flags[0])
-			    : "a" (0x00000001)
-			    : "ebx");
-			cpu.level = (tfms >> 8) & 15;
-			cpu.model = (tfms >> 4) & 15;
-			if (cpu.level >= 6)
-				cpu.model += ((tfms >> 16) & 0xf) << 4;
-		}
-
-		asm("cpuid"
-		    : "=a" (max_amd_level)
-		    : "a" (0x80000000)
-		    : "ebx", "ecx", "edx");
-
-		if (max_amd_level >= 0x80000001 &&
-		    max_amd_level <= 0x8000ffff) {
-			u32 eax = 0x80000001;
-			asm("cpuid"
-			    : "+a" (eax),
-			      "=c" (cpu.flags[6]),
-			      "=d" (cpu.flags[1])
-			    : : "ebx");
-		}
-	}
-}
-
 /* Returns a bitmask of which words we have error bits in */
-static int check_flags(void)
+static int check_cpuflags(void)
 {
 	u32 err;
 	int i;
@@ -187,8 +101,8 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr)
 	if (has_eflag(X86_EFLAGS_AC))
 		cpu.level = 4;
 
-	get_flags();
-	err = check_flags();
+	get_cpuflags();
+	err = check_cpuflags();
 
 	if (test_bit(X86_FEATURE_LM, cpu.flags))
 		cpu.level = 64;
@@ -207,8 +121,8 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr)
 		eax &= ~(1 << 15);
 		asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx));
 
-		get_flags();	/* Make sure it really did something */
-		err = check_flags();
+		get_cpuflags();	/* Make sure it really did something */
+		err = check_cpuflags();
 	} else if (err == 0x01 &&
 		   !(err_flags[0] & ~(1 << X86_FEATURE_CX8)) &&
 		   is_centaur() && cpu.model >= 6) {
@@ -223,7 +137,7 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr)
 		asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx));
 
 		set_bit(X86_FEATURE_CX8, cpu.flags);
-		err = check_flags();
+		err = check_cpuflags();
 	} else if (err == 0x01 && is_transmeta()) {
 		/* Transmeta might have masked feature bits in word 0 */
 
@@ -238,7 +152,7 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr)
 		    : : "ecx", "ebx");
 		asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx));
 
-		err = check_flags();
+		err = check_cpuflags();
 	}
 
 	if (err_flags_ptr)
diff --git a/arch/x86/boot/cpuflags.c b/arch/x86/boot/cpuflags.c
new file mode 100644
index 000000000000..a9fcb7cfb241
--- /dev/null
+++ b/arch/x86/boot/cpuflags.c
@@ -0,0 +1,104 @@
+#include <linux/types.h>
+#include "bitops.h"
+
+#include <asm/processor-flags.h>
+#include <asm/required-features.h>
+#include <asm/msr-index.h>
+#include "cpuflags.h"
+
+struct cpu_features cpu;
+u32 cpu_vendor[3];
+
+static bool loaded_flags;
+
+static int has_fpu(void)
+{
+	u16 fcw = -1, fsw = -1;
+	unsigned long cr0;
+
+	asm volatile("mov %%cr0,%0" : "=r" (cr0));
+	if (cr0 & (X86_CR0_EM|X86_CR0_TS)) {
+		cr0 &= ~(X86_CR0_EM|X86_CR0_TS);
+		asm volatile("mov %0,%%cr0" : : "r" (cr0));
+	}
+
+	asm volatile("fninit ; fnstsw %0 ; fnstcw %1"
+		     : "+m" (fsw), "+m" (fcw));
+
+	return fsw == 0 && (fcw & 0x103f) == 0x003f;
+}
+
+int has_eflag(unsigned long mask)
+{
+	unsigned long f0, f1;
+
+	asm volatile("pushf	\n\t"
+		     "pushf	\n\t"
+		     "pop %0	\n\t"
+		     "mov %0,%1	\n\t"
+		     "xor %2,%1	\n\t"
+		     "push %1	\n\t"
+		     "popf	\n\t"
+		     "pushf	\n\t"
+		     "pop %1	\n\t"
+		     "popf"
+		     : "=&r" (f0), "=&r" (f1)
+		     : "ri" (mask));
+
+	return !!((f0^f1) & mask);
+}
+
+/* Handle x86_32 PIC using ebx. */
+#if defined(__i386__) && defined(__PIC__)
+# define EBX_REG "=r"
+#else
+# define EBX_REG "=b"
+#endif
+
+static inline void cpuid(u32 id, u32 *a, u32 *b, u32 *c, u32 *d)
+{
+	asm volatile(".ifnc %%ebx,%3 ; movl  %%ebx,%3 ; .endif	\n\t"
+		     "cpuid					\n\t"
+		     ".ifnc %%ebx,%3 ; xchgl %%ebx,%3 ; .endif	\n\t"
+		    : "=a" (*a), "=c" (*c), "=d" (*d), EBX_REG (*b)
+		    : "a" (id)
+	);
+}
+
+void get_cpuflags(void)
+{
+	u32 max_intel_level, max_amd_level;
+	u32 tfms;
+	u32 ignored;
+
+	if (loaded_flags)
+		return;
+	loaded_flags = true;
+
+	if (has_fpu())
+		set_bit(X86_FEATURE_FPU, cpu.flags);
+
+	if (has_eflag(X86_EFLAGS_ID)) {
+		cpuid(0x0, &max_intel_level, &cpu_vendor[0], &cpu_vendor[2],
+		      &cpu_vendor[1]);
+
+		if (max_intel_level >= 0x00000001 &&
+		    max_intel_level <= 0x0000ffff) {
+			cpuid(0x1, &tfms, &ignored, &cpu.flags[4],
+			      &cpu.flags[0]);
+			cpu.level = (tfms >> 8) & 15;
+			cpu.model = (tfms >> 4) & 15;
+			if (cpu.level >= 6)
+				cpu.model += ((tfms >> 16) & 0xf) << 4;
+		}
+
+		cpuid(0x80000000, &max_amd_level, &ignored, &ignored,
+		      &ignored);
+
+		if (max_amd_level >= 0x80000001 &&
+		    max_amd_level <= 0x8000ffff) {
+			cpuid(0x80000001, &ignored, &ignored, &cpu.flags[6],
+			      &cpu.flags[1]);
+		}
+	}
+}
diff --git a/arch/x86/boot/cpuflags.h b/arch/x86/boot/cpuflags.h
new file mode 100644
index 000000000000..ea97697e51e4
--- /dev/null
+++ b/arch/x86/boot/cpuflags.h
@@ -0,0 +1,19 @@
+#ifndef BOOT_CPUFLAGS_H
+#define BOOT_CPUFLAGS_H
+
+#include <asm/cpufeature.h>
+#include <asm/processor-flags.h>
+
+struct cpu_features {
+	int level;		/* Family, or 64 for x86-64 */
+	int model;
+	u32 flags[NCAPINTS];
+};
+
+extern struct cpu_features cpu;
+extern u32 cpu_vendor[3];
+
+int has_eflag(unsigned long mask);
+void get_cpuflags(void);
+
+#endif
diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
index 0d9ec770f2f8..e6a92455740e 100644
--- a/arch/x86/include/asm/archrandom.h
+++ b/arch/x86/include/asm/archrandom.h
@@ -39,6 +39,20 @@
 
 #ifdef CONFIG_ARCH_RANDOM
 
+/* Instead of arch_get_random_long() when alternatives haven't run. */
+static inline int rdrand_long(unsigned long *v)
+{
+	int ok;
+	asm volatile("1: " RDRAND_LONG "\n\t"
+		     "jc 2f\n\t"
+		     "decl %0\n\t"
+		     "jnz 1b\n\t"
+		     "2:"
+		     : "=r" (ok), "=a" (*v)
+		     : "0" (RDRAND_RETRY_LOOPS));
+	return ok;
+}
+
 #define GET_RANDOM(name, type, rdrand, nop)			\
 static inline int name(type *v)					\
 {								\
@@ -68,6 +82,13 @@ GET_RANDOM(arch_get_random_int, unsigned int, RDRAND_INT, ASM_NOP3);
 
 #endif /* CONFIG_X86_64 */
 
+#else
+
+static inline int rdrand_long(unsigned long *v)
+{
+	return 0;
+}
+
 #endif  /* CONFIG_ARCH_RANDOM */
 
 extern void x86_init_rdrand(struct cpuinfo_x86 *c);
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 43dcd804ebd5..8de6d9cf3b95 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -39,9 +39,18 @@
 #define __VIRTUAL_MASK_SHIFT	47
 
 /*
- * Kernel image size is limited to 512 MB (see level2_kernel_pgt in
- * arch/x86/kernel/head_64.S), and it is mapped here:
+ * Kernel image size is limited to 1GiB due to the fixmap living in the
+ * next 1GiB (see level2_kernel_pgt in arch/x86/kernel/head_64.S). Use
+ * 512MiB by default, leaving 1.5GiB for modules once the page tables
+ * are fully set up. If kernel ASLR is configured, it can extend the
+ * kernel page table mapping, reducing the size of the modules area.
  */
-#define KERNEL_IMAGE_SIZE	(512 * 1024 * 1024)
+#define KERNEL_IMAGE_SIZE_DEFAULT      (512 * 1024 * 1024)
+#if defined(CONFIG_RANDOMIZE_BASE) && \
+	CONFIG_RANDOMIZE_BASE_MAX_OFFSET > KERNEL_IMAGE_SIZE_DEFAULT
+#define KERNEL_IMAGE_SIZE   CONFIG_RANDOMIZE_BASE_MAX_OFFSET
+#else
+#define KERNEL_IMAGE_SIZE      KERNEL_IMAGE_SIZE_DEFAULT
+#endif
 
 #endif /* _ASM_X86_PAGE_64_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 2d883440cb9a..c883bf726398 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -58,7 +58,7 @@ typedef struct { pteval_t pte; } pte_t;
 #define VMALLOC_START    _AC(0xffffc90000000000, UL)
 #define VMALLOC_END      _AC(0xffffe8ffffffffff, UL)
 #define VMEMMAP_START	 _AC(0xffffea0000000000, UL)
-#define MODULES_VADDR    _AC(0xffffffffa0000000, UL)
+#define MODULES_VADDR    (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
 #define MODULES_END      _AC(0xffffffffff000000, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
 
diff --git a/arch/x86/kernel/cpu/rdrand.c b/arch/x86/kernel/cpu/rdrand.c
index 88db010845cb..384df5105fbc 100644
--- a/arch/x86/kernel/cpu/rdrand.c
+++ b/arch/x86/kernel/cpu/rdrand.c
@@ -31,20 +31,6 @@ static int __init x86_rdrand_setup(char *s)
 }
 __setup("nordrand", x86_rdrand_setup);
 
-/* We can't use arch_get_random_long() here since alternatives haven't run */
-static inline int rdrand_long(unsigned long *v)
-{
-	int ok;
-	asm volatile("1: " RDRAND_LONG "\n\t"
-		     "jc 2f\n\t"
-		     "decl %0\n\t"
-		     "jnz 1b\n\t"
-		     "2:"
-		     : "=r" (ok), "=a" (*v)
-		     : "0" (RDRAND_RETRY_LOOPS));
-	return ok;
-}
-
 /*
  * Force a reseed cycle; we are architecturally guaranteed a reseed
  * after no more than 512 128-bit chunks of random data.  This also
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f0de6294b955..1708862fc40d 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -824,6 +824,20 @@ static void __init trim_low_memory_range(void)
 }
 	
 /*
+ * Dump out kernel offset information on panic.
+ */
+static int
+dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
+{
+	pr_emerg("Kernel Offset: 0x%lx from 0x%lx "
+		 "(relocation range: 0x%lx-0x%lx)\n",
+		 (unsigned long)&_text - __START_KERNEL, __START_KERNEL,
+		 __START_KERNEL_map, MODULES_VADDR-1);
+
+	return 0;
+}
+
+/*
  * Determine if we were loaded by an EFI loader.  If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
  * for initialization.  Note, the efi init code path is determined by the
@@ -1242,3 +1256,15 @@ void __init i386_reserve_resources(void)
 }
 
 #endif /* CONFIG_X86_32 */
+
+static struct notifier_block kernel_offset_notifier = {
+	.notifier_call = dump_kernel_offset
+};
+
+static int __init register_kernel_offset_dumper(void)
+{
+	atomic_notifier_chain_register(&panic_notifier_list,
+					&kernel_offset_notifier);
+	return 0;
+}
+__initcall(register_kernel_offset_dumper);
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 4287f1ffba7e..5bdc5430597c 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -806,6 +806,9 @@ void __init mem_init(void)
 	BUILD_BUG_ON(VMALLOC_START			>= VMALLOC_END);
 #undef high_memory
 #undef __FIXADDR_TOP
+#ifdef CONFIG_RANDOMIZE_BASE
+	BUILD_BUG_ON(CONFIG_RANDOMIZE_BASE_MAX_OFFSET > KERNEL_IMAGE_SIZE);
+#endif
 
 #ifdef CONFIG_HIGHMEM
 	BUG_ON(PKMAP_BASE + LAST_PKMAP*PAGE_SIZE	> FIXADDR_START);
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index f7bab68a4b83..11f9285a2ff6 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -722,15 +722,25 @@ static void percpu_init(void)
 
 /*
  * Check to see if a symbol lies in the .data..percpu section.
- * For some as yet not understood reason the "__init_begin"
- * symbol which immediately preceeds the .data..percpu section
- * also shows up as it it were part of it so we do an explict
- * check for that symbol name and ignore it.
+ *
+ * The linker incorrectly associates some symbols with the
+ * .data..percpu section so we also need to check the symbol
+ * name to make sure that we classify the symbol correctly.
+ *
+ * The GNU linker incorrectly associates:
+ *	__init_begin
+ *	__per_cpu_load
+ *
+ * The "gold" linker incorrectly associates:
+ *	init_per_cpu__irq_stack_union
+ *	init_per_cpu__gdt_page
  */
 static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
 {
 	return (sym->st_shndx == per_cpu_shndx) &&
-		strcmp(symname, "__init_begin");
+		strcmp(symname, "__init_begin") &&
+		strcmp(symname, "__per_cpu_load") &&
+		strncmp(symname, "init_per_cpu_", 13);
 }
 
 

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-20 16:47 [GIT PULL] x86/kaslr for v3.14 H. Peter Anvin
@ 2014-01-20 22:54 ` Linus Torvalds
  2014-01-20 23:00   ` H. Peter Anvin
                     ` (3 more replies)
  2014-01-26 10:16 ` Richard Weinberger
  1 sibling, 4 replies; 70+ messages in thread
From: Linus Torvalds @ 2014-01-20 22:54 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Cong Ding, H. Peter Anvin, Ingo Molnar, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

So I pulled this, but one question:

On Mon, Jan 20, 2014 at 8:47 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> +config RANDOMIZE_BASE
> +       bool "Randomize the address of the kernel image"
> +       depends on RELOCATABLE
> +       depends on !HIBERNATION

How fundamental is that "!HIBERNATION" issue? Right now that
anti-dependency on hibernation support will mean that no distro kernel
will actually use the kernel address space randomization. Which
long-term is a problem.

I'm not sure HIBERNATION is really getting all that much use, but I
suspect distros would still want to support it.

Is it just a temporary "I wasn't able to make it work, need to get
some PM people involved", or is it something really fundamental?

            Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-20 22:54 ` Linus Torvalds
@ 2014-01-20 23:00   ` H. Peter Anvin
  2014-01-20 23:12   ` Linus Torvalds
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 70+ messages in thread
From: H. Peter Anvin @ 2014-01-20 23:00 UTC (permalink / raw)
  To: Linus Torvalds, H. Peter Anvin
  Cc: Cong Ding, Ingo Molnar, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On 01/20/2014 02:54 PM, Linus Torvalds wrote:
> So I pulled this, but one question:
> 
> On Mon, Jan 20, 2014 at 8:47 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> +config RANDOMIZE_BASE
>> +       bool "Randomize the address of the kernel image"
>> +       depends on RELOCATABLE
>> +       depends on !HIBERNATION
> 
> How fundamental is that "!HIBERNATION" issue? Right now that
> anti-dependency on hibernation support will mean that no distro kernel
> will actually use the kernel address space randomization. Which
> long-term is a problem.
> 
> I'm not sure HIBERNATION is really getting all that much use, but I
> suspect distros would still want to support it.
> 
> Is it just a temporary "I wasn't able to make it work, need to get
> some PM people involved", or is it something really fundamental?
> 

Kees, could you comment?

	-hpa



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-20 22:54 ` Linus Torvalds
  2014-01-20 23:00   ` H. Peter Anvin
@ 2014-01-20 23:12   ` Linus Torvalds
  2014-01-20 23:13     ` H. Peter Anvin
  2014-01-21 10:27     ` Ingo Molnar
  2014-01-21  5:18   ` Kees Cook
  2014-01-23  9:39   ` Pavel Machek
  3 siblings, 2 replies; 70+ messages in thread
From: Linus Torvalds @ 2014-01-20 23:12 UTC (permalink / raw)
  To: H. Peter Anvin, Arnaldo Carvalho de Melo, Peter Zijlstra
  Cc: Cong Ding, H. Peter Anvin, Ingo Molnar, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On Mon, Jan 20, 2014 at 2:54 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> So I pulled this, but one question:

.. oh, and since I decided to test it, and was looking for problems:
enabling kaslr breaks "perf". The *profile* looks fine, but the
disassembly doesn't work.

I'm not entirely surprised. I decided I wanted to test it for a
reason, after all. So it's not unexpected, but perhaps people hadn't
thought about it, and clearly hadn't tested it.

Kernel modules disassemble fine, so clearly perf knows about code that
moves around, but apparently it gets surprised when the core vmlinux
file disassembly doesn't match addresses.

             Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-20 23:12   ` Linus Torvalds
@ 2014-01-20 23:13     ` H. Peter Anvin
  2014-01-21  9:00       ` Peter Zijlstra
  2014-01-21 10:27     ` Ingo Molnar
  1 sibling, 1 reply; 70+ messages in thread
From: H. Peter Anvin @ 2014-01-20 23:13 UTC (permalink / raw)
  To: Linus Torvalds, Arnaldo Carvalho de Melo, Peter Zijlstra
  Cc: Cong Ding, H. Peter Anvin, Ingo Molnar, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On 01/20/2014 03:12 PM, Linus Torvalds wrote:
> On Mon, Jan 20, 2014 at 2:54 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> So I pulled this, but one question:
> 
> .. oh, and since I decided to test it, and was looking for problems:
> enabling kaslr breaks "perf". The *profile* looks fine, but the
> disassembly doesn't work.
> 
> I'm not entirely surprised. I decided I wanted to test it for a
> reason, after all. So it's not unexpected, but perhaps people hadn't
> thought about it, and clearly hadn't tested it.
> 
> Kernel modules disassemble fine, so clearly perf knows about code that
> moves around, but apparently it gets surprised when the core vmlinux
> file disassembly doesn't match addresses.
> 

So this is presumably something that needs to be fixed in perf?

	-hpa



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-20 22:54 ` Linus Torvalds
  2014-01-20 23:00   ` H. Peter Anvin
  2014-01-20 23:12   ` Linus Torvalds
@ 2014-01-21  5:18   ` Kees Cook
  2014-01-23  9:39   ` Pavel Machek
  3 siblings, 0 replies; 70+ messages in thread
From: Kees Cook @ 2014-01-21  5:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Cong Ding, H. Peter Anvin, Ingo Molnar,
	Ingo Molnar, Linux Kernel Mailing List, Mathias Krause,
	Michael Davidson, Thomas Gleixner, Wei Yongjun

On Mon, Jan 20, 2014 at 2:54 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> So I pulled this, but one question:
>
> On Mon, Jan 20, 2014 at 8:47 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> +config RANDOMIZE_BASE
>> +       bool "Randomize the address of the kernel image"
>> +       depends on RELOCATABLE
>> +       depends on !HIBERNATION
>
> How fundamental is that "!HIBERNATION" issue? Right now that
> anti-dependency on hibernation support will mean that no distro kernel
> will actually use the kernel address space randomization. Which
> long-term is a problem.
>
> I'm not sure HIBERNATION is really getting all that much use, but I
> suspect distros would still want to support it.
>
> Is it just a temporary "I wasn't able to make it work, need to get
> some PM people involved", or is it something really fundamental?

Right, this is a "need to get PM people involved" situation. When
kASLR was being designed, hibernation learning about the kernel base
looked like a separable problem, and given the very long list of
requirements for making it work at all, I carved this out as "future
work".

As for perf, it's similar -- it's another entirely solvable problem,
but perf needs to be untaught some of its assumptions.

We've had a static kernel base forever, so I'm expecting some bumps in
the road here. I'm hopeful none of it will be too painful, though.

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-20 23:13     ` H. Peter Anvin
@ 2014-01-21  9:00       ` Peter Zijlstra
  2014-01-21 14:20         ` H. Peter Anvin
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2014-01-21  9:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Arnaldo Carvalho de Melo, Cong Ding,
	H. Peter Anvin, Ingo Molnar, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On Mon, Jan 20, 2014 at 03:13:33PM -0800, H. Peter Anvin wrote:
> On 01/20/2014 03:12 PM, Linus Torvalds wrote:
> > On Mon, Jan 20, 2014 at 2:54 PM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >> So I pulled this, but one question:
> > 
> > .. oh, and since I decided to test it, and was looking for problems:
> > enabling kaslr breaks "perf". The *profile* looks fine, but the
> > disassembly doesn't work.
> > 
> > I'm not entirely surprised. I decided I wanted to test it for a
> > reason, after all. So it's not unexpected, but perhaps people hadn't
> > thought about it, and clearly hadn't tested it.
> > 
> > Kernel modules disassemble fine, so clearly perf knows about code that
> > moves around, but apparently it gets surprised when the core vmlinux
> > file disassembly doesn't match addresses.
> > 
> 
> So this is presumably something that needs to be fixed in perf?

Where do we learn about the offset from userspace?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-20 23:12   ` Linus Torvalds
  2014-01-20 23:13     ` H. Peter Anvin
@ 2014-01-21 10:27     ` Ingo Molnar
  2014-01-21 13:55       ` H. Peter Anvin
  1 sibling, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2014-01-21 10:27 UTC (permalink / raw)
  To: Linus Torvalds, Adrian Hunter, Arnaldo Carvalho de Melo
  Cc: H. Peter Anvin, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Cong Ding, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Jan 20, 2014 at 2:54 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > So I pulled this, but one question:
> 
> .. oh, and since I decided to test it, and was looking for problems: 
> enabling kaslr breaks "perf". The *profile* looks fine, but the 
> disassembly doesn't work.
> 
> I'm not entirely surprised. I decided I wanted to test it for a 
> reason, after all. So it's not unexpected, but perhaps people hadn't 
> thought about it, and clearly hadn't tested it.
> 
> Kernel modules disassemble fine, so clearly perf knows about code 
> that moves around, but apparently it gets surprised when the core 
> vmlinux file disassembly doesn't match addresses.

Hm, live annotation of the kernel image is a relatively new perf 
feature, and KASLR predated that (by years) - which would at least in 
part explain why it went unnoticed. (Although it does not excuse the 
lack of testing.)

I've Cc:-ed Adrian Hunter, the author of the /proc/kcore annotation 
feature.

Adrian: enabling x86 KASLR in Linus's latestest kernel via 
CONFIG_RANDOMIZE_BASE=y [this is also present in the -tip tree] breaks 
/proc/kcore annotation, because the kernel image position is now per 
bootup randomized and perf's annotation code gets surprised by that.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-21 10:27     ` Ingo Molnar
@ 2014-01-21 13:55       ` H. Peter Anvin
  2014-01-21 14:03         ` Ingo Molnar
  0 siblings, 1 reply; 70+ messages in thread
From: H. Peter Anvin @ 2014-01-21 13:55 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds, Adrian Hunter, Arnaldo Carvalho de Melo
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Cong Ding,
	H. Peter Anvin, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On 01/21/2014 02:27 AM, Ingo Molnar wrote:
> 
> Hm, live annotation of the kernel image is a relatively new perf 
> feature, and KASLR predated that (by years) - which would at least in 
> part explain why it went unnoticed. (Although it does not excuse the 
> lack of testing.)
> 

kASLR is new, but on 32 bits we have relocated the kernel for a long
time.  kASLR is the first use case of relocating the 64-bit kernel, though.

	-hpa


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-21 13:55       ` H. Peter Anvin
@ 2014-01-21 14:03         ` Ingo Molnar
  2014-01-21 14:05           ` H. Peter Anvin
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2014-01-21 14:03 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Adrian Hunter, Arnaldo Carvalho de Melo,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Cong Ding,
	H. Peter Anvin, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 01/21/2014 02:27 AM, Ingo Molnar wrote:
> > 
> > Hm, live annotation of the kernel image is a relatively new perf 
> > feature, and KASLR predated that (by years) - which would at least in 
> > part explain why it went unnoticed. (Although it does not excuse the 
> > lack of testing.)
> 
> kASLR is new, but on 32 bits we have relocated the kernel for a long 
> time. [...]

I doubt many people develop on 32-bit x86, and the group of people 
looking at annotated 32-bit assembly kernel profiles ought to be 
another order of magnitude smaller than that ...

> [...]  kASLR is the first use case of relocating the 64-bit kernel, 
> though.

Yeah.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-21 14:03         ` Ingo Molnar
@ 2014-01-21 14:05           ` H. Peter Anvin
  2014-01-21 14:14             ` Ingo Molnar
  0 siblings, 1 reply; 70+ messages in thread
From: H. Peter Anvin @ 2014-01-21 14:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Adrian Hunter, Arnaldo Carvalho de Melo,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Cong Ding,
	H. Peter Anvin, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On 01/21/2014 06:03 AM, Ingo Molnar wrote:
> 
> * H. Peter Anvin <hpa@zytor.com> wrote:
> 
>> On 01/21/2014 02:27 AM, Ingo Molnar wrote:
>>>
>>> Hm, live annotation of the kernel image is a relatively new perf 
>>> feature, and KASLR predated that (by years) - which would at least in 
>>> part explain why it went unnoticed. (Although it does not excuse the 
>>> lack of testing.)
>>
>> kASLR is new, but on 32 bits we have relocated the kernel for a long 
>> time. [...]
> 
> I doubt many people develop on 32-bit x86, and the group of people 
> looking at annotated 32-bit assembly kernel profiles ought to be 
> another order of magnitude smaller than that ...
> 

Yes... I was commenting on the statement that "kASLR predated that by
years".  It hasn't been common.

	-hpa


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-21 14:05           ` H. Peter Anvin
@ 2014-01-21 14:14             ` Ingo Molnar
  2014-01-21 14:17               ` H. Peter Anvin
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2014-01-21 14:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Adrian Hunter, Arnaldo Carvalho de Melo,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Cong Ding,
	H. Peter Anvin, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 01/21/2014 06:03 AM, Ingo Molnar wrote:
> > 
> > * H. Peter Anvin <hpa@zytor.com> wrote:
> > 
> >> On 01/21/2014 02:27 AM, Ingo Molnar wrote:
> >>>
> >>> Hm, live annotation of the kernel image is a relatively new perf 
> >>> feature, and KASLR predated that (by years) - which would at least in 
> >>> part explain why it went unnoticed. (Although it does not excuse the 
> >>> lack of testing.)
> >>
> >> kASLR is new, but on 32 bits we have relocated the kernel for a long 
> >> time. [...]
> > 
> > I doubt many people develop on 32-bit x86, and the group of people 
> > looking at annotated 32-bit assembly kernel profiles ought to be 
> > another order of magnitude smaller than that ...
> 
> Yes... I was commenting on the statement that "kASLR predated that 
> by years".  It hasn't been common.

Ah, I didn't mean to suggest that it's an old upstream feature: what I 
mean is that the KASLR patch is pretty old, and it has been deployed 
by the Chromium guys for quite some time, and by others?

It was just never combined with perf live annotation which is a recent 
perf feature.

Anyway ... I suspect it's the fixing of the bug that matters most, not 
its genealogy ;)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-21 14:14             ` Ingo Molnar
@ 2014-01-21 14:17               ` H. Peter Anvin
  0 siblings, 0 replies; 70+ messages in thread
From: H. Peter Anvin @ 2014-01-21 14:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Adrian Hunter, Arnaldo Carvalho de Melo,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Cong Ding,
	H. Peter Anvin, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On 01/21/2014 06:14 AM, Ingo Molnar wrote:
> 
> Ah, I didn't mean to suggest that it's an old upstream feature: what I 
> mean is that the KASLR patch is pretty old, and it has been deployed 
> by the Chromium guys for quite some time, and by others?
> 
> It was just never combined with perf live annotation which is a recent 
> perf feature.
> 

Ah, right.  Parallel convergence.

> Anyway ... I suspect it's the fixing of the bug that matters most, not 
> its genealogy ;)

Indeed.

	-hpa



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-21  9:00       ` Peter Zijlstra
@ 2014-01-21 14:20         ` H. Peter Anvin
  2014-01-21 14:39           ` Ingo Molnar
  2014-01-21 18:37           ` Kees Cook
  0 siblings, 2 replies; 70+ messages in thread
From: H. Peter Anvin @ 2014-01-21 14:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Arnaldo Carvalho de Melo, Cong Ding,
	H. Peter Anvin, Ingo Molnar, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On 01/21/2014 01:00 AM, Peter Zijlstra wrote:
>>
>> So this is presumably something that needs to be fixed in perf?
> 
> Where do we learn about the offset from userspace?
> 

Now this is tricky... if this offset is too easy to get it completely
defeats kASLR.  On the other hand, I presume that if we are exporting
/proc/kcore we're not secure anyway.  Kees, I assume that in "secure"
mode perf annotations simply wouldn't work anyway?

	-hpa


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-21 14:20         ` H. Peter Anvin
@ 2014-01-21 14:39           ` Ingo Molnar
  2014-01-21 14:51             ` H. Peter Anvin
  2014-01-21 18:37           ` Kees Cook
  1 sibling, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2014-01-21 14:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Linus Torvalds, Arnaldo Carvalho de Melo,
	Cong Ding, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 01/21/2014 01:00 AM, Peter Zijlstra wrote:
> >>
> >> So this is presumably something that needs to be fixed in perf?
> > 
> > Where do we learn about the offset from userspace?
> > 
> 
> Now this is tricky... if this offset is too easy to get it 
> completely defeats kASLR.  On the other hand, I presume that if we 
> are exporting /proc/kcore we're not secure anyway.  Kees, I assume 
> that in "secure" mode perf annotations simply wouldn't work anyway?

So /proc/kcore is:

  aldebaran:~> ls -l /proc/kcore 
  -r-------- 1 root root 140737486266368 Jan 21 15:35 /proc/kcore

i.e. root only.

The thing is, one of my first remarks on this whole KASLR series was 
that tooling needs to work. I suggested that the kernel should only 
expose non-randomized addresses and that all facilities need to 
continue to 'just work' with those. That argument was ignored AFAICS 
and the problem still isn't solved.

I'd argue that solving it in the kernel instead of making all tooling 
variants aware of KASLR one by one is a far more intelligent and 
efficient solution ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-21 14:39           ` Ingo Molnar
@ 2014-01-21 14:51             ` H. Peter Anvin
  2014-01-21 14:56               ` Ingo Molnar
  0 siblings, 1 reply; 70+ messages in thread
From: H. Peter Anvin @ 2014-01-21 14:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, Arnaldo Carvalho de Melo,
	Cong Ding, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On 01/21/2014 06:39 AM, Ingo Molnar wrote:
>>
>> Now this is tricky... if this offset is too easy to get it 
>> completely defeats kASLR.  On the other hand, I presume that if we 
>> are exporting /proc/kcore we're not secure anyway.  Kees, I assume 
>> that in "secure" mode perf annotations simply wouldn't work anyway?
> 
> So /proc/kcore is:
> 
>   aldebaran:~> ls -l /proc/kcore 
>   -r-------- 1 root root 140737486266368 Jan 21 15:35 /proc/kcore
> 
> i.e. root only.
> 

Yes, I mean if we want to be sealed against user space intrusion I'm
assuming /proc/kcore is unavailable, but that we don't

> The thing is, one of my first remarks on this whole KASLR series was 
> that tooling needs to work. I suggested that the kernel should only 
> expose non-randomized addresses and that all facilities need to 
> continue to 'just work' with those. That argument was ignored AFAICS 
> and the problem still isn't solved.
> 
> I'd argue that solving it in the kernel instead of making all tooling 
> variants aware of KASLR one by one is a far more intelligent and 
> efficient solution ...

Not ignored, but found not to really work all that well (we had that
discussion in the context of relocated kernels, too.)  The problem you
end up with is that as soon as you run into situations where you have to
deal with pointers during debugging, be it using kgdb, stack dumps or
whatever, all the work that you have done in the kernel to try to hide
relocation from the debug infrastructure all of a sudden becomes a huge
liability, and ends up backfiring in a horrific way.

	-hpa


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-21 14:51             ` H. Peter Anvin
@ 2014-01-21 14:56               ` Ingo Molnar
  0 siblings, 0 replies; 70+ messages in thread
From: Ingo Molnar @ 2014-01-21 14:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Linus Torvalds, Arnaldo Carvalho de Melo,
	Cong Ding, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun


* H. Peter Anvin <hpa@zytor.com> wrote:

> > The thing is, one of my first remarks on this whole KASLR series 
> > was that tooling needs to work. I suggested that the kernel should 
> > only expose non-randomized addresses and that all facilities need 
> > to continue to 'just work' with those. That argument was ignored 
> > AFAICS and the problem still isn't solved.
> > 
> > I'd argue that solving it in the kernel instead of making all 
> > tooling variants aware of KASLR one by one is a far more 
> > intelligent and efficient solution ...
> 
> Not ignored, but found not to really work all that well (we had that 
> discussion in the context of relocated kernels, too.)  The problem 
> you end up with is that as soon as you run into situations where you 
> have to deal with pointers during debugging, be it using kgdb, stack 
> dumps or whatever, all the work that you have done in the kernel to 
> try to hide relocation from the debug infrastructure all of a sudden 
> becomes a huge liability, and ends up backfiring in a horrific way.

The thing is, that 'huge liability' is now pushed into tooling, which 
isn't in any better position to judge a piece of data in a backtrace 
than the kernel - in fact it's in an arguably worse position, as it 
does not generate that data.

kgdb is an entirely different animal, I'm talking about the 99% 
usecase: code profiling and tooling interpreting code addresses that 
come from the kernel.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-21 14:20         ` H. Peter Anvin
  2014-01-21 14:39           ` Ingo Molnar
@ 2014-01-21 18:37           ` Kees Cook
  1 sibling, 0 replies; 70+ messages in thread
From: Kees Cook @ 2014-01-21 18:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Linus Torvalds, Arnaldo Carvalho de Melo,
	Cong Ding, H. Peter Anvin, Ingo Molnar, Ingo Molnar,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On Tue, Jan 21, 2014 at 6:20 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 01/21/2014 01:00 AM, Peter Zijlstra wrote:
>>>
>>> So this is presumably something that needs to be fixed in perf?
>>
>> Where do we learn about the offset from userspace?
>
> Now this is tricky... if this offset is too easy to get it completely
> defeats kASLR.  On the other hand, I presume that if we are exporting
> /proc/kcore we're not secure anyway.  Kees, I assume that in "secure"
> mode perf annotations simply wouldn't work anyway?

The goal scope of the kernel base address randomization is to keep it
secret from non-root users, confined processes, and/or remote systems.
For local secrecy, if you're running with kaslr and you haven't set
kptr_restrict, dmesg_restrict, and perf_event_paranoid, that's a
problem since you're likely leaking things trivially through
/proc/kallsyms, dmesg, and/or perf.

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-20 22:54 ` Linus Torvalds
                     ` (2 preceding siblings ...)
  2014-01-21  5:18   ` Kees Cook
@ 2014-01-23  9:39   ` Pavel Machek
  3 siblings, 0 replies; 70+ messages in thread
From: Pavel Machek @ 2014-01-23  9:39 UTC (permalink / raw)
  To: Linus Torvalds, Rafael J. Wysocki
  Cc: H. Peter Anvin, Cong Ding, H. Peter Anvin, Ingo Molnar,
	Ingo Molnar, Kees Cook, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

On Mon 2014-01-20 14:54:12, Linus Torvalds wrote:
> So I pulled this, but one question:
> 
> On Mon, Jan 20, 2014 at 8:47 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> > +config RANDOMIZE_BASE
> > +       bool "Randomize the address of the kernel image"
> > +       depends on RELOCATABLE
> > +       depends on !HIBERNATION
> 
> How fundamental is that "!HIBERNATION" issue? Right now that
> anti-dependency on hibernation support will mean that no distro kernel
> will actually use the kernel address space randomization. Which
> long-term is a problem.

Hibernation does some interesting tricks, and it did depend on image
layout staying same.

Rafael did some great work on x86-64 to enable resuming different
kernel version, so this should no longer be a problem (on
x86-64). Just test it, failure will not be subtle.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-20 16:47 [GIT PULL] x86/kaslr for v3.14 H. Peter Anvin
  2014-01-20 22:54 ` Linus Torvalds
@ 2014-01-26 10:16 ` Richard Weinberger
  2014-01-27  5:33   ` H. Peter Anvin
  1 sibling, 1 reply; 70+ messages in thread
From: Richard Weinberger @ 2014-01-26 10:16 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Cong Ding, H. Peter Anvin, Ingo Molnar,
	Ingo Molnar, Kees Cook, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

On Mon, Jan 20, 2014 at 5:47 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> Hi Linus,
>
> This enables kernel address space randomization for x86.
>
> The following changes since commit d0e639c9e06d44e713170031fe05fb60ebe680af:
>
>   Linux 3.12-rc4 (2013-10-06 14:00:20 -0700)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-kaslr-for-linus
>
> The head of this tree is da2b6fb990cf782b18952f534ec7323453bc4fc9.
>
>   x86, kaslr: Clarify RANDOMIZE_BASE_MAX_OFFSET (2014-01-14 10:45:56 -0800)
>
> ----------------------------------------------------------------
> H. Peter Anvin (2):
>       x86, boot: Rename get_flags() and check_flags() to *_cpuflags()
>       x86, kaslr: Add a circular multiply for better bit diffusion
>
> Kees Cook (10):
>       x86, boot: Move CPU flags out of cpucheck
>       x86, kaslr: Return location from decompress_kernel
>       x86, kaslr: Provide randomness functions
>       x86, kaslr: Select random position from e820 maps
>       x86, kaslr: Report kernel offset on panic
>       x86, kaslr: Raise the maximum virtual address to -1 GiB on x86_64
>       x86/relocs: Add percpu fixup for GNU ld 2.23
>       x86, kaslr: Mix entropy sources together as needed
>       x86, kaslr: Use char array to gain sizeof sanity
>       x86, kaslr: Clarify RANDOMIZE_BASE_MAX_OFFSET
>
> Michael Davidson (1):
>       x86, relocs: Add more per-cpu gold special cases
>
> Wei Yongjun (1):
>       x86, kaslr: Remove unused including <linux/version.h>

Currently we print the kernel offset only upon a panic() using the
panic notifier list.
This way it does not show up if the kernel hits a BUG() in process
context or something less critical.
Wouldn't make more sense to report the offset in every dump_stack() or
show_regs() call?
Or at least add a hint that the kernel was built with kaslr enabled.

I'm a heavy user of addr2line -e vmlinux <booboo-happened-here>, so I
want to make sure that I don't waste
too much time with finding out that the addresses are useless. :-)

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-26 10:16 ` Richard Weinberger
@ 2014-01-27  5:33   ` H. Peter Anvin
  2014-01-27  6:49     ` Richard Weinberger
  0 siblings, 1 reply; 70+ messages in thread
From: H. Peter Anvin @ 2014-01-27  5:33 UTC (permalink / raw)
  To: Richard Weinberger, H. Peter Anvin
  Cc: Linus Torvalds, Cong Ding, Ingo Molnar, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On 01/26/2014 02:16 AM, Richard Weinberger wrote:
> 
> Currently we print the kernel offset only upon a panic() using the
> panic notifier list.
> This way it does not show up if the kernel hits a BUG() in process
> context or something less critical.
> Wouldn't make more sense to report the offset in every dump_stack() or
> show_regs() call?

No, because that information is available to user space unless we panic.

	-hpa


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-27  5:33   ` H. Peter Anvin
@ 2014-01-27  6:49     ` Richard Weinberger
  2014-01-27  6:51       ` H. Peter Anvin
                         ` (2 more replies)
  0 siblings, 3 replies; 70+ messages in thread
From: Richard Weinberger @ 2014-01-27  6:49 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: H. Peter Anvin, Linus Torvalds, Cong Ding, Ingo Molnar,
	Ingo Molnar, Kees Cook, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

On Mon, Jan 27, 2014 at 6:33 AM, H. Peter Anvin <hpa@linux.intel.com> wrote:
> On 01/26/2014 02:16 AM, Richard Weinberger wrote:
>>
>> Currently we print the kernel offset only upon a panic() using the
>> panic notifier list.
>> This way it does not show up if the kernel hits a BUG() in process
>> context or something less critical.
>> Wouldn't make more sense to report the offset in every dump_stack() or
>> show_regs() call?
>
> No, because that information is available to user space unless we panic.

Didn't you mean non-root?
I thought one has to set dmesg_restrict anyways if kASLR is used.

And isn't the offset available to perf too?
Of course only for root, but still user space.

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-27  6:49     ` Richard Weinberger
@ 2014-01-27  6:51       ` H. Peter Anvin
  2014-01-27  7:38         ` Ingo Molnar
  2014-01-30 22:07         ` Vivek Goyal
  2014-01-27  6:52       ` H. Peter Anvin
  2014-01-27 17:05       ` Kees Cook
  2 siblings, 2 replies; 70+ messages in thread
From: H. Peter Anvin @ 2014-01-27  6:51 UTC (permalink / raw)
  To: Richard Weinberger, H. Peter Anvin
  Cc: Linus Torvalds, Cong Ding, Ingo Molnar, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On 01/26/2014 10:49 PM, Richard Weinberger wrote:
>>
>> No, because that information is available to user space unless we panic.
> 
> Didn't you mean non-root?
> I thought one has to set dmesg_restrict anyways if kASLR is used.
> 
> And isn't the offset available to perf too?
> Of course only for root, but still user space.
> 

For certain system security levels one want to protect even from a rogue
root.  In those cases, leaking that information via dmesg and perf isn't
going to work, either.

With lower security settings, by all means...

	-hpa


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-27  6:49     ` Richard Weinberger
  2014-01-27  6:51       ` H. Peter Anvin
@ 2014-01-27  6:52       ` H. Peter Anvin
  2014-01-27  7:34         ` Richard Weinberger
  2014-01-27 17:05       ` Kees Cook
  2 siblings, 1 reply; 70+ messages in thread
From: H. Peter Anvin @ 2014-01-27  6:52 UTC (permalink / raw)
  To: Richard Weinberger, H. Peter Anvin
  Cc: Linus Torvalds, Cong Ding, Ingo Molnar, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

Of course, stack traces themselves contain that information, so one
could argue that oops=panic is required in order for kASLR to provide
any kind of security against root.  (oops=panic is probably a good idea
in secure environments anyway...)

	-hpa


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-27  6:52       ` H. Peter Anvin
@ 2014-01-27  7:34         ` Richard Weinberger
  0 siblings, 0 replies; 70+ messages in thread
From: Richard Weinberger @ 2014-01-27  7:34 UTC (permalink / raw)
  To: H. Peter Anvin, H. Peter Anvin
  Cc: Linus Torvalds, Cong Ding, Ingo Molnar, Ingo Molnar, Kees Cook,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

Am 27.01.2014 07:52, schrieb H. Peter Anvin:
> Of course, stack traces themselves contain that information, so one
> could argue that oops=panic is required in order for kASLR to provide
> any kind of security against root.  (oops=panic is probably a good idea
> in secure environments anyway...)

Now I understand your point.

/proc/<pid>/stack and a world-readable /boot also need to be disabled.
Deploying a secure kASLR is not easy, especially for end-user distros.

Maybe a CONFIG_RANDOMIZE_BASE_I_MEAN_IT which disables various sources of
information leakage would help too. ;-)

Thanks,
//richard

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-27  6:51       ` H. Peter Anvin
@ 2014-01-27  7:38         ` Ingo Molnar
  2014-01-27  7:43           ` Ingo Molnar
  2014-01-27  7:59           ` Richard Weinberger
  2014-01-30 22:07         ` Vivek Goyal
  1 sibling, 2 replies; 70+ messages in thread
From: Ingo Molnar @ 2014-01-27  7:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Richard Weinberger, H. Peter Anvin, Linus Torvalds, Cong Ding,
	Ingo Molnar, Kees Cook, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 01/26/2014 10:49 PM, Richard Weinberger wrote:
> >>
> >> No, because that information is available to user space unless we panic.
> > 
> > Didn't you mean non-root?
> > I thought one has to set dmesg_restrict anyways if kASLR is used.
> > 
> > And isn't the offset available to perf too?
> > Of course only for root, but still user space.
> > 
> 
> For certain system security levels one want to protect even from a 
> rogue root.  In those cases, leaking that information via dmesg and 
> perf isn't going to work, either.
> 
> With lower security settings, by all means...

The 'no' was categorical and unconditional though, so is the right 
answer perhaps something more along the lines of:

  'Yes, the random offset can be reported in an oops, as long as
   high security setups can turn off the reporting of the offset,
   in their idealistic attempt to protect the system against root.'

?

I also still think that in addition to reporting the offset, 
automatically 'un-randomizing' the oopses and warnings would be useful 
as well: with a clear to recognize indicator used for every value 
unrandomized, such as capitalizing their first hexa digit.

Let me show a mockup of how I think it could work:

raw 64-bit original:

    [  246.085174]  <IRQ>  [<ffffffff8264fbf6>] dump_stack+0x46/0x58
    [  246.098352]  [<ffffffff82054fb6>] warn_slowpath_fmt+0x46/0x50
    [  246.104786]  [<ffffffff825710d6>] dev_watchdog+0x246/0x250
    [  246.110923]  [<ffffffff82570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
    [  246.119097]  [<ffffffff8206092a>] call_timer_fn+0x3a/0x110
    [  246.125224]  [<ffffffff8206280f>] ? update_process_times+0x6f/0x80

64-bit un-randomized:

    [  246.085174]  <IRQ>  [<FFFFFFFF8164fbf6>] dump_stack+0x46/0x58
    [  246.091633]  [<FFFFFFFF81054ecc>] warn_slowpath_common+0x8c/0xc0
    [  246.098352]  [<FFFFFFFF81054fb6>] warn_slowpath_fmt+0x46/0x50
    [  246.104786]  [<FFFFFFFF815710d6>] dev_watchdog+0x246/0x250
    [  246.110923]  [<FFFFFFFF81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
    [  246.119097]  [<FFFFFFFF8106092a>] call_timer_fn+0x3a/0x110
    [  246.125224]  [<FFFFFFFF8106280f>] ? update_process_times+0x6f/0x80

Note how the hex values of unrandomized kernel text start with capital 
letters, and how their values match up System.map and vmlinux symbol 
values.

raw 32-bit randomized:

[   39.054098]  [<c20ded55>] ? __jump_label_update+0x45/0x60
[   39.064852]  [<c2057aa2>] ? queue_work_on+0x32/0x70
[   39.074570]  [<c20085b1>] ? mark_tsc_unstable+0x21/0x60
[   39.084980]  [<c2f03af6>] ? tsc_init+0x326/0x344
[   39.094175]  [<c2eff9c5>] ? start_kernel+0x2c7/0x356

32-bit un-randomized:

[   39.054098]  [<C10ded55>] ? __jump_label_update+0x45/0x60
[   39.064852]  [<C1057aa2>] ? queue_work_on+0x32/0x70
[   39.074570]  [<C10085b1>] ? mark_tsc_unstable+0x21/0x60
[   39.084980]  [<C1f03af6>] ? tsc_init+0x326/0x344
[   39.094175]  [<C1eff9c5>] ? start_kernel+0x2c7/0x356

This looks eminently useful to me, I could plug those hexa values into 
gdb straight away to look up a symbol instead of having to subtract 
the random offset first.

This would do 99% of the unrandomizing job for the user/developer (and 
not the least, for tooling), without obfuscating oopses as it would be 
clear on which values the unrandomizing was performed, without losing 
information.

But only reporting the random offset with every raw oops and stack 
dump would lead to equivalent information as well. We can tweak this 
as we go, it's not like these details are ABIs - I just still think 
that the KASLR feature isn't nearly as utility oriented as it should 
be.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-27  7:38         ` Ingo Molnar
@ 2014-01-27  7:43           ` Ingo Molnar
  2014-01-27  7:59           ` Richard Weinberger
  1 sibling, 0 replies; 70+ messages in thread
From: Ingo Molnar @ 2014-01-27  7:43 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Richard Weinberger, H. Peter Anvin, Linus Torvalds, Cong Ding,
	Ingo Molnar, Kees Cook, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * H. Peter Anvin <hpa@zytor.com> wrote:
> 
> > On 01/26/2014 10:49 PM, Richard Weinberger wrote:
> > >>
> > >> No, because that information is available to user space unless we panic.
> > > 
> > > Didn't you mean non-root?
> > > I thought one has to set dmesg_restrict anyways if kASLR is used.
> > > 
> > > And isn't the offset available to perf too?
> > > Of course only for root, but still user space.
> > > 
> > 
> > For certain system security levels one want to protect even from a 
> > rogue root.  In those cases, leaking that information via dmesg and 
> > perf isn't going to work, either.
> > 
> > With lower security settings, by all means...
> 
> The 'no' was categorical and unconditional though, so is the right 
> answer perhaps something more along the lines of:
> 
>   'Yes, the random offset can be reported in an oops, as long as
>    high security setups can turn off the reporting of the offset,
>    in their idealistic attempt to protect the system against root.'
> 
> ?

'reporting of the offset' should probably be 'reporting kernel data' - 
there's many possible ways an oops (and its associated raw stack dump) 
can leak the offset, I'm not sure this can ever be made 'safe' against 
a rougue root.

Not giving kernel originated debug information at all would. (At the 
cost of reducing the utility of having that root user around, of 
course.)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-27  7:38         ` Ingo Molnar
  2014-01-27  7:43           ` Ingo Molnar
@ 2014-01-27  7:59           ` Richard Weinberger
  1 sibling, 0 replies; 70+ messages in thread
From: Richard Weinberger @ 2014-01-27  7:59 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin
  Cc: H. Peter Anvin, Linus Torvalds, Cong Ding, Ingo Molnar,
	Kees Cook, Linux Kernel Mailing List, Mathias Krause,
	Michael Davidson, Thomas Gleixner, Wei Yongjun

Am 27.01.2014 08:38, schrieb Ingo Molnar:
> 
> * H. Peter Anvin <hpa@zytor.com> wrote:
> 
>> On 01/26/2014 10:49 PM, Richard Weinberger wrote:
>>>>
>>>> No, because that information is available to user space unless we panic.
>>>
>>> Didn't you mean non-root?
>>> I thought one has to set dmesg_restrict anyways if kASLR is used.
>>>
>>> And isn't the offset available to perf too?
>>> Of course only for root, but still user space.
>>>
>>
>> For certain system security levels one want to protect even from a 
>> rogue root.  In those cases, leaking that information via dmesg and 
>> perf isn't going to work, either.
>>
>> With lower security settings, by all means...
> 
> The 'no' was categorical and unconditional though, so is the right 
> answer perhaps something more along the lines of:
> 
>   'Yes, the random offset can be reported in an oops, as long as
>    high security setups can turn off the reporting of the offset,
>    in their idealistic attempt to protect the system against root.'
> 
> ?
> 
> I also still think that in addition to reporting the offset, 
> automatically 'un-randomizing' the oopses and warnings would be useful 
> as well: with a clear to recognize indicator used for every value 
> unrandomized, such as capitalizing their first hexa digit.
> 
> Let me show a mockup of how I think it could work:
> 
> raw 64-bit original:
> 
>     [  246.085174]  <IRQ>  [<ffffffff8264fbf6>] dump_stack+0x46/0x58
>     [  246.098352]  [<ffffffff82054fb6>] warn_slowpath_fmt+0x46/0x50
>     [  246.104786]  [<ffffffff825710d6>] dev_watchdog+0x246/0x250
>     [  246.110923]  [<ffffffff82570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
>     [  246.119097]  [<ffffffff8206092a>] call_timer_fn+0x3a/0x110
>     [  246.125224]  [<ffffffff8206280f>] ? update_process_times+0x6f/0x80
> 
> 64-bit un-randomized:
> 
>     [  246.085174]  <IRQ>  [<FFFFFFFF8164fbf6>] dump_stack+0x46/0x58
>     [  246.091633]  [<FFFFFFFF81054ecc>] warn_slowpath_common+0x8c/0xc0
>     [  246.098352]  [<FFFFFFFF81054fb6>] warn_slowpath_fmt+0x46/0x50
>     [  246.104786]  [<FFFFFFFF815710d6>] dev_watchdog+0x246/0x250
>     [  246.110923]  [<FFFFFFFF81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
>     [  246.119097]  [<FFFFFFFF8106092a>] call_timer_fn+0x3a/0x110
>     [  246.125224]  [<FFFFFFFF8106280f>] ? update_process_times+0x6f/0x80
> 
> Note how the hex values of unrandomized kernel text start with capital 
> letters, and how their values match up System.map and vmlinux symbol 
> values.
> 
> raw 32-bit randomized:
> 
> [   39.054098]  [<c20ded55>] ? __jump_label_update+0x45/0x60
> [   39.064852]  [<c2057aa2>] ? queue_work_on+0x32/0x70
> [   39.074570]  [<c20085b1>] ? mark_tsc_unstable+0x21/0x60
> [   39.084980]  [<c2f03af6>] ? tsc_init+0x326/0x344
> [   39.094175]  [<c2eff9c5>] ? start_kernel+0x2c7/0x356
> 
> 32-bit un-randomized:
> 
> [   39.054098]  [<C10ded55>] ? __jump_label_update+0x45/0x60
> [   39.064852]  [<C1057aa2>] ? queue_work_on+0x32/0x70
> [   39.074570]  [<C10085b1>] ? mark_tsc_unstable+0x21/0x60
> [   39.084980]  [<C1f03af6>] ? tsc_init+0x326/0x344
> [   39.094175]  [<C1eff9c5>] ? start_kernel+0x2c7/0x356
> 
> This looks eminently useful to me, I could plug those hexa values into 
> gdb straight away to look up a symbol instead of having to subtract 
> the random offset first.
> 
> This would do 99% of the unrandomizing job for the user/developer (and 
> not the least, for tooling), without obfuscating oopses as it would be 
> clear on which values the unrandomizing was performed, without losing 
> information.

I like this idea.

Hopefully nothing breaks if the mix lower and upper case hex numbers. =)
If so we could still inject a line like
"[<fffffffffffffffe>] __unrandomize_addr+0x0/0x0" into the trace
to mark a an un-randomized one.
Or a <UN-RANDOM> like <IRQ> on x86_64...

Thanks,
//richard

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-27  6:49     ` Richard Weinberger
  2014-01-27  6:51       ` H. Peter Anvin
  2014-01-27  6:52       ` H. Peter Anvin
@ 2014-01-27 17:05       ` Kees Cook
  2014-01-27 17:20         ` Richard Weinberger
  2 siblings, 1 reply; 70+ messages in thread
From: Kees Cook @ 2014-01-27 17:05 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: H. Peter Anvin, H. Peter Anvin, Linus Torvalds, Cong Ding,
	Ingo Molnar, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

On Sun, Jan 26, 2014 at 10:49 PM, Richard Weinberger
<richard.weinberger@gmail.com> wrote:
> On Mon, Jan 27, 2014 at 6:33 AM, H. Peter Anvin <hpa@linux.intel.com> wrote:
>> On 01/26/2014 02:16 AM, Richard Weinberger wrote:
>>>
>>> Currently we print the kernel offset only upon a panic() using the
>>> panic notifier list.
>>> This way it does not show up if the kernel hits a BUG() in process
>>> context or something less critical.
>>> Wouldn't make more sense to report the offset in every dump_stack() or
>>> show_regs() call?
>>
>> No, because that information is available to user space unless we panic.
>
> Didn't you mean non-root?
> I thought one has to set dmesg_restrict anyways if kASLR is used.
>
> And isn't the offset available to perf too?
> Of course only for root, but still user space.

Setting dmesg_restrict is done mostly in an effort to try to lock down
access to dmesg since it'll likely contain enough clues to help an
attacker. System owners need to avoid dmesg getting sprayed into
/var/log world-readable, or available via privileged debugging
daemons, etc. Since keeping dmesg secret from non-root users is going
to be error-prone, I had a goal of keeping the offset out of dmesg
while the system is still running -- hence doing it only at panic
time.

Finding the offset as the (unconfined) root user is extremely easy, so
I personally see no reason to hide it from root (and it would be very
irritating for things like perf, too). I view kASLR as a tool for
statistical defense against confined processes or remote attacks.

I would argue that decoding a non-panic oops on a running system is
entirely possible as-is, since the offset can be found from
/proc/kallsyms as root. It was the dead system that needed the offset
exported: via text in the panic, or via an ELF note in a core.

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-27 17:05       ` Kees Cook
@ 2014-01-27 17:20         ` Richard Weinberger
  2014-01-27 17:24           ` Kees Cook
  0 siblings, 1 reply; 70+ messages in thread
From: Richard Weinberger @ 2014-01-27 17:20 UTC (permalink / raw)
  To: Kees Cook
  Cc: H. Peter Anvin, H. Peter Anvin, Linus Torvalds, Cong Ding,
	Ingo Molnar, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

Am 27.01.2014 18:05, schrieb Kees Cook:
> I would argue that decoding a non-panic oops on a running system is
> entirely possible as-is, since the offset can be found from
> /proc/kallsyms as root. It was the dead system that needed the offset
> exported: via text in the panic, or via an ELF note in a core.

The problem is that you have to pickup information from two sources.
As a kernel developer users/customers often show you a backtrace (oops or panic)
and want you do find the problem.
They barley manage it copy&paste the topmost full trace from dmesg or /var/log/messages.
If I have to ask them a bit later to tell me the offset from /proc/kallsyms or something else
I'm lost. Mostly because they have already rebooted the box...

Thanks,
//richard

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-27 17:20         ` Richard Weinberger
@ 2014-01-27 17:24           ` Kees Cook
  2014-01-28  6:28             ` Ingo Molnar
  0 siblings, 1 reply; 70+ messages in thread
From: Kees Cook @ 2014-01-27 17:24 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: H. Peter Anvin, H. Peter Anvin, Linus Torvalds, Cong Ding,
	Ingo Molnar, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

On Mon, Jan 27, 2014 at 9:20 AM, Richard Weinberger <richard@nod.at> wrote:
> Am 27.01.2014 18:05, schrieb Kees Cook:
>> I would argue that decoding a non-panic oops on a running system is
>> entirely possible as-is, since the offset can be found from
>> /proc/kallsyms as root. It was the dead system that needed the offset
>> exported: via text in the panic, or via an ELF note in a core.
>
> The problem is that you have to pickup information from two sources.
> As a kernel developer users/customers often show you a backtrace (oops or panic)
> and want you do find the problem.
> They barley manage it copy&paste the topmost full trace from dmesg or /var/log/messages.
> If I have to ask them a bit later to tell me the offset from /proc/kallsyms or something else
> I'm lost. Mostly because they have already rebooted the box...

As long as I can turn it off, I'd be happy. :)
/proc/sys/kernel/kaslr_in_oops or something?

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-27 17:24           ` Kees Cook
@ 2014-01-28  6:28             ` Ingo Molnar
  2014-01-28  8:25               ` Richard Weinberger
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2014-01-28  6:28 UTC (permalink / raw)
  To: Kees Cook
  Cc: Richard Weinberger, H. Peter Anvin, H. Peter Anvin,
	Linus Torvalds, Cong Ding, Ingo Molnar,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun


* Kees Cook <keescook@chromium.org> wrote:

> On Mon, Jan 27, 2014 at 9:20 AM, Richard Weinberger <richard@nod.at> wrote:
> > Am 27.01.2014 18:05, schrieb Kees Cook:
> >> I would argue that decoding a non-panic oops on a running system is
> >> entirely possible as-is, since the offset can be found from
> >> /proc/kallsyms as root. It was the dead system that needed the offset
> >> exported: via text in the panic, or via an ELF note in a core.
> >
> > The problem is that you have to pickup information from two sources.
> > As a kernel developer users/customers often show you a backtrace (oops or panic)
> > and want you do find the problem.
> > They barley manage it copy&paste the topmost full trace from dmesg or /var/log/messages.
> > If I have to ask them a bit later to tell me the offset from /proc/kallsyms or something else
> > I'm lost. Mostly because they have already rebooted the box...
> 
> As long as I can turn it off, I'd be happy. :)
> /proc/sys/kernel/kaslr_in_oops or something?

Yeah, as long as it decodes by default.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28  6:28             ` Ingo Molnar
@ 2014-01-28  8:25               ` Richard Weinberger
  2014-01-28 15:55                 ` H. Peter Anvin
  0 siblings, 1 reply; 70+ messages in thread
From: Richard Weinberger @ 2014-01-28  8:25 UTC (permalink / raw)
  To: Ingo Molnar, Kees Cook
  Cc: H. Peter Anvin, H. Peter Anvin, Linus Torvalds, Cong Ding,
	Ingo Molnar, Linux Kernel Mailing List, Mathias Krause,
	Michael Davidson, Thomas Gleixner, Wei Yongjun

Am 28.01.2014 07:28, schrieb Ingo Molnar:
> 
> * Kees Cook <keescook@chromium.org> wrote:
> 
>> On Mon, Jan 27, 2014 at 9:20 AM, Richard Weinberger <richard@nod.at> wrote:
>>> Am 27.01.2014 18:05, schrieb Kees Cook:
>>>> I would argue that decoding a non-panic oops on a running system is
>>>> entirely possible as-is, since the offset can be found from
>>>> /proc/kallsyms as root. It was the dead system that needed the offset
>>>> exported: via text in the panic, or via an ELF note in a core.
>>>
>>> The problem is that you have to pickup information from two sources.
>>> As a kernel developer users/customers often show you a backtrace (oops or panic)
>>> and want you do find the problem.
>>> They barley manage it copy&paste the topmost full trace from dmesg or /var/log/messages.
>>> If I have to ask them a bit later to tell me the offset from /proc/kallsyms or something else
>>> I'm lost. Mostly because they have already rebooted the box...
>>
>> As long as I can turn it off, I'd be happy. :)
>> /proc/sys/kernel/kaslr_in_oops or something?

Would be nice to have! :)

> Yeah, as long as it decodes by default.

Yep.
I like Ingo's idea (capital letters as indicators).
Are we all fine with that?

Thanks,
//richard

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28  8:25               ` Richard Weinberger
@ 2014-01-28 15:55                 ` H. Peter Anvin
  2014-01-28 16:25                   ` Richard Weinberger
  0 siblings, 1 reply; 70+ messages in thread
From: H. Peter Anvin @ 2014-01-28 15:55 UTC (permalink / raw)
  To: Richard Weinberger, Ingo Molnar, Kees Cook
  Cc: H. Peter Anvin, Linus Torvalds, Cong Ding, Ingo Molnar,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On 01/28/2014 12:25 AM, Richard Weinberger wrote:
> 
> Yep.
> I like Ingo's idea (capital letters as indicators).
> Are we all fine with that?
> 

I guess it is extremely unlikely that we'd have a kernel address without
any letters... the "general solutions only please" part of my brain
wants to scream to handle that corner case, though.  The most logical
way would be to change the brackets.

It doesn't matter much, though.

	-hpa



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 15:55                 ` H. Peter Anvin
@ 2014-01-28 16:25                   ` Richard Weinberger
  2014-01-28 16:30                     ` H. Peter Anvin
  0 siblings, 1 reply; 70+ messages in thread
From: Richard Weinberger @ 2014-01-28 16:25 UTC (permalink / raw)
  To: H. Peter Anvin, Ingo Molnar, Kees Cook
  Cc: H. Peter Anvin, Linus Torvalds, Cong Ding, Ingo Molnar,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

Am 28.01.2014 16:55, schrieb H. Peter Anvin:
> On 01/28/2014 12:25 AM, Richard Weinberger wrote:
>>
>> Yep.
>> I like Ingo's idea (capital letters as indicators).
>> Are we all fine with that?
>>
> 
> I guess it is extremely unlikely that we'd have a kernel address without
> any letters... the "general solutions only please" part of my brain
> wants to scream to handle that corner case, though.  The most logical
> way would be to change the brackets.
> 
> It doesn't matter much, though.

I fear both solutions will break various scripts.
For example scripts/markup_oops.pl looks for \[\<([a-z0-9]+)\>\].

Thanks,
//richard

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 16:25                   ` Richard Weinberger
@ 2014-01-28 16:30                     ` H. Peter Anvin
  2014-01-28 16:51                       ` Linus Torvalds
  0 siblings, 1 reply; 70+ messages in thread
From: H. Peter Anvin @ 2014-01-28 16:30 UTC (permalink / raw)
  To: Richard Weinberger, H. Peter Anvin, Ingo Molnar, Kees Cook
  Cc: Linus Torvalds, Cong Ding, Ingo Molnar,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On 01/28/2014 08:25 AM, Richard Weinberger wrote:
> 
> I fear both solutions will break various scripts.
> For example scripts/markup_oops.pl looks for \[\<([a-z0-9]+)\>\].
> 

I don't think there is any way to not break anything... we're
introducing a new kind of object ("normalized kernel pointer") here.

	-hpa




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 16:30                     ` H. Peter Anvin
@ 2014-01-28 16:51                       ` Linus Torvalds
  2014-01-28 17:05                         ` Ingo Molnar
  0 siblings, 1 reply; 70+ messages in thread
From: Linus Torvalds @ 2014-01-28 16:51 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Richard Weinberger, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Cong Ding, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

On Tue, Jan 28, 2014 at 8:30 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> I don't think there is any way to not break anything... we're
> introducing a new kind of object ("normalized kernel pointer") here.

I suspect we could just drop the addresses entirely if we have a
symbolic version. It's not like the hex addresses are all that
interesting, and in commit messages I actually end up editing them out
just because they are worthless noise.

So we could just do "schedule+0x45/0x368" without the actual hex
address in brackets..

               Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 16:51                       ` Linus Torvalds
@ 2014-01-28 17:05                         ` Ingo Molnar
  2014-01-28 17:12                           ` Linus Torvalds
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2014-01-28 17:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Richard Weinberger, H. Peter Anvin, Kees Cook,
	Cong Ding, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Jan 28, 2014 at 8:30 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> >
> > I don't think there is any way to not break anything... we're
> > introducing a new kind of object ("normalized kernel pointer") here.
> 
> I suspect we could just drop the addresses entirely if we have a 
> symbolic version. It's not like the hex addresses are all that 
> interesting, and in commit messages I actually end up editing them 
> out just because they are worthless noise.

Well, I often use the hex numbers to look them up and disassemble them 
in a vmlinux via gdb and 'list *0x1234123412341234' - where the 
vmlinux has no debuginfo. (Debuginfo takes longer to build so I 
generally build without it.)

I'm quite sure other kernel developers do that as well.

> So we could just do "schedule+0x45/0x368" without the actual hex 
> address in brackets..

AFAICS this won't work in a symbol-less vmlinux. Is there some trick 
to do it with gdb?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 17:05                         ` Ingo Molnar
@ 2014-01-28 17:12                           ` Linus Torvalds
  2014-01-28 17:24                             ` Richard Weinberger
  2014-01-28 19:48                             ` Ingo Molnar
  0 siblings, 2 replies; 70+ messages in thread
From: Linus Torvalds @ 2014-01-28 17:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Richard Weinberger, H. Peter Anvin, Kees Cook,
	Cong Ding, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

On Tue, Jan 28, 2014 at 9:05 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> Well, I often use the hex numbers to look them up and disassemble them
> in a vmlinux via gdb and 'list *0x1234123412341234' - where the
> vmlinux has no debuginfo. (Debuginfo takes longer to build so I
> generally build without it.)

Why the heck wouldn't you do that? Just do

   list schedule+0x45

instead.

> AFAICS this won't work in a symbol-less vmlinux. Is there some trick
> to do it with gdb?

Why would you have a symbol-less vmlinux? The only reason to strip
vmlinux is because you were crazy enough to build with
CONFIG_DEBUG_INFO and the damn debug info is so large that it won't
fit on your root partition. But dammit, if you build with debug_info
and then strip the end result, you're just insane. You made your build
take ten times longer, use ten times more diskspace, and then you
throw it all away. Crazy.

So I don't think the symbol-less version is worth even worrying about.
You do want to build with KALLSYMS (or whatever the config option is
called), so that the symbolic name is worth something, but once you
have the symbolc name, you're good unless you did something terminally
stupid.

Btw, we should make it harder to enable CONFIG_DEBUG_INFO. It's a
f*cking pain. It's particularly nasty when you do "make allmodconfig"
and it enables debug-info and makes the build take forever and waste
diskspace - but nobody sane actually *boots* the end result, so that
debug info is all pointless.

                  Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 17:12                           ` Linus Torvalds
@ 2014-01-28 17:24                             ` Richard Weinberger
  2014-01-28 17:35                               ` Linus Torvalds
  2014-01-28 19:48                             ` Ingo Molnar
  1 sibling, 1 reply; 70+ messages in thread
From: Richard Weinberger @ 2014-01-28 17:24 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: H. Peter Anvin, H. Peter Anvin, Kees Cook, Cong Ding,
	Ingo Molnar, Linux Kernel Mailing List, Mathias Krause,
	Michael Davidson, Thomas Gleixner, Wei Yongjun

Am 28.01.2014 18:12, schrieb Linus Torvalds:
> On Tue, Jan 28, 2014 at 9:05 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> Well, I often use the hex numbers to look them up and disassemble them
>> in a vmlinux via gdb and 'list *0x1234123412341234' - where the
>> vmlinux has no debuginfo. (Debuginfo takes longer to build so I
>> generally build without it.)
> 
> Why the heck wouldn't you do that? Just do
> 
>    list schedule+0x45
> 
> instead.

Hmm, my gdb does not like that notion.

(gdb) list schedule+0x45
Function "schedule+0x45" not defined.

It thinks +0x45 is part of the function name, adding spaces also does not help.
Or am I doing something very stupid?

Thanks,
//richard

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 17:24                             ` Richard Weinberger
@ 2014-01-28 17:35                               ` Linus Torvalds
  2014-01-28 17:52                                 ` Richard Weinberger
  0 siblings, 1 reply; 70+ messages in thread
From: Linus Torvalds @ 2014-01-28 17:35 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Ingo Molnar, H. Peter Anvin, H. Peter Anvin, Kees Cook,
	Cong Ding, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

On Tue, Jan 28, 2014 at 9:24 AM, Richard Weinberger <richard@nod.at> wrote:
>
> Hmm, my gdb does not like that notion.
>
> (gdb) list schedule+0x45
> Function "schedule+0x45" not defined.

I don't have a debug build, so maybe it's something specific to gdb
actually seeing type information and then being confused..

Can you do "x/10i schedule+0x45" because that definitely works for me.
But I literally detest DBUG_INFO builds, because it's useless crap.
99% of everything I debug is from people reporting problems on their
kernels, so it's not like my local debug info would match that anyway.

Anyway, if it's a type information thing due to DEBUG_INFO that
confuses gdb and makes it think that the "+0x45" part doesn't make
sense, you may need to add a cast to get gdb to ignore the type
information. IOW, does

  list (void *)schedule + 0x45

work for you?

I happen to have gdb-7.6.50 here that I tested with, but I've used
"symbol+offset" forever afaik, so it's definitely not something new.

              Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 17:35                               ` Linus Torvalds
@ 2014-01-28 17:52                                 ` Richard Weinberger
  2014-01-28 17:56                                   ` Linus Torvalds
  0 siblings, 1 reply; 70+ messages in thread
From: Richard Weinberger @ 2014-01-28 17:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, H. Peter Anvin, H. Peter Anvin, Kees Cook,
	Cong Ding, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

Am 28.01.2014 18:35, schrieb Linus Torvalds:
> On Tue, Jan 28, 2014 at 9:24 AM, Richard Weinberger <richard@nod.at> wrote:
>>
>> Hmm, my gdb does not like that notion.
>>
>> (gdb) list schedule+0x45
>> Function "schedule+0x45" not defined.
> 
> I don't have a debug build, so maybe it's something specific to gdb
> actually seeing type information and then being confused..
> 
> Can you do "x/10i schedule+0x45" because that definitely works for me.
> But I literally detest DBUG_INFO builds, because it's useless crap.
> 99% of everything I debug is from people reporting problems on their
> kernels, so it's not like my local debug info would match that anyway.

x/10i schedule+0x45 works.

> Anyway, if it's a type information thing due to DEBUG_INFO that
> confuses gdb and makes it think that the "+0x45" part doesn't make
> sense, you may need to add a cast to get gdb to ignore the type
> information. IOW, does
> 
>   list (void *)schedule + 0x45
> 
> work for you?

Nope.

(gdb) list (void *)schedule + 0x45
Function "(void *)schedule + 0x45" not defined.

> I happen to have gdb-7.6.50 here that I tested with, but I've used
> "symbol+offset" forever afaik, so it's definitely not something new.

I'm on 7.5.1.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 17:52                                 ` Richard Weinberger
@ 2014-01-28 17:56                                   ` Linus Torvalds
  2014-01-28 18:54                                     ` Richard Weinberger
  0 siblings, 1 reply; 70+ messages in thread
From: Linus Torvalds @ 2014-01-28 17:56 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Ingo Molnar, H. Peter Anvin, H. Peter Anvin, Kees Cook,
	Cong Ding, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

On Tue, Jan 28, 2014 at 9:52 AM, Richard Weinberger <richard@nod.at> wrote:
>
> x/10i schedule+0x45 works.

Ok, so it's just list that is braindamaged. Maybe it wants a
*(schedule+0x45) or something. I dunno. You can obviously get the hex
number from just doing "p schedule+0x45" and then do that.

Whatever. I still claim that the hex numbers in the oopses are
worthless noise. The fact that gdb has "UI issues" is an unrelated
issue and perhaps not surprising.

                Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 17:56                                   ` Linus Torvalds
@ 2014-01-28 18:54                                     ` Richard Weinberger
  0 siblings, 0 replies; 70+ messages in thread
From: Richard Weinberger @ 2014-01-28 18:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, H. Peter Anvin, H. Peter Anvin, Kees Cook,
	Cong Ding, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun



Am 28.01.2014 18:56, schrieb Linus Torvalds:
> On Tue, Jan 28, 2014 at 9:52 AM, Richard Weinberger <richard@nod.at> wrote:
>>
>> x/10i schedule+0x45 works.
> 
> Ok, so it's just list that is braindamaged. Maybe it wants a
> *(schedule+0x45) or something. I dunno. You can obviously get the hex
> number from just doing "p schedule+0x45" and then do that.

Ah, "list *(schedule+0x45)" works.
This reminds my why I never use gdb and like addr2line so much. ;D

Thanks,
//richard

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 17:12                           ` Linus Torvalds
  2014-01-28 17:24                             ` Richard Weinberger
@ 2014-01-28 19:48                             ` Ingo Molnar
  2014-01-28 20:07                               ` Linus Torvalds
  1 sibling, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2014-01-28 19:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Richard Weinberger, H. Peter Anvin, Kees Cook,
	Cong Ding, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Jan 28, 2014 at 9:05 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Well, I often use the hex numbers to look them up and disassemble them
> > in a vmlinux via gdb and 'list *0x1234123412341234' - where the
> > vmlinux has no debuginfo. (Debuginfo takes longer to build so I
> > generally build without it.)
> 
> Why the heck wouldn't you do that? Just do
> 
>    list schedule+0x45
> 
> instead.
> 
> > AFAICS this won't work in a symbol-less vmlinux. Is there some trick
> > to do it with gdb?
> 
> Why would you have a symbol-less vmlinux? The only reason to strip 
> vmlinux is because you were crazy enough to build with 

I don't think I ever stripped a vmlinux in my life, and I definitely 
didn't strip this one:

  phoenix:~/linux/linux> file vmlinux
  vmlinux: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=0x239d56b40bb654ddd2dd704e6b585d5c90de4e12, not stripped

> CONFIG_DEBUG_INFO and the damn debug info is so large that it won't 
> fit on your root partition. But dammit, if you build with debug_info 
> and then strip the end result, you're just insane. [...]

So, to quote myself from 2 sentences ago:

> > vmlinux has no debuginfo. (Debuginfo takes longer to build so I 
> > generally build without it.)

I really meant it when I said I build without debuginfo! :)

So, when I build a kernel, such as with a regular 'make defconfig', 
the following happens in gdb:

  Reading symbols from /home/mingo/tip/vmlinux...(no debugging symbols found)...done.
  (gdb) list schedule+0x45
  No symbol table is loaded.  Use the "file" command.

Is there a way to resolve schedule+0x45 in a regular vmlinux? It was 
an honest question.

> [...] You made your build take ten times longer, use ten times more 
> diskspace, and then you throw it all away. Crazy.

It's so crazy that I in fact try to force off debuginfo for all my 
builds, even randconfig ones:

  config DEBUG_INFO
        bool "Compile the kernel with debug info"
        depends on DEBUG_KERNEL
        # too slow build in QA
        depends on 0

> So I don't think the symbol-less version is worth even worrying 
> about. You do want to build with KALLSYMS (or whatever the config 
> option is called), so that the symbolic name is worth something, but 
> once you have the symbolc name, you're good unless you did something 
> terminally stupid.

Hm, I have kallsyms on - it's a regular 'make defconfig':

  phoenix:~/linux/linux> grep KALL .config
  CONFIG_KALLSYMS=y

I might be doing something terminally stupid.

> Btw, we should make it harder to enable CONFIG_DEBUG_INFO. [...]

It's really not set:

  phoenix:~/linux/linux> grep DEBUG_INFO .config
  # CONFIG_DEBUG_INFO is not set

> [...] It's a f*cking pain. It's particularly nasty when you do "make 
> allmodconfig" and it enables debug-info and makes the build take 
> forever and waste diskspace - but nobody sane actually *boots* the 
> end result, so that debug info is all pointless.

I actually boot (almost-)allmod and allyesconfigs - still I disable 
DEBUG_INFO because it's such a strain on our planet's climate.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 19:48                             ` Ingo Molnar
@ 2014-01-28 20:07                               ` Linus Torvalds
  2014-01-28 20:15                                 ` Borislav Petkov
  2014-01-29  8:11                                 ` Ingo Molnar
  0 siblings, 2 replies; 70+ messages in thread
From: Linus Torvalds @ 2014-01-28 20:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Richard Weinberger, H. Peter Anvin, Kees Cook,
	Cong Ding, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

On Tue, Jan 28, 2014 at 11:48 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> I really meant it when I said I build without debuginfo! :)

Ok, but so what?

As mentioned, nobody sane should build with DEBUG_INFO. But a normal
vmlinux file has the symbol information even without it.

> So, when I build a kernel, such as with a regular 'make defconfig',
> the following happens in gdb:
>
>   Reading symbols from /home/mingo/tip/vmlinux...(no debugging symbols found)...done.
>   (gdb) list schedule+0x45
>   No symbol table is loaded.  Use the "file" command.
>
> Is there a way to resolve schedule+0x45 in a regular vmlinux? It was
> an honest question.

That seems to be just a gdb bug (or "UI feature"), in that gdb likes
to give misleading error messages and requires odd syntax for some
things.

The symbols are there (see the first line):

  Reading symbols from /home/mingo/tip/vmlinux...(no debugging symbols
found)...done.

and that "no debugging symbols found" is just because you don't have
the extra *debug* info.

The "list" command requires debug info to work (since that's where the
line number information is), and will not work with hex symbols either
if you don't have that. So when it says "No symbol table is loaded" it
really means "no debug information is loaded".

But you can see that the symbol is perfectly fine:

  (gdb) list *(schedule+0x45)
  No symbol table is loaded.  Use the "file" command.
  (gdb) x/3i schedule+0x45
     0xffffffff81616d05 <schedule+69>: adc    0x1(%rsi),%bh
     0xffffffff81616d0b <schedule+75>: callq  0xffffffff812c0b30
<blk_flush_plug_list>
     0xffffffff81616d10 <schedule+80>: jmp    0xffffffff81616cdf <schedule+31>

(ok, so 0x45 wasn't the start of a real instruction, but you get the idea).

So my point is that the hex address doesn't give you *anything* that
the symbolic address doesn't give you. Unless you do truly crazy
things like actively strip the kernel.

              Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 20:07                               ` Linus Torvalds
@ 2014-01-28 20:15                                 ` Borislav Petkov
  2014-01-28 20:25                                   ` Linus Torvalds
  2014-01-29  8:11                                 ` Ingo Molnar
  1 sibling, 1 reply; 70+ messages in thread
From: Borislav Petkov @ 2014-01-28 20:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, H. Peter Anvin, Richard Weinberger, H. Peter Anvin,
	Kees Cook, Cong Ding, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

On Tue, Jan 28, 2014 at 12:07:26PM -0800, Linus Torvalds wrote:
> On Tue, Jan 28, 2014 at 11:48 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > I really meant it when I said I build without debuginfo! :)
> 
> Ok, but so what?
> 
> As mentioned, nobody sane should build with DEBUG_INFO.

Shouldn't we hold that down in the Kconfig help text of DEBUG_INFO?
Something like:

"You don't need to enable this if you want symbolic names for kernel
objects. Enable CONFIG_KALLSYMS instead."

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 20:15                                 ` Borislav Petkov
@ 2014-01-28 20:25                                   ` Linus Torvalds
  2014-01-28 20:28                                     ` Richard Weinberger
                                                       ` (2 more replies)
  0 siblings, 3 replies; 70+ messages in thread
From: Linus Torvalds @ 2014-01-28 20:25 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, H. Peter Anvin, Richard Weinberger, H. Peter Anvin,
	Kees Cook, Cong Ding, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

On Tue, Jan 28, 2014 at 12:15 PM, Borislav Petkov <bp@alien8.de> wrote:
>
> Shouldn't we hold that down in the Kconfig help text of DEBUG_INFO?
> Something like:
>
> "You don't need to enable this if you want symbolic names for kernel
> objects. Enable CONFIG_KALLSYMS instead."

Probably. And then we should make sure that allyesconfig/allmodconfig
don't pick it.

There *are* reasonable uses for DEBUG_INFO:

 - if you very actively and extensively use kgdb, DEBUG_INFO is very useful.

 - distros might want to build distro kernels with DEBUG_INFO for the
kernel debug package

but for most kernel developers, DEBUG_INFO really just bloats things
and makes the build much *much* slower, for very little gain. Sure,
you get line numbers, but let's face it, it's not that hard to just do
"x/20i <address>" and trying to match it up with the generated code
instead. And since "just match it up" model works with user-reported
oopses of random kernels, you had better be able and willing to do
that *anyway*.

If most of the oopses you decode are on your own machine with your own
kernel, you might want to try to learn to be more careful when writing
code. And I'm not even kidding.

                       Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 20:25                                   ` Linus Torvalds
@ 2014-01-28 20:28                                     ` Richard Weinberger
  2014-01-28 20:38                                       ` H. Peter Anvin
  2014-01-28 20:49                                     ` Borislav Petkov
  2014-01-28 21:08                                     ` Dave Jones
  2 siblings, 1 reply; 70+ messages in thread
From: Richard Weinberger @ 2014-01-28 20:28 UTC (permalink / raw)
  To: Linus Torvalds, Borislav Petkov
  Cc: Ingo Molnar, H. Peter Anvin, H. Peter Anvin, Kees Cook,
	Cong Ding, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

Am 28.01.2014 21:25, schrieb Linus Torvalds:
> On Tue, Jan 28, 2014 at 12:15 PM, Borislav Petkov <bp@alien8.de> wrote:
>>
>> Shouldn't we hold that down in the Kconfig help text of DEBUG_INFO?
>> Something like:
>>
>> "You don't need to enable this if you want symbolic names for kernel
>> objects. Enable CONFIG_KALLSYMS instead."
> 
> Probably. And then we should make sure that allyesconfig/allmodconfig
> don't pick it.

Let's make it depend on CONFIG_EXPERT.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 20:28                                     ` Richard Weinberger
@ 2014-01-28 20:38                                       ` H. Peter Anvin
  2014-01-29  8:25                                         ` Ingo Molnar
  0 siblings, 1 reply; 70+ messages in thread
From: H. Peter Anvin @ 2014-01-28 20:38 UTC (permalink / raw)
  To: Richard Weinberger, Linus Torvalds, Borislav Petkov
  Cc: Ingo Molnar, H. Peter Anvin, Kees Cook, Cong Ding, Ingo Molnar,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On 01/28/2014 12:28 PM, Richard Weinberger wrote:
> Am 28.01.2014 21:25, schrieb Linus Torvalds:
>> On Tue, Jan 28, 2014 at 12:15 PM, Borislav Petkov <bp@alien8.de> wrote:
>>>
>>> Shouldn't we hold that down in the Kconfig help text of DEBUG_INFO?
>>> Something like:
>>>
>>> "You don't need to enable this if you want symbolic names for kernel
>>> objects. Enable CONFIG_KALLSYMS instead."
>>
>> Probably. And then we should make sure that allyesconfig/allmodconfig
>> don't pick it.
> 
> Let's make it depend on CONFIG_EXPERT.
> 

That doesn't solve all*config.  My scripts for all*config just overrides
it, maybe we need to do something like that by default?

	-hpa



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 20:25                                   ` Linus Torvalds
  2014-01-28 20:28                                     ` Richard Weinberger
@ 2014-01-28 20:49                                     ` Borislav Petkov
  2014-01-28 23:37                                       ` Borislav Petkov
  2014-01-28 21:08                                     ` Dave Jones
  2 siblings, 1 reply; 70+ messages in thread
From: Borislav Petkov @ 2014-01-28 20:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, H. Peter Anvin, Richard Weinberger, H. Peter Anvin,
	Kees Cook, Cong Ding, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

On Tue, Jan 28, 2014 at 12:25:15PM -0800, Linus Torvalds wrote:
> Probably. And then we should make sure that allyesconfig/allmodconfig
> don't pick it.

I'd need to think about that a bit longer as scripts/kconfig/conf.c goes
and sets those. Unless someone has a better idea...

> There *are* reasonable uses for DEBUG_INFO:
> 
>  - if you very actively and extensively use kgdb, DEBUG_INFO is very useful.

Right, and at least KGDB depends on it. I was going to suggest to hide
it and make only tools select it but...

>  - distros might want to build distro kernels with DEBUG_INFO for the
> kernel debug package

Yes. SUSE uses crash to analyze dumps and for that it needs -g symbols
and other distros I'm sure too in some fashion.

> but for most kernel developers, DEBUG_INFO really just bloats things
> and makes the build much *much* slower, for very little gain.

Right, your discussion kinda made me aware of this - I hadn't realized
that KALLSYMS is enough.

> Sure, you get line numbers, but let's face it, it's not that hard
> to just do "x/20i <address>" and trying to match it up with the
> generated code instead. And since "just match it up" model works with
> user-reported oopses of random kernels, you had better be able and
> willing to do that *anyway*.

Hohumm.

> If most of the oopses you decode are on your own machine with your own
> kernel, you might want to try to learn to be more careful when writing
> code. And I'm not even kidding.

Haha.

Ok, here's the help text hunk:

---
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index dbf94a7d25a8..73b36a1624ec 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -126,7 +126,11 @@ config DEBUG_INFO
 	  This adds debug symbols to the kernel and modules (gcc -g), and
 	  is needed if you intend to use kernel crashdump or binary object
 	  tools like crash, kgdb, LKCD, gdb, etc on the kernel.
-	  Say Y here only if you plan to debug the kernel.
+
+	  If you only want to have symbols in kernel traces and are not
+	  going to need support for those tools above, you don't need
+	  to enable this as it is a huge bloat and build slowdown.
+	  Enable CONFIG_KALLSYMS instead.
 
 	  If unsure, say N.
 

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 20:25                                   ` Linus Torvalds
  2014-01-28 20:28                                     ` Richard Weinberger
  2014-01-28 20:49                                     ` Borislav Petkov
@ 2014-01-28 21:08                                     ` Dave Jones
  2014-01-29  6:36                                       ` Mike Galbraith
  2 siblings, 1 reply; 70+ messages in thread
From: Dave Jones @ 2014-01-28 21:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Borislav Petkov, Ingo Molnar, H. Peter Anvin, Richard Weinberger,
	H. Peter Anvin, Kees Cook, Cong Ding, Ingo Molnar,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On Tue, Jan 28, 2014 at 12:25:15PM -0800, Linus Torvalds wrote:
 > On Tue, Jan 28, 2014 at 12:15 PM, Borislav Petkov <bp@alien8.de> wrote:
 > >
 > > Shouldn't we hold that down in the Kconfig help text of DEBUG_INFO?
 > > Something like:
 > >
 > > "You don't need to enable this if you want symbolic names for kernel
 > > objects. Enable CONFIG_KALLSYMS instead."
 > 
 > Probably. And then we should make sure that allyesconfig/allmodconfig
 > don't pick it.
 > 
 > There *are* reasonable uses for DEBUG_INFO:
 > 
 >  - if you very actively and extensively use kgdb, DEBUG_INFO is very useful.
 > 
 >  - distros might want to build distro kernels with DEBUG_INFO for the
 > kernel debug package

- objdump -S is kind of useful. I find myself using that quite often.
  (like at least a dozen times each merge window)

pretty sure it works with DEBUG_INFO_REDUCED though, which is somewhat
faster than full DEBUG_INFO

	Dave


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 20:49                                     ` Borislav Petkov
@ 2014-01-28 23:37                                       ` Borislav Petkov
  0 siblings, 0 replies; 70+ messages in thread
From: Borislav Petkov @ 2014-01-28 23:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, H. Peter Anvin, Richard Weinberger, H. Peter Anvin,
	Kees Cook, Cong Ding, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

On Tue, Jan 28, 2014 at 09:49:06PM +0100, Borislav Petkov wrote:
> On Tue, Jan 28, 2014 at 12:25:15PM -0800, Linus Torvalds wrote:
> > Probably. And then we should make sure that allyesconfig/allmodconfig
> > don't pick it.
> 
> I'd need to think about that a bit longer as scripts/kconfig/conf.c goes
> and sets those. Unless someone has a better idea...

Maybe we can do something simple like this, it works here. We could
even generate a random name for the all.config file so that there are
no conflicts with existing stuff and even test for its presence. Or we
could hide all that in a script and so on and so on...

---
diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
index 844bc9da08da..76e889fc6384 100644
--- a/scripts/kconfig/Makefile
+++ b/scripts/kconfig/Makefile
@@ -77,7 +77,9 @@ update-po-config: $(obj)/kxgettext $(obj)/gconf.glade.h
 PHONY += allnoconfig allyesconfig allmodconfig alldefconfig randconfig
 
 allnoconfig allyesconfig allmodconfig alldefconfig randconfig: $(obj)/conf
-	$< --$@ $(Kconfig)
+	$(shell echo "# CONFIG_DEBUG_INFO is not set" > /tmp/all.config)
+	$(Q)KCONFIG_ALLCONFIG=/tmp/all.config $< --$@ $(Kconfig)
+	$(Q)rm -rf /tmp/all.config
 
 PHONY += listnewconfig olddefconfig oldnoconfig savedefconfig defconfig
 

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 21:08                                     ` Dave Jones
@ 2014-01-29  6:36                                       ` Mike Galbraith
  0 siblings, 0 replies; 70+ messages in thread
From: Mike Galbraith @ 2014-01-29  6:36 UTC (permalink / raw)
  To: Dave Jones
  Cc: Linus Torvalds, Borislav Petkov, Ingo Molnar, H. Peter Anvin,
	Richard Weinberger, H. Peter Anvin, Kees Cook, Cong Ding,
	Ingo Molnar, Linux Kernel Mailing List, Mathias Krause,
	Michael Davidson, Thomas Gleixner, Wei Yongjun

On Tue, 2014-01-28 at 16:08 -0500, Dave Jones wrote: 
> On Tue, Jan 28, 2014 at 12:25:15PM -0800, Linus Torvalds wrote:
>  > On Tue, Jan 28, 2014 at 12:15 PM, Borislav Petkov <bp@alien8.de> wrote:
>  > >
>  > > Shouldn't we hold that down in the Kconfig help text of DEBUG_INFO?
>  > > Something like:
>  > >
>  > > "You don't need to enable this if you want symbolic names for kernel
>  > > objects. Enable CONFIG_KALLSYMS instead."
>  > 
>  > Probably. And then we should make sure that allyesconfig/allmodconfig
>  > don't pick it.
>  > 
>  > There *are* reasonable uses for DEBUG_INFO:
>  > 
>  >  - if you very actively and extensively use kgdb, DEBUG_INFO is very useful.
>  > 
>  >  - distros might want to build distro kernels with DEBUG_INFO for the
>  > kernel debug package
> 
> - objdump -S is kind of useful. I find myself using that quite often.
>   (like at least a dozen times each merge window)

For those with wimpy asm muscles (/me), it's very useful indeed, as are
gdb list *foo()+0x10 or *0xfeedbabedeadbeef.  I always build with full
DEBUG_INFO, just keep configs lean enough that lots of kernels fit in a
2G /boot, and _never ever_ set a box up with a microscopic root.

> pretty sure it works with DEBUG_INFO_REDUCED though, which is somewhat
> faster than full DEBUG_INFO

Yeah, it does.

-Mike


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 20:07                               ` Linus Torvalds
  2014-01-28 20:15                                 ` Borislav Petkov
@ 2014-01-29  8:11                                 ` Ingo Molnar
  2014-01-29  8:27                                   ` Mathias Krause
  1 sibling, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2014-01-29  8:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Richard Weinberger, H. Peter Anvin, Kees Cook,
	Cong Ding, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Jan 28, 2014 at 11:48 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > I really meant it when I said I build without debuginfo! :)
> 
> Ok, but so what?
> 
> As mentioned, nobody sane should build with DEBUG_INFO. But a normal 
> vmlinux file has the symbol information even without it.

So, your mail sure read to me as a rant directed at me, so I thought 
I'd defend myself or something :)

I now realize that the whole episode was caused by me calling the 
vmlinux 'symbol-less':

> > > > AFAICS this won't work in a symbol-less vmlinux. Is there some 
> > > > trick to do it with gdb?

while I should have said 'debuginfo-less'. Mea culpa.

> > So, when I build a kernel, such as with a regular 'make defconfig',
> > the following happens in gdb:
> >
> >   Reading symbols from /home/mingo/tip/vmlinux...(no debugging symbols found)...done.
> >   (gdb) list schedule+0x45
> >   No symbol table is loaded.  Use the "file" command.
> >
> > Is there a way to resolve schedule+0x45 in a regular vmlinux? It 
> > was an honest question.
> 
> That seems to be just a gdb bug (or "UI feature"), in that gdb likes 
> to give misleading error messages and requires odd syntax for some 
> things.

Yeah. Almost as if they worked hard to make annoying users go away or 
something. (LLVM is IMO a blessing because, despite its somewhat 
broken licensing, it cured a similar attitude of the GCC folks. In a 
way competition is more important than licensing details!)

> But you can see that the symbol is perfectly fine:
> 
>   (gdb) list *(schedule+0x45)

Oh, cool. Thanks for that trick - this will save me quite some time in 
the future.

So we can strip absolute addresses just fine from oopses - cool.

I'd even argue to strip the hex on non-randomized kernels as long as 
there's kallsyms around, and only print hex if we don't have any 
symbols.

> So my point is that the hex address doesn't give you *anything* that 
> the symbolic address doesn't give you. [...]

Yeah, and with your trick that's now the case for my debugging as 
well, which is a nice touch.

> [...] Unless you do truly crazy things like actively strip the 
> kernel.

Being crazy is something I try to avoid. (Beyond being a maintainer
of a software project as busy and stressful as the Linux kernel that is.)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-28 20:38                                       ` H. Peter Anvin
@ 2014-01-29  8:25                                         ` Ingo Molnar
  2014-01-29 10:40                                           ` Borislav Petkov
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2014-01-29  8:25 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Richard Weinberger, Linus Torvalds, Borislav Petkov,
	H. Peter Anvin, Kees Cook, Cong Ding, Ingo Molnar,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 01/28/2014 12:28 PM, Richard Weinberger wrote:
> > Am 28.01.2014 21:25, schrieb Linus Torvalds:
> >> On Tue, Jan 28, 2014 at 12:15 PM, Borislav Petkov <bp@alien8.de> wrote:
> >>>
> >>> Shouldn't we hold that down in the Kconfig help text of DEBUG_INFO?
> >>> Something like:
> >>>
> >>> "You don't need to enable this if you want symbolic names for kernel
> >>> objects. Enable CONFIG_KALLSYMS instead."
> >>
> >> Probably. And then we should make sure that allyesconfig/allmodconfig
> >> don't pick it.
> > 
> > Let's make it depend on CONFIG_EXPERT.
> > 
> 
> That doesn't solve all*config.  My scripts for all*config just 
> overrides it, maybe we need to do something like that by default?

For features we want to exclude from allyesconfig we can define them 
as logical negatives. (I've used this technique in the past to hide 
silly options from allyesconfig and it works well.)

So to reign in debuginfo in allyesconfig/allmodconfig builds we could 
do something like:

 config SAVE_TIME_AND_DISK_SPACE
	bool "Faster and leaner kernel build: compile without debug info"
	default y

 choice
        prompt "Choose DEBUGINFO bloat level"
	depends on !SAVE_TIME_AND_DISK_SPACE
        default DEBUG_INFO_REDUCED
        help
           This option allows to select the debuginfo model.

   config DEBUG_INFO_REDUCED
        bool "Reduced amount of debugging information"

   config DEBUG_INFO
        bool "Full DEBUG info, 'planet killer' version"

 endchoice

(or so, perhaps with slightly more PC text.)

That would default to Y and would disable debuginfo by default.

( And yeah, it's a bit ugly, but it beats having to hack the kconfig 
  language! )

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-29  8:11                                 ` Ingo Molnar
@ 2014-01-29  8:27                                   ` Mathias Krause
  2014-01-30  9:23                                     ` Ingo Molnar
  0 siblings, 1 reply; 70+ messages in thread
From: Mathias Krause @ 2014-01-29  8:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, H. Peter Anvin, Richard Weinberger,
	H. Peter Anvin, Kees Cook, Cong Ding, Ingo Molnar,
	Linux Kernel Mailing List, Michael Davidson, Thomas Gleixner,
	Wei Yongjun

On 29 January 2014 09:11, Ingo Molnar <mingo@kernel.org> wrote:
>> But you can see that the symbol is perfectly fine:
>>
>>   (gdb) list *(schedule+0x45)
>
> Oh, cool. Thanks for that trick - this will save me quite some time in
> the future.
>
> So we can strip absolute addresses just fine from oopses - cool.
>
> I'd even argue to strip the hex on non-randomized kernels as long as
> there's kallsyms around, and only print hex if we don't have any
> symbols.

Please, don't do so! I do find the hex values in the backtrace *very*
useful as I'm using 'objdump -wdr vmlinux | less' quite often to
"browse around" in the kernel binary. Grepping for addresses from a
backtrace works quite nicely this way. Having to lookup symbols and do
base-16 arithmetics in the head (or a shell, for that matter) would
only slow down this process. So, please leave the hex values in place.
They do help a lot -- at least in the non-kASLR case.

Regards,
Mathias

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-29  8:25                                         ` Ingo Molnar
@ 2014-01-29 10:40                                           ` Borislav Petkov
  0 siblings, 0 replies; 70+ messages in thread
From: Borislav Petkov @ 2014-01-29 10:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Richard Weinberger, Linus Torvalds,
	H. Peter Anvin, Kees Cook, Cong Ding, Ingo Molnar,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On Wed, Jan 29, 2014 at 09:25:05AM +0100, Ingo Molnar wrote:
> That would default to Y and would disable debuginfo by default.
>
> ( And yeah, it's a bit ugly, but it beats having to hack the kconfig
> language! )

Yeah, that could be another solution - I came up with another hack last
night: http://marc.info/?l=linux-kernel&m=139095226124549

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-29  8:27                                   ` Mathias Krause
@ 2014-01-30  9:23                                     ` Ingo Molnar
  2014-01-30 18:15                                       ` Linus Torvalds
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2014-01-30  9:23 UTC (permalink / raw)
  To: Mathias Krause
  Cc: Linus Torvalds, H. Peter Anvin, Richard Weinberger,
	H. Peter Anvin, Kees Cook, Cong Ding, Ingo Molnar,
	Linux Kernel Mailing List, Michael Davidson, Thomas Gleixner,
	Wei Yongjun


* Mathias Krause <minipli@googlemail.com> wrote:

> On 29 January 2014 09:11, Ingo Molnar <mingo@kernel.org> wrote:
> >> But you can see that the symbol is perfectly fine:
> >>
> >>   (gdb) list *(schedule+0x45)
> >
> > Oh, cool. Thanks for that trick - this will save me quite some time in
> > the future.
> >
> > So we can strip absolute addresses just fine from oopses - cool.
> >
> > I'd even argue to strip the hex on non-randomized kernels as long as
> > there's kallsyms around, and only print hex if we don't have any
> > symbols.
> 
> Please, don't do so! I do find the hex values in the backtrace 
> *very* useful as I'm using 'objdump -wdr vmlinux | less' quite often 
> to "browse around" in the kernel binary. Grepping for addresses from 
> a backtrace works quite nicely this way. Having to lookup symbols 
> and do base-16 arithmetics in the head (or a shell, for that matter) 
> would only slow down this process. So, please leave the hex values 
> in place. They do help a lot -- at least in the non-kASLR case.

Well, if the consensus is that they help then we better make them 
correct in the KASLR case as well ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-30  9:23                                     ` Ingo Molnar
@ 2014-01-30 18:15                                       ` Linus Torvalds
  0 siblings, 0 replies; 70+ messages in thread
From: Linus Torvalds @ 2014-01-30 18:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mathias Krause, H. Peter Anvin, Richard Weinberger,
	H. Peter Anvin, Kees Cook, Cong Ding, Ingo Molnar,
	Linux Kernel Mailing List, Michael Davidson, Thomas Gleixner,
	Wei Yongjun

On Thu, Jan 30, 2014 at 1:23 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> Well, if the consensus is that they help then we better make them
> correct in the KASLR case as well ...

In the kaslr case, the hex values cannot possibly help, since they are
meaningless due to the random offset (the external tools will *not* be
able to use them without relocation information anyway, and if the
tools can take relocation into account, they might as well just take
the symbol name into account anyway).

I'd really suggest dropping the damn thing entirely. Even if you do
use "objdump" (and I admit to doing that quite often myself), the
symbol version isn't really all that hard to use. And the thing is,
once again, the symbol version works even for kaslr, while the raw hex
does not.

I do agree that having to do the arithmetic can be annoying, but it
wouldn't actually be all that hard to write a script that turns the
"objdump" format from hex address to "symbol+offset" format. It would
probably be three lines of perl:

 - parse the "hex: rest-of-line" thing
 - if "rest-of-line" is of the format "<symbol>:", then remember the
hex and the symbol name.
 - print out "symbol+offset: rest-of-line" where "offset" is "hex -
remembered_hex"

Even I could probably do it, and my perl knowledge is really limited
to "I can edit other peoples perl code if I'm lucky". Somebody who
actually does perl probably goes "Three lines? That's a oneliner".

                        Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-27  6:51       ` H. Peter Anvin
  2014-01-27  7:38         ` Ingo Molnar
@ 2014-01-30 22:07         ` Vivek Goyal
  2014-01-31 16:57           ` Kees Cook
  1 sibling, 1 reply; 70+ messages in thread
From: Vivek Goyal @ 2014-01-30 22:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Richard Weinberger, H. Peter Anvin, Linus Torvalds, Cong Ding,
	Ingo Molnar, Ingo Molnar, Kees Cook, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun

On Sun, Jan 26, 2014 at 10:51:06PM -0800, H. Peter Anvin wrote:
> On 01/26/2014 10:49 PM, Richard Weinberger wrote:
> >>
> >> No, because that information is available to user space unless we panic.
> > 
> > Didn't you mean non-root?
> > I thought one has to set dmesg_restrict anyways if kASLR is used.
> > 
> > And isn't the offset available to perf too?
> > Of course only for root, but still user space.
> > 
> 
> For certain system security levels one want to protect even from a rogue
> root.  In those cases, leaking that information via dmesg and perf isn't
> going to work, either.
> 
> With lower security settings, by all means...

I am wondering if kdump functionality is impacted with this change.

Kexec tools prepares ELF headers for kernel memory ranges and for the
range where kernel text is mapped. So it needs to know virtual address
of the region where kernel is mapped (obtained by /proc/kcore) and
it gets the physical address where kernel is loaded from /proc/iomem.

So with this change are we planning to hide kernel text virtual address and
physical address information information from root in /proc/kcore and
/proc/iomem in anyway?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-30 22:07         ` Vivek Goyal
@ 2014-01-31 16:57           ` Kees Cook
  2014-02-07 14:49             ` Vivek Goyal
  0 siblings, 1 reply; 70+ messages in thread
From: Kees Cook @ 2014-01-31 16:57 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: H. Peter Anvin, Richard Weinberger, H. Peter Anvin,
	Linus Torvalds, Cong Ding, Ingo Molnar, Ingo Molnar,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun

On Thu, Jan 30, 2014 at 2:07 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Sun, Jan 26, 2014 at 10:51:06PM -0800, H. Peter Anvin wrote:
>> On 01/26/2014 10:49 PM, Richard Weinberger wrote:
>> >>
>> >> No, because that information is available to user space unless we panic.
>> >
>> > Didn't you mean non-root?
>> > I thought one has to set dmesg_restrict anyways if kASLR is used.
>> >
>> > And isn't the offset available to perf too?
>> > Of course only for root, but still user space.
>> >
>>
>> For certain system security levels one want to protect even from a rogue
>> root.  In those cases, leaking that information via dmesg and perf isn't
>> going to work, either.
>>
>> With lower security settings, by all means...
>
> I am wondering if kdump functionality is impacted with this change.
>
> Kexec tools prepares ELF headers for kernel memory ranges and for the
> range where kernel text is mapped. So it needs to know virtual address
> of the region where kernel is mapped (obtained by /proc/kcore) and
> it gets the physical address where kernel is loaded from /proc/iomem.
>
> So with this change are we planning to hide kernel text virtual address and
> physical address information information from root in /proc/kcore and
> /proc/iomem in anyway?

I have no intention of that. Mentioned earlier in the thread, hiding
it from root will be pretty ugly/hard/pointless:
https://lkml.org/lkml/2014/1/27/287
I would like to just keep the offset out of dmesg.

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-01-31 16:57           ` Kees Cook
@ 2014-02-07 14:49             ` Vivek Goyal
  2014-02-07 16:04               ` H. Peter Anvin
  2014-02-07 19:07               ` H. Peter Anvin
  0 siblings, 2 replies; 70+ messages in thread
From: Vivek Goyal @ 2014-02-07 14:49 UTC (permalink / raw)
  To: Kees Cook
  Cc: H. Peter Anvin, Richard Weinberger, H. Peter Anvin,
	Linus Torvalds, Cong Ding, Ingo Molnar, Ingo Molnar,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Thomas Gleixner, Wei Yongjun, Dave Young

On Fri, Jan 31, 2014 at 08:57:03AM -0800, Kees Cook wrote:

[..]
> I have no intention of that. Mentioned earlier in the thread, hiding
> it from root will be pretty ugly/hard/pointless:
> https://lkml.org/lkml/2014/1/27/287
> I would like to just keep the offset out of dmesg.

[ CC Dave Young ]

Hi Kees,

Dave Young is testing kdump with kaslr enabled. He is facing some issues.

One issue he mentioned is that when second kernel boots, it might be
placed in an area which is outside the reserved area for second kernel.

We reserve a certain memory for second kernel. And modify memory map of
second kernel using memmap=exactmap parameter. Looks like kernel placement
is happening before memmap=exactmap takes effect. And that seems to be
the reason that second kernel can be placed outside the reserved memory.

IOW, memmap=exactmap and kaslr don't work together. Is it possible to
first let memmap=exactmap take affect and then kaslr does its job. Or it
is too late by the time memmap=exactmap is parsed.

As a workaround, Dave is currently using "nokaslr" command line parameter
for second kernel. He is still facing issues where makedumpfile segment
faults. He is looking into it further.

I thought I will atleast bring up with issue of memmap=exactmap and kaslr
being incompatible.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-02-07 14:49             ` Vivek Goyal
@ 2014-02-07 16:04               ` H. Peter Anvin
  2014-02-07 16:24                 ` Vivek Goyal
  2014-02-07 19:07               ` H. Peter Anvin
  1 sibling, 1 reply; 70+ messages in thread
From: H. Peter Anvin @ 2014-02-07 16:04 UTC (permalink / raw)
  To: Vivek Goyal, Kees Cook
  Cc: H. Peter Anvin, Richard Weinberger, Linus Torvalds, Cong Ding,
	Ingo Molnar, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun,
	Dave Young

On 02/07/2014 06:49 AM, Vivek Goyal wrote:
> 
> Hi Kees,
> 
> Dave Young is testing kdump with kaslr enabled. He is facing some issues.
> 
> One issue he mentioned is that when second kernel boots, it might be
> placed in an area which is outside the reserved area for second kernel.
> 
> We reserve a certain memory for second kernel. And modify memory map of
> second kernel using memmap=exactmap parameter. Looks like kernel placement
> is happening before memmap=exactmap takes effect. And that seems to be
> the reason that second kernel can be placed outside the reserved memory.
> 
> IOW, memmap=exactmap and kaslr don't work together. Is it possible to
> first let memmap=exactmap take affect and then kaslr does its job. Or it
> is too late by the time memmap=exactmap is parsed.
> 
> As a workaround, Dave is currently using "nokaslr" command line parameter
> for second kernel. He is still facing issues where makedumpfile segment
> faults. He is looking into it further.
> 
> I thought I will atleast bring up with issue of memmap=exactmap and kaslr
> being incompatible.
> 

Yes, because memmap=exactmap gets parsed too late; kaslr assumes that
the e820 information passed to it is actually correct.

Yet another cause of breakage caused by the decision on the part of
kdump to rely on command-line options.

	-hpa



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-02-07 16:04               ` H. Peter Anvin
@ 2014-02-07 16:24                 ` Vivek Goyal
  2014-02-07 23:16                   ` Dave Young
  0 siblings, 1 reply; 70+ messages in thread
From: Vivek Goyal @ 2014-02-07 16:24 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Kees Cook, H. Peter Anvin, Richard Weinberger, Linus Torvalds,
	Cong Ding, Ingo Molnar, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun,
	Dave Young, Kexec Mailing List, Yinghai Lu

On Fri, Feb 07, 2014 at 08:04:13AM -0800, H. Peter Anvin wrote:
> On 02/07/2014 06:49 AM, Vivek Goyal wrote:
> > 
> > Hi Kees,
> > 
> > Dave Young is testing kdump with kaslr enabled. He is facing some issues.
> > 
> > One issue he mentioned is that when second kernel boots, it might be
> > placed in an area which is outside the reserved area for second kernel.
> > 
> > We reserve a certain memory for second kernel. And modify memory map of
> > second kernel using memmap=exactmap parameter. Looks like kernel placement
> > is happening before memmap=exactmap takes effect. And that seems to be
> > the reason that second kernel can be placed outside the reserved memory.
> > 
> > IOW, memmap=exactmap and kaslr don't work together. Is it possible to
> > first let memmap=exactmap take affect and then kaslr does its job. Or it
> > is too late by the time memmap=exactmap is parsed.
> > 
> > As a workaround, Dave is currently using "nokaslr" command line parameter
> > for second kernel. He is still facing issues where makedumpfile segment
> > faults. He is looking into it further.
> > 
> > I thought I will atleast bring up with issue of memmap=exactmap and kaslr
> > being incompatible.
> > 
> 
> Yes, because memmap=exactmap gets parsed too late; kaslr assumes that
> the e820 information passed to it is actually correct.
> 
> Yet another cause of breakage caused by the decision on the part of
> kdump to rely on command-line options.

[CC kexec mailing list]

Ok, I think this is high time we change kexec-tools to not use
memmap=exactmap and start passing modified memory map in bootparams. I
think only concern with that change was backward compatibility of
kexec-tools with older kernels.

IIUC, only thing which will be impacted by this change is users of
saved_max_pfn which determine the highest accessible pfn in first
kernel. Some calgary IOMMU code seems to be the only user of it now.

So may be we can create a new command line option say --pass-memmap-cmdline
to kexec-tools which forces old behavior and by default we pass memmap
in bootparams.

Or create --do-not-pass-memmap-cmdline and new users will use it. Default
will be old behavior.

Thansk
Vivek

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-02-07 14:49             ` Vivek Goyal
  2014-02-07 16:04               ` H. Peter Anvin
@ 2014-02-07 19:07               ` H. Peter Anvin
  2014-02-07 19:44                 ` Kees Cook
  1 sibling, 1 reply; 70+ messages in thread
From: H. Peter Anvin @ 2014-02-07 19:07 UTC (permalink / raw)
  To: Vivek Goyal, Kees Cook
  Cc: H. Peter Anvin, Richard Weinberger, Linus Torvalds, Cong Ding,
	Ingo Molnar, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun,
	Dave Young

On 02/07/2014 06:49 AM, Vivek Goyal wrote:
> 
> As a workaround, Dave is currently using "nokaslr" command line parameter
> for second kernel. He is still facing issues where makedumpfile segment
> faults. He is looking into it further.
> 

Now, let's state this: kaslr for kdump is almost certainly useless (the
amount of reserved memory is not enough to provide any meaningful
randomization, so any randomization needs to happen during the memory
reservation phase.)  So disabling kaslr in the kdump kernel is entirely
appropriate.

	-hpa



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-02-07 19:07               ` H. Peter Anvin
@ 2014-02-07 19:44                 ` Kees Cook
  0 siblings, 0 replies; 70+ messages in thread
From: Kees Cook @ 2014-02-07 19:44 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Vivek Goyal, H. Peter Anvin, Richard Weinberger, Linus Torvalds,
	Cong Ding, Ingo Molnar, Ingo Molnar, Linux Kernel Mailing List,
	Mathias Krause, Michael Davidson, Thomas Gleixner, Wei Yongjun,
	Dave Young, Kexec Mailing List

On Fri, Feb 7, 2014 at 11:07 AM, H. Peter Anvin <hpa@linux.intel.com> wrote:
> On 02/07/2014 06:49 AM, Vivek Goyal wrote:
>>
>> As a workaround, Dave is currently using "nokaslr" command line parameter
>> for second kernel. He is still facing issues where makedumpfile segment
>> faults. He is looking into it further.
>>
>
> Now, let's state this: kaslr for kdump is almost certainly useless (the
> amount of reserved memory is not enough to provide any meaningful
> randomization, so any randomization needs to happen during the memory
> reservation phase.)  So disabling kaslr in the kdump kernel is entirely
> appropriate.

Peter covered everything already, but yeah, kaslr and kdump may not
make a lot of sense together, but regardless, yes, it only examines
e820 for memory space. It has to do all this work before the kernel
decompresses, so it's very early.

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-02-07 16:24                 ` Vivek Goyal
@ 2014-02-07 23:16                   ` Dave Young
  2014-02-07 23:20                     ` H. Peter Anvin
  0 siblings, 1 reply; 70+ messages in thread
From: Dave Young @ 2014-02-07 23:16 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: H. Peter Anvin, Kees Cook, Cong Ding, Richard Weinberger,
	Kexec Mailing List, Linux Kernel Mailing List, Mathias Krause,
	Michael Davidson, Wei Yongjun, Thomas Gleixner, H. Peter Anvin,
	Yinghai Lu, Ingo Molnar, Linus Torvalds, Ingo Molnar

On 02/07/14 at 11:24am, Vivek Goyal wrote:
> On Fri, Feb 07, 2014 at 08:04:13AM -0800, H. Peter Anvin wrote:
> > On 02/07/2014 06:49 AM, Vivek Goyal wrote:
> > > 
> > > Hi Kees,
> > > 
> > > Dave Young is testing kdump with kaslr enabled. He is facing some issues.
> > > 
> > > One issue he mentioned is that when second kernel boots, it might be
> > > placed in an area which is outside the reserved area for second kernel.
> > > 
> > > We reserve a certain memory for second kernel. And modify memory map of
> > > second kernel using memmap=exactmap parameter. Looks like kernel placement
> > > is happening before memmap=exactmap takes effect. And that seems to be
> > > the reason that second kernel can be placed outside the reserved memory.
> > > 
> > > IOW, memmap=exactmap and kaslr don't work together. Is it possible to
> > > first let memmap=exactmap take affect and then kaslr does its job. Or it
> > > is too late by the time memmap=exactmap is parsed.
> > > 
> > > As a workaround, Dave is currently using "nokaslr" command line parameter
> > > for second kernel. He is still facing issues where makedumpfile segment
> > > faults. He is looking into it further.
> > > 
> > > I thought I will atleast bring up with issue of memmap=exactmap and kaslr
> > > being incompatible.
> > > 
> > 
> > Yes, because memmap=exactmap gets parsed too late; kaslr assumes that
> > the e820 information passed to it is actually correct.
> > 
> > Yet another cause of breakage caused by the decision on the part of
> > kdump to rely on command-line options.
> 
> [CC kexec mailing list]
> 
> Ok, I think this is high time we change kexec-tools to not use
> memmap=exactmap and start passing modified memory map in bootparams. I
> think only concern with that change was backward compatibility of
> kexec-tools with older kernels.
> 
> IIUC, only thing which will be impacted by this change is users of
> saved_max_pfn which determine the highest accessible pfn in first
> kernel. Some calgary IOMMU code seems to be the only user of it now.
> 
> So may be we can create a new command line option say --pass-memmap-cmdline
> to kexec-tools which forces old behavior and by default we pass memmap
> in bootparams.
> 
> Or create --do-not-pass-memmap-cmdline and new users will use it. Default
> will be old behavior.

Hi Vivek,

Chaowang <chaowang@redhat.com> is looking into passing setup_data SETUP_E820_EXT
instead of using exactmap. Previously Thomas Renninger tried passing them in e820.
I did not find the old thread, but I remember it's not enough because of the
128 entries limitation. 

In case for kaslr, I agree that it do not make sense for kdump kernel. IMHO it's
better to disable it automaticlly for kdump kernel but we can use the nokaslr
as workaround for now. 

Thanks
Dave

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-02-07 23:16                   ` Dave Young
@ 2014-02-07 23:20                     ` H. Peter Anvin
  2014-02-07 23:28                       ` Dave Young
  0 siblings, 1 reply; 70+ messages in thread
From: H. Peter Anvin @ 2014-02-07 23:20 UTC (permalink / raw)
  To: Dave Young, Vivek Goyal
  Cc: H. Peter Anvin, Kees Cook, Cong Ding, Richard Weinberger,
	Kexec Mailing List, Linux Kernel Mailing List, Mathias Krause,
	Michael Davidson, Wei Yongjun, Thomas Gleixner, Yinghai Lu,
	Ingo Molnar, Linus Torvalds, Ingo Molnar

On 02/07/2014 03:16 PM, Dave Young wrote:
> 
> Hi Vivek,
> 
> Chaowang <chaowang@redhat.com> is looking into passing setup_data SETUP_E820_EXT
> instead of using exactmap. Previously Thomas Renninger tried passing them in e820.
> I did not find the old thread, but I remember it's not enough because of the
> 128 entries limitation. 
> 

THERE IS NO SUCH LIMITATION.

Only 128 slots fit in struct boot_params, but additional entries can be
added to the setup_data linked list.

	-hpa


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [GIT PULL] x86/kaslr for v3.14
  2014-02-07 23:20                     ` H. Peter Anvin
@ 2014-02-07 23:28                       ` Dave Young
  0 siblings, 0 replies; 70+ messages in thread
From: Dave Young @ 2014-02-07 23:28 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Vivek Goyal, H. Peter Anvin, Kees Cook, Cong Ding,
	Richard Weinberger, Kexec Mailing List,
	Linux Kernel Mailing List, Mathias Krause, Michael Davidson,
	Wei Yongjun, Thomas Gleixner, Yinghai Lu, Ingo Molnar,
	Linus Torvalds, Ingo Molnar

On 02/07/14 at 03:20pm, H. Peter Anvin wrote:
> On 02/07/2014 03:16 PM, Dave Young wrote:
> > 
> > Hi Vivek,
> > 
> > Chaowang <chaowang@redhat.com> is looking into passing setup_data SETUP_E820_EXT
> > instead of using exactmap. Previously Thomas Renninger tried passing them in e820.
> > I did not find the old thread, but I remember it's not enough because of the
> > 128 entries limitation. 
> > 
> 
> THERE IS NO SUCH LIMITATION.
> 
> Only 128 slots fit in struct boot_params, but additional entries can be
> added to the setup_data linked list.

Ok, I did not describe it right, I actually means the space limit. Thanks for
clarifying. 

^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread, other threads:[~2014-02-07 23:28 UTC | newest]

Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-20 16:47 [GIT PULL] x86/kaslr for v3.14 H. Peter Anvin
2014-01-20 22:54 ` Linus Torvalds
2014-01-20 23:00   ` H. Peter Anvin
2014-01-20 23:12   ` Linus Torvalds
2014-01-20 23:13     ` H. Peter Anvin
2014-01-21  9:00       ` Peter Zijlstra
2014-01-21 14:20         ` H. Peter Anvin
2014-01-21 14:39           ` Ingo Molnar
2014-01-21 14:51             ` H. Peter Anvin
2014-01-21 14:56               ` Ingo Molnar
2014-01-21 18:37           ` Kees Cook
2014-01-21 10:27     ` Ingo Molnar
2014-01-21 13:55       ` H. Peter Anvin
2014-01-21 14:03         ` Ingo Molnar
2014-01-21 14:05           ` H. Peter Anvin
2014-01-21 14:14             ` Ingo Molnar
2014-01-21 14:17               ` H. Peter Anvin
2014-01-21  5:18   ` Kees Cook
2014-01-23  9:39   ` Pavel Machek
2014-01-26 10:16 ` Richard Weinberger
2014-01-27  5:33   ` H. Peter Anvin
2014-01-27  6:49     ` Richard Weinberger
2014-01-27  6:51       ` H. Peter Anvin
2014-01-27  7:38         ` Ingo Molnar
2014-01-27  7:43           ` Ingo Molnar
2014-01-27  7:59           ` Richard Weinberger
2014-01-30 22:07         ` Vivek Goyal
2014-01-31 16:57           ` Kees Cook
2014-02-07 14:49             ` Vivek Goyal
2014-02-07 16:04               ` H. Peter Anvin
2014-02-07 16:24                 ` Vivek Goyal
2014-02-07 23:16                   ` Dave Young
2014-02-07 23:20                     ` H. Peter Anvin
2014-02-07 23:28                       ` Dave Young
2014-02-07 19:07               ` H. Peter Anvin
2014-02-07 19:44                 ` Kees Cook
2014-01-27  6:52       ` H. Peter Anvin
2014-01-27  7:34         ` Richard Weinberger
2014-01-27 17:05       ` Kees Cook
2014-01-27 17:20         ` Richard Weinberger
2014-01-27 17:24           ` Kees Cook
2014-01-28  6:28             ` Ingo Molnar
2014-01-28  8:25               ` Richard Weinberger
2014-01-28 15:55                 ` H. Peter Anvin
2014-01-28 16:25                   ` Richard Weinberger
2014-01-28 16:30                     ` H. Peter Anvin
2014-01-28 16:51                       ` Linus Torvalds
2014-01-28 17:05                         ` Ingo Molnar
2014-01-28 17:12                           ` Linus Torvalds
2014-01-28 17:24                             ` Richard Weinberger
2014-01-28 17:35                               ` Linus Torvalds
2014-01-28 17:52                                 ` Richard Weinberger
2014-01-28 17:56                                   ` Linus Torvalds
2014-01-28 18:54                                     ` Richard Weinberger
2014-01-28 19:48                             ` Ingo Molnar
2014-01-28 20:07                               ` Linus Torvalds
2014-01-28 20:15                                 ` Borislav Petkov
2014-01-28 20:25                                   ` Linus Torvalds
2014-01-28 20:28                                     ` Richard Weinberger
2014-01-28 20:38                                       ` H. Peter Anvin
2014-01-29  8:25                                         ` Ingo Molnar
2014-01-29 10:40                                           ` Borislav Petkov
2014-01-28 20:49                                     ` Borislav Petkov
2014-01-28 23:37                                       ` Borislav Petkov
2014-01-28 21:08                                     ` Dave Jones
2014-01-29  6:36                                       ` Mike Galbraith
2014-01-29  8:11                                 ` Ingo Molnar
2014-01-29  8:27                                   ` Mathias Krause
2014-01-30  9:23                                     ` Ingo Molnar
2014-01-30 18:15                                       ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).