LKML Archive on lore.kernel.org
* [PATCH v3 0/7] x86, boot: clean up kasl
@ 2015-03-07 22:07 Yinghai Lu
  2015-03-07 22:07 ` [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size Yinghai Lu
  0 siblings, 8 replies; 54+ messages in thread
From: Yinghai Lu @ 2015-03-07 22:07 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, Jiri Kosina, linux-kernel, linux-efi, Yinghai Lu

First 3 patches make sure the ZO (arch/x86/boot/compressed/vmlinux) data region
is not overwritten by VO (vmlinux) after decompression, so data can be passed
from ZO to VO.

The 4th one fixes the kaslr_enabled access. The old code wrongly uses an
address as the value.

The last 3 patches are the base for kaslr support of kernels above 4G:
create a new ident mapping for kaslr on 64bit, so we can cover
   a random kernel base above 4G, and also don't need to track pagetables
   for 64bit bootloaders (patched grub2 or kexec).
   That makes the mem_avoid handling simple.

Please put the first 4 patches into tip/x86/urgent for v4.0.

The last 3 patches should go to tip/x86/kaslr and then v4.1, but you may need
to pull x86/urgent into x86/kaslr, as they depend on the first 4 patches.

Baoquan He could rebase his kaslr patches on top of these.

git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-4.0-rc2-aslr

Thanks

Yinghai Lu

Yinghai Lu (7):
  x86, kaslr: Use init_size instead of run_size
  x86, boot: Move ZO to end of buffer
  x86, boot: Don't overlap VO with ZO data
  x86, kaslr: Access the correct kaslr_enabled variable
  x86, kaslr: Consolidate mem_avoid array filling
  x86, boot: Split kernel_ident_mapping_init to another file
  x86, kaslr, 64bit: Set new or extra ident_mapping

 arch/x86/boot/Makefile                 |  2 +-
 arch/x86/boot/compressed/Makefile      |  4 +-
 arch/x86/boot/compressed/aslr.c        | 48 ++++++++++++-----
 arch/x86/boot/compressed/head_32.S     | 16 ++++--
 arch/x86/boot/compressed/head_64.S     | 17 +++---
 arch/x86/boot/compressed/misc.c        | 15 +++---
 arch/x86/boot/compressed/misc.h        |  4 +-
 arch/x86/boot/compressed/misc_pgt.c    | 98 ++++++++++++++++++++++++++++++++++
 arch/x86/boot/compressed/mkpiggy.c     | 16 ++----
 arch/x86/boot/compressed/vmlinux.lds.S |  2 +
 arch/x86/boot/header.S                 |  9 ++--
 arch/x86/include/asm/boot.h            | 19 +++++++
 arch/x86/include/asm/page.h            |  5 ++
 arch/x86/kernel/asm-offsets.c          |  1 +
 arch/x86/kernel/setup.c                | 13 ++++-
 arch/x86/kernel/vmlinux.lds.S          |  1 +
 arch/x86/mm/ident_map.c                | 74 +++++++++++++++++++++++++
 arch/x86/mm/init_64.c                  | 74 +------------------------
 arch/x86/tools/calc_run_size.sh        | 42 ---------------
 19 files changed, 288 insertions(+), 172 deletions(-)
 create mode 100644 arch/x86/boot/compressed/misc_pgt.c
 create mode 100644 arch/x86/mm/ident_map.c
 delete mode 100644 arch/x86/tools/calc_run_size.sh

-- 
1.8.4.5


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size
  2015-03-07 22:07 [PATCH v3 0/7] x86, boot: clean up kasl Yinghai Lu
@ 2015-03-07 22:07 ` Yinghai Lu
  2015-03-09 12:49   ` Borislav Petkov
  2015-03-07 22:07 ` [PATCH v3 2/7] x86, boot: Move ZO to end of buffer Yinghai Lu
From: Yinghai Lu @ 2015-03-07 22:07 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, Jiri Kosina, linux-kernel, linux-efi,
	Yinghai Lu, Josh Triplett, Andrew Morton, Ard Biesheuvel,
	Junjie Mao

Commit e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
introduced run_size for kaslr.
We should use the real runtime size (including the copy/decompress stage),
aka init_size, instead.

run_size is the VO (vmlinux) init size, including .bss and .brk.
init_size is the size needed for decompression; it is bigger than run_size
when the decompressor needs a bigger buffer.

According to arch/x86/boot/header.S:
| #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
| #define VO_INIT_SIZE    (VO__end - VO__text)
| #if ZO_INIT_SIZE > VO_INIT_SIZE
| #define INIT_SIZE ZO_INIT_SIZE
| #else
| #define INIT_SIZE VO_INIT_SIZE
| #endif
| init_size:              .long INIT_SIZE         # kernel initialization size

The bootloader allocates a buffer according to init_size in the hdr, and loads
ZO (arch/x86/boot/compressed/vmlinux) at the start of that buffer.
init_size originally comes from the VO (vmlinux) init size. That VO init size
spans from VO _text to VO _end and includes the VO .bss and .brk areas.

While running, ZO moves itself to the middle of the buffer, at
z_extract_offset, to make sure the decompressor does not overwrite input data
with output before the input data is consumed.
But the z_extract_offset calculation is based only on the size of VO (vmlinux)
and the size of the compressed VO at first.
So we need to make sure [z_extract_offset, init_size) fits ZO, which means
init_size needs to be adjusted according to the ZO size.
That makes init_size always >= run_size.

During the aslr buffer search, we need to make sure the new buffer is big
enough for decompression in the first place. So use init_size instead, and
kill the no-longer-needed run_size related code.

Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/Makefile  |  4 +---
 arch/x86/boot/compressed/head_32.S |  5 ++---
 arch/x86/boot/compressed/head_64.S |  5 +----
 arch/x86/boot/compressed/misc.c    | 15 +++++++-------
 arch/x86/boot/compressed/mkpiggy.c |  9 ++------
 arch/x86/tools/calc_run_size.sh    | 42 --------------------------------------
 6 files changed, 13 insertions(+), 67 deletions(-)
 delete mode 100644 arch/x86/tools/calc_run_size.sh

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 0a291cd..70cc92c 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -92,10 +92,8 @@ suffix-$(CONFIG_KERNEL_XZ)	:= xz
 suffix-$(CONFIG_KERNEL_LZO) 	:= lzo
 suffix-$(CONFIG_KERNEL_LZ4) 	:= lz4
 
-RUN_SIZE = $(shell $(OBJDUMP) -h vmlinux | \
-	     $(CONFIG_SHELL) $(srctree)/arch/x86/tools/calc_run_size.sh)
 quiet_cmd_mkpiggy = MKPIGGY $@
-      cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )
+      cmd_mkpiggy = $(obj)/mkpiggy $< > $@ || ( rm -f $@ ; false )
 
 targets += piggy.S
 $(obj)/piggy.S: $(obj)/vmlinux.bin.$(suffix-y) $(obj)/mkpiggy FORCE
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 1d7fbbc..cbed140 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -207,8 +207,7 @@ relocated:
  * Do the decompression, and jump to the new kernel..
  */
 				/* push arguments for decompress_kernel: */
-	pushl	$z_run_size	/* size of kernel with .bss and .brk */
-	pushl	$z_output_len	/* decompressed length, end of relocs */
+	pushl	$z_output_len	/* decompressed length */
 	leal	z_extract_offset_negative(%ebx), %ebp
 	pushl	%ebp		/* output address */
 	pushl	$z_input_len	/* input_len */
@@ -218,7 +217,7 @@ relocated:
 	pushl	%eax		/* heap area */
 	pushl	%esi		/* real mode pointer */
 	call	decompress_kernel /* returns kernel location in %eax */
-	addl	$28, %esp
+	addl	$24, %esp
 
 /*
  * Jump to the decompressed kernel.
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 6b1766c..2884e0c 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -402,16 +402,13 @@ relocated:
  * Do the decompression, and jump to the new kernel..
  */
 	pushq	%rsi			/* Save the real mode argument */
-	movq	$z_run_size, %r9	/* size of kernel with .bss and .brk */
-	pushq	%r9
 	movq	%rsi, %rdi		/* real mode address */
 	leaq	boot_heap(%rip), %rsi	/* malloc area for uncompression */
 	leaq	input_data(%rip), %rdx  /* input_data */
 	movl	$z_input_len, %ecx	/* input_len */
 	movq	%rbp, %r8		/* output target address */
-	movq	$z_output_len, %r9	/* decompressed length, end of relocs */
+	movq	$z_output_len, %r9	/* decompressed length */
 	call	decompress_kernel	/* returns kernel location in %rax */
-	popq	%r9
 	popq	%rsi
 
 /*
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 5903089..51e9e54 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -370,10 +370,10 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 				  unsigned char *input_data,
 				  unsigned long input_len,
 				  unsigned char *output,
-				  unsigned long output_len,
-				  unsigned long run_size)
+				  unsigned long output_len)
 {
 	unsigned char *output_orig = output;
+	unsigned long init_size;
 
 	real_mode = rmode;
 
@@ -396,15 +396,14 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	free_mem_ptr     = heap;	/* Heap */
 	free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
 
+	init_size = real_mode->hdr.init_size;
+
 	/*
-	 * The memory hole needed for the kernel is the larger of either
-	 * the entire decompressed kernel plus relocation table, or the
-	 * entire decompressed kernel plus .bss and .brk sections.
+	 * The memory hole needed for the kernel is init_size for running
+	 * and init_size is bigger than output_len always.
 	 */
 	output = choose_kernel_location(real_mode, input_data, input_len,
-					output,
-					output_len > run_size ? output_len
-							      : run_size);
+					output, init_size);
 
 	/* Validate memory location choices. */
 	if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index d8222f2..b669ab6 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -36,13 +36,11 @@ int main(int argc, char *argv[])
 	uint32_t olen;
 	long ilen;
 	unsigned long offs;
-	unsigned long run_size;
 	FILE *f = NULL;
 	int retval = 1;
 
-	if (argc < 3) {
-		fprintf(stderr, "Usage: %s compressed_file run_size\n",
-				argv[0]);
+	if (argc < 2) {
+		fprintf(stderr, "Usage: %s compressed_file\n", argv[0]);
 		goto bail;
 	}
 
@@ -76,7 +74,6 @@ int main(int argc, char *argv[])
 	offs += olen >> 12;	/* Add 8 bytes for each 32K block */
 	offs += 64*1024 + 128;	/* Add 64K + 128 bytes slack */
 	offs = (offs+4095) & ~4095; /* Round to a 4K boundary */
-	run_size = atoi(argv[2]);
 
 	printf(".section \".rodata..compressed\",\"a\",@progbits\n");
 	printf(".globl z_input_len\n");
@@ -88,8 +85,6 @@ int main(int argc, char *argv[])
 	/* z_extract_offset_negative allows simplification of head_32.S */
 	printf(".globl z_extract_offset_negative\n");
 	printf("z_extract_offset_negative = -0x%lx\n", offs);
-	printf(".globl z_run_size\n");
-	printf("z_run_size = %lu\n", run_size);
 
 	printf(".globl input_data, input_data_end\n");
 	printf("input_data:\n");
diff --git a/arch/x86/tools/calc_run_size.sh b/arch/x86/tools/calc_run_size.sh
deleted file mode 100644
index 1a4c17b..0000000
--- a/arch/x86/tools/calc_run_size.sh
+++ /dev/null
@@ -1,42 +0,0 @@
-#!/bin/sh
-#
-# Calculate the amount of space needed to run the kernel, including room for
-# the .bss and .brk sections.
-#
-# Usage:
-# objdump -h a.out | sh calc_run_size.sh
-
-NUM='\([0-9a-fA-F]*[ \t]*\)'
-OUT=$(sed -n 's/^[ \t0-9]*.b[sr][sk][ \t]*'"$NUM$NUM$NUM$NUM"'.*/\1\4/p')
-if [ -z "$OUT" ] ; then
-	echo "Never found .bss or .brk file offset" >&2
-	exit 1
-fi
-
-OUT=$(echo ${OUT# })
-sizeA=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-offsetA=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-sizeB=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-offsetB=$(printf "%d" 0x${OUT%% *})
-
-run_size=$(( $offsetA + $sizeA + $sizeB ))
-
-# BFD linker shows the same file offset in ELF.
-if [ "$offsetA" -ne "$offsetB" ] ; then
-	# Gold linker shows them as consecutive.
-	endB=$(( $offsetB + $sizeB ))
-	if [ "$endB" != "$run_size" ] ; then
-		printf "sizeA: 0x%x\n" $sizeA >&2
-		printf "offsetA: 0x%x\n" $offsetA >&2
-		printf "sizeB: 0x%x\n" $sizeB >&2
-		printf "offsetB: 0x%x\n" $offsetB >&2
-		echo ".bss and .brk are non-contiguous" >&2
-		exit 1
-	fi
-fi
-
-printf "%d\n" $run_size
-exit 0
-- 
1.8.4.5



* [PATCH v3 2/7] x86, boot: Move ZO to end of buffer
  2015-03-07 22:07 [PATCH v3 0/7] x86, boot: clean up kasl Yinghai Lu
  2015-03-07 22:07 ` [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size Yinghai Lu
@ 2015-03-07 22:07 ` Yinghai Lu
  2015-03-10  0:54   ` Kees Cook
  2015-03-10  8:00   ` Borislav Petkov
  2015-03-07 22:07 ` [PATCH v3 3/7] x86, boot: Don't overlap VO with ZO data Yinghai Lu
From: Yinghai Lu @ 2015-03-07 22:07 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, Jiri Kosina, linux-kernel, linux-efi, Yinghai Lu

Boris found that data from the boot stage cannot be used in the kernel stage.

The bootloader allocates a buffer according to init_size in the hdr, and loads
ZO (arch/x86/boot/compressed/vmlinux) at the start of that buffer.
While running, ZO moves itself to the middle of the buffer, at
z_extract_offset, to make sure the decompressor does not overwrite input data
with output before the input data is consumed.
After the decompressor is called, VO (vmlinux) uses the whole buffer from the
start, and the ZO code and data sections overlap with the VO .bss section.
Later, VO/clear_bss() clears them before code in arch/x86/kernel/setup.c
accesses them.

To make the data survive that long, we should avoid the overlap.
First, move ZO close to the end of the buffer instead of the middle, which
moves the ZO data out of the VO .bss area.

After that we can also find out where the data section of the copied ZO is,
instead of guessing. That makes the aslr mem_avoid array filling for the
new buffer search much simpler.

Also rename z_extract_offset to z_min_extract_offset, as it is now
actually the minimum offset for extraction.

To keep the final real extract offset page aligned like
min_extract_offset, we need to make both VO _end and ZO _end
page aligned, so that init_size is always page aligned.

The next patch will add the ZO data size to init_size, which makes sure the
ZO data is out of the VO .brk area as well.

Fixes: f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/head_32.S     | 11 +++++++++--
 arch/x86/boot/compressed/head_64.S     |  8 ++++++--
 arch/x86/boot/compressed/mkpiggy.c     |  7 ++-----
 arch/x86/boot/compressed/vmlinux.lds.S |  1 +
 arch/x86/boot/header.S                 |  2 +-
 arch/x86/kernel/asm-offsets.c          |  1 +
 arch/x86/kernel/vmlinux.lds.S          |  1 +
 7 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index cbed140..a9b56f1 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -147,7 +147,9 @@ preferred_addr:
 1:
 
 	/* Target address to relocate to for decompression */
-	addl	$z_extract_offset, %ebx
+	movl    BP_init_size(%esi), %eax
+	subl    $_end, %eax
+	addl    %eax, %ebx
 
 	/* Set up the stack */
 	leal	boot_stack_end(%ebx), %esp
@@ -208,8 +210,13 @@ relocated:
  */
 				/* push arguments for decompress_kernel: */
 	pushl	$z_output_len	/* decompressed length */
-	leal	z_extract_offset_negative(%ebx), %ebp
+
+	movl    BP_init_size(%esi), %eax
+	subl    $_end, %eax
+	movl    %ebx, %ebp
+	subl    %eax, %ebp
 	pushl	%ebp		/* output address */
+
 	pushl	$z_input_len	/* input_len */
 	leal	input_data(%ebx), %eax
 	pushl	%eax		/* input_data */
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 2884e0c..69015b5 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -101,7 +101,9 @@ ENTRY(startup_32)
 1:
 
 	/* Target address to relocate to for decompression */
-	addl	$z_extract_offset, %ebx
+	movl	BP_init_size(%esi), %eax
+	subl	$_end, %eax
+	addl	%eax, %ebx
 
 /*
  * Prepare for entering 64 bit mode
@@ -329,7 +331,9 @@ preferred_addr:
 1:
 
 	/* Target address to relocate to for decompression */
-	leaq	z_extract_offset(%rbp), %rbx
+	movl	BP_init_size(%rsi), %ebx
+	subl	$_end, %ebx
+	addq	%rbp, %rbx
 
 	/* Set up the stack */
 	leaq	boot_stack_end(%rbx), %rsp
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index b669ab6..c03b009 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -80,11 +80,8 @@ int main(int argc, char *argv[])
 	printf("z_input_len = %lu\n", ilen);
 	printf(".globl z_output_len\n");
 	printf("z_output_len = %lu\n", (unsigned long)olen);
-	printf(".globl z_extract_offset\n");
-	printf("z_extract_offset = 0x%lx\n", offs);
-	/* z_extract_offset_negative allows simplification of head_32.S */
-	printf(".globl z_extract_offset_negative\n");
-	printf("z_extract_offset_negative = -0x%lx\n", offs);
+	printf(".globl z_min_extract_offset\n");
+	printf("z_min_extract_offset = 0x%lx\n", offs);
 
 	printf(".globl input_data, input_data_end\n");
 	printf("input_data:\n");
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 34d047c..e24e0a0 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -70,5 +70,6 @@ SECTIONS
 		_epgtable = . ;
 	}
 #endif
+	. = ALIGN(PAGE_SIZE);	/* keep ZO size page aligned */
 	_end = .;
 }
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 16ef025..9bfab22 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -440,7 +440,7 @@ setup_data:		.quad 0			# 64-bit physical pointer to
 
 pref_address:		.quad LOAD_PHYSICAL_ADDR	# preferred load addr
 
-#define ZO_INIT_SIZE	(ZO__end - ZO_startup_32 + ZO_z_extract_offset)
+#define ZO_INIT_SIZE	(ZO__end - ZO_startup_32 + ZO_z_min_extract_offset)
 #define VO_INIT_SIZE	(VO__end - VO__text)
 #if ZO_INIT_SIZE > VO_INIT_SIZE
 #define INIT_SIZE ZO_INIT_SIZE
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 9f6b934..0e8e4f7 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -66,6 +66,7 @@ void common(void) {
 	OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch);
 	OFFSET(BP_version, boot_params, hdr.version);
 	OFFSET(BP_kernel_alignment, boot_params, hdr.kernel_alignment);
+	OFFSET(BP_init_size, boot_params, hdr.init_size);
 	OFFSET(BP_pref_address, boot_params, hdr.pref_address);
 	OFFSET(BP_code32_start, boot_params, hdr.code32_start);
 
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 00bf300..ac25c7f 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -325,6 +325,7 @@ SECTIONS
 		__brk_limit = .;
 	}
 
+	. = ALIGN(PAGE_SIZE);		/* keep VO init size page aligned */
 	_end = .;
 
         STABS_DEBUG
-- 
1.8.4.5



* [PATCH v3 3/7] x86, boot: Don't overlap VO with ZO data
  2015-03-07 22:07 [PATCH v3 0/7] x86, boot: clean up kasl Yinghai Lu
  2015-03-07 22:07 ` [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size Yinghai Lu
  2015-03-07 22:07 ` [PATCH v3 2/7] x86, boot: Move ZO to end of buffer Yinghai Lu
@ 2015-03-07 22:07 ` Yinghai Lu
  2015-03-10  9:34   ` Borislav Petkov
  2015-03-07 22:07 ` [PATCH v3 4/7] x86, kaslr: Access the correct kaslr_enabled variable Yinghai Lu
From: Yinghai Lu @ 2015-03-07 22:07 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, Jiri Kosina, linux-kernel, linux-efi, Yinghai Lu

Boris found that data from the boot stage cannot be used in the kernel stage.

The bootloader allocates a buffer according to init_size in the hdr, and loads
ZO (arch/x86/boot/compressed/vmlinux) at the start of that buffer.
While running, ZO moves itself to the middle of the buffer, at
z_extract_offset, to make sure the decompressor does not overwrite input data
with output before the input data is consumed.
After the decompressor is called, VO (vmlinux) uses the whole buffer from the
start, and the ZO code and data sections overlap with the VO .bss section.
Later, VO/clear_bss() clears them before code in arch/x86/kernel/setup.c
accesses them.

We need to avoid the overlap to keep the ZO data.

The previous patch already moved ZO close to the end of the buffer instead of
the middle. But the VO .brk still overlaps with the ZO data area.

Extend init_size so the VO .bss and .brk will not overlap with the ZO data
area (mostly from boot/compressed/misc.c).

The increase is the size from _rodata to _end in ZO
(arch/x86/boot/compressed/vmlinux).

-v2: add init_size in arch/x86/boot/header.S instead of BRK.
-v3: split code that move ZO to end of buffer to another patch.

Fixes: f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/Makefile                 | 2 +-
 arch/x86/boot/compressed/vmlinux.lds.S | 1 +
 arch/x86/boot/header.S                 | 7 +++++--
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 57bbf2f..863ef25 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -86,7 +86,7 @@ targets += voffset.h
 $(obj)/voffset.h: vmlinux FORCE
 	$(call if_changed,voffset)
 
-sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|z_.*\)$$/\#define ZO_\2 0x\1/p'
+sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_rodata\|z_.*\)$$/\#define ZO_\2 0x\1/p'
 
 quiet_cmd_zoffset = ZOFFSET $@
       cmd_zoffset = $(NM) $< | sed -n $(sed-zoffset) > $@
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index e24e0a0..6d6158e 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -35,6 +35,7 @@ SECTIONS
 		*(.text.*)
 		_etext = . ;
 	}
+        . = ALIGN(PAGE_SIZE); /* keep ADDON_ZO_SIZE page aligned */
 	.rodata : {
 		_rodata = . ;
 		*(.rodata)	 /* read-only data */
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 9bfab22..bc5d78d 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -440,12 +440,15 @@ setup_data:		.quad 0			# 64-bit physical pointer to
 
 pref_address:		.quad LOAD_PHYSICAL_ADDR	# preferred load addr
 
+# don't overlap data area of ZO with VO
+#define ADDON_ZO_SIZE (ZO__end - ZO__rodata)
+
 #define ZO_INIT_SIZE	(ZO__end - ZO_startup_32 + ZO_z_min_extract_offset)
 #define VO_INIT_SIZE	(VO__end - VO__text)
 #if ZO_INIT_SIZE > VO_INIT_SIZE
-#define INIT_SIZE ZO_INIT_SIZE
+#define INIT_SIZE (ZO_INIT_SIZE + ADDON_ZO_SIZE)
 #else
-#define INIT_SIZE VO_INIT_SIZE
+#define INIT_SIZE (VO_INIT_SIZE + ADDON_ZO_SIZE)
 #endif
 init_size:		.long INIT_SIZE		# kernel initialization size
 handover_offset:	.long 0			# Filled in by build.c
-- 
1.8.4.5



* [PATCH v3 4/7] x86, kaslr: Access the correct kaslr_enabled variable
  2015-03-07 22:07 [PATCH v3 0/7] x86, boot: clean up kasl Yinghai Lu
  2015-03-07 22:07 ` [PATCH v3 3/7] x86, boot: Don't overlap VO with ZO data Yinghai Lu
@ 2015-03-07 22:07 ` Yinghai Lu
  2015-03-10  0:55   ` Kees Cook
  2015-03-07 22:07 ` [PATCH v3 5/7] x86, kaslr: Consolidate mem_avoid array filling Yinghai Lu
From: Yinghai Lu @ 2015-03-07 22:07 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, Jiri Kosina, linux-kernel, linux-efi, Yinghai Lu

Commit

  f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")

started passing the KASLR status to the kernel proper, but it uses a physical
address as the value, leading to parsing a bogus KASLR status in the kernel
proper.

The setup_data linked list and thus the element which contains
kaslr_enabled is chained together using physical addresses. At the time
when we access it in the kernel proper, we're already running with
paging enabled and therefore must access it through its virtual address.

This patch changes the code to use early_memremap() and access the value.

-v3: add check of the return value from early_memremap(), per Boris.
-v4: add Boris' description of the setup_data access to the changelog.

Fixes: f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/setup.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 98dc931..912f124 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -429,7 +429,18 @@ static void __init reserve_initrd(void)
 
 static void __init parse_kaslr_setup(u64 pa_data, u32 data_len)
 {
-	kaslr_enabled = (bool)(pa_data + sizeof(struct setup_data));
+	/* kaslr_setup_data is defined in aslr.c */
+	unsigned char *data;
+	unsigned long offset = sizeof(struct setup_data);
+
+	data = early_memremap(pa_data, offset + 1);
+	if (!data) {
+		kaslr_enabled = true;
+		return;
+	}
+
+	kaslr_enabled = *(data + offset);
+	early_memunmap(data, offset + 1);
 }
 
 static void __init parse_setup_data(void)
-- 
1.8.4.5



* [PATCH v3 5/7] x86, kaslr: Consolidate mem_avoid array filling
  2015-03-07 22:07 [PATCH v3 0/7] x86, boot: clean up kasl Yinghai Lu
  2015-03-07 22:07 ` [PATCH v3 4/7] x86, kaslr: Access the correct kaslr_enabled variable Yinghai Lu
@ 2015-03-07 22:07 ` Yinghai Lu
  2015-03-10  1:00   ` Kees Cook
  2015-03-07 22:07 ` [PATCH v3 6/7] x86, boot: Split kernel_ident_mapping_init to another file Yinghai Lu
From: Yinghai Lu @ 2015-03-07 22:07 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, Jiri Kosina, linux-kernel, linux-efi, Yinghai Lu

Now that ZO sits at the end of the buffer, we can find out where the ZO
text, data, .bss etc. are.

[input, input+input_size) is the copied compressed kernel, not the whole ZO.
[output, output+init_size) is the buffer for VO.

[input+input_size, output+init_size) is [_text, _end) of ZO.
That can be the first range in mem_avoid. We don't need to guess it anymore.

That area already includes the heap and stack for running ZO, so we don't
need to put them into the mem_avoid array separately.

We do need to put boot_params into mem_avoid too, as a 64bit bootloader
could put it anywhere.

Also rename output_size to init_size, as we pass init_size instead.

Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/aslr.c | 29 ++++++++++++++---------------
 arch/x86/boot/compressed/misc.h |  4 ++--
 2 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 7083c16..a279514 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -116,7 +116,7 @@ struct mem_vector {
 	unsigned long size;
 };
 
-#define MEM_AVOID_MAX 5
+#define MEM_AVOID_MAX 4
 static struct mem_vector mem_avoid[MEM_AVOID_MAX];
 
 static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
@@ -142,7 +142,7 @@ static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
 }
 
 static void mem_avoid_init(unsigned long input, unsigned long input_size,
-			   unsigned long output, unsigned long output_size)
+			   unsigned long output, unsigned long init_size)
 {
 	u64 initrd_start, initrd_size;
 	u64 cmd_line, cmd_line_size;
@@ -151,10 +151,13 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 
 	/*
 	 * Avoid the region that is unsafe to overlap during
-	 * decompression (see calculations at top of misc.c).
+	 * decompression.
+	 * As we already move ZO (arch/x86/boot/compressed/vmlinux)
+	 * to the end of buffer, [input+input_size, output+init_size)
+	 * has [_text, _end) for ZO.
 	 */
-	unsafe_len = (output_size >> 12) + 32768 + 18;
-	unsafe = (unsigned long)input + input_size - unsafe_len;
+	unsafe_len = output + init_size - (input + input_size);
+	unsafe = (unsigned long)input + input_size;
 	mem_avoid[0].start = unsafe;
 	mem_avoid[0].size = unsafe_len;
 
@@ -176,13 +179,9 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	mem_avoid[2].start = cmd_line;
 	mem_avoid[2].size = cmd_line_size;
 
-	/* Avoid heap memory. */
-	mem_avoid[3].start = (unsigned long)free_mem_ptr;
-	mem_avoid[3].size = BOOT_HEAP_SIZE;
-
-	/* Avoid stack memory. */
-	mem_avoid[4].start = (unsigned long)free_mem_end_ptr;
-	mem_avoid[4].size = BOOT_STACK_SIZE;
+	/* Avoid params */
+	mem_avoid[3].start = (unsigned long)real_mode;
+	mem_avoid[3].size = sizeof(*real_mode);
 }
 
 /* Does this memory vector overlap a known avoided area? */
@@ -327,7 +326,7 @@ unsigned char *choose_kernel_location(struct boot_params *params,
 				      unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
-				      unsigned long output_size)
+				      unsigned long init_size)
 {
 	unsigned long choice = (unsigned long)output;
 	unsigned long random;
@@ -349,10 +348,10 @@ unsigned char *choose_kernel_location(struct boot_params *params,
 
 	/* Record the various known unsafe memory ranges. */
 	mem_avoid_init((unsigned long)input, input_size,
-		       (unsigned long)output, output_size);
+		       (unsigned long)output, init_size);
 
 	/* Walk e820 and find a random address. */
-	random = find_random_addr(choice, output_size);
+	random = find_random_addr(choice, init_size);
 	if (!random) {
 		debug_putstr("KASLR could not find suitable E820 region...\n");
 		goto out;
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index ee3576b..23156e7 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -61,7 +61,7 @@ unsigned char *choose_kernel_location(struct boot_params *params,
 				      unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
-				      unsigned long output_size);
+				      unsigned long init_size);
 /* cpuflags.c */
 bool has_cpuflag(int flag);
 #else
@@ -70,7 +70,7 @@ unsigned char *choose_kernel_location(struct boot_params *params,
 				      unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
-				      unsigned long output_size)
+				      unsigned long init_size)
 {
 	return output;
 }
-- 
1.8.4.5


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 6/7] x86, boot: Split kernel_ident_mapping_init to another file
  2015-03-07 22:07 [PATCH v3 0/7] x86, boot: clean up kasl Yinghai Lu
                   ` (4 preceding siblings ...)
  2015-03-07 22:07 ` [PATCH v3 5/7] x86, kaslr: Consolidate mem_avoid array filling Yinghai Lu
@ 2015-03-07 22:07 ` Yinghai Lu
  2015-03-10  1:03   ` Kees Cook
  2015-03-07 22:07 ` [PATCH v3 7/7] x86, kaslr, 64bit: Set new or extra ident_mapping Yinghai Lu
  2015-03-10  0:39 ` [PATCH v3 0/7] x86, boot: clean up kasl Kees Cook
  7 siblings, 1 reply; 54+ messages in thread
From: Yinghai Lu @ 2015-03-07 22:07 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, Jiri Kosina, linux-kernel, linux-efi, Yinghai Lu

We need to include kernel_ident_mapping_init() in the boot::decompress_kernel
stage to set up a new ident mapping, so split it out into its own file.

Also add #ifndef guards around the __pa/__va macro definitions, as we need to
override them in the boot::decompress_kernel stage.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/page.h |  5 +++
 arch/x86/mm/ident_map.c     | 74 +++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/init_64.c       | 74 +--------------------------------------------
 3 files changed, 80 insertions(+), 73 deletions(-)
 create mode 100644 arch/x86/mm/ident_map.c

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 802dde3..cf8f619 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -37,7 +37,10 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
+#ifndef __pa
 #define __pa(x)		__phys_addr((unsigned long)(x))
+#endif
+
 #define __pa_nodebug(x)	__phys_addr_nodebug((unsigned long)(x))
 /* __pa_symbol should be used for C visible symbols.
    This seems to be the official gcc blessed way to do such arithmetic. */
@@ -51,7 +54,9 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 #define __pa_symbol(x) \
 	__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
 
+#ifndef __va
 #define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
+#endif
 
 #define __boot_va(x)		__va(x)
 #define __boot_pa(x)		__pa(x)
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
new file mode 100644
index 0000000..751ca92
--- /dev/null
+++ b/arch/x86/mm/ident_map.c
@@ -0,0 +1,74 @@
+
+static void ident_pmd_init(unsigned long pmd_flag, pmd_t *pmd_page,
+			   unsigned long addr, unsigned long end)
+{
+	addr &= PMD_MASK;
+	for (; addr < end; addr += PMD_SIZE) {
+		pmd_t *pmd = pmd_page + pmd_index(addr);
+
+		if (!pmd_present(*pmd))
+			set_pmd(pmd, __pmd(addr | pmd_flag));
+	}
+}
+static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
+			  unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+
+	for (; addr < end; addr = next) {
+		pud_t *pud = pud_page + pud_index(addr);
+		pmd_t *pmd;
+
+		next = (addr & PUD_MASK) + PUD_SIZE;
+		if (next > end)
+			next = end;
+
+		if (pud_present(*pud)) {
+			pmd = pmd_offset(pud, 0);
+			ident_pmd_init(info->pmd_flag, pmd, addr, next);
+			continue;
+		}
+		pmd = (pmd_t *)info->alloc_pgt_page(info->context);
+		if (!pmd)
+			return -ENOMEM;
+		ident_pmd_init(info->pmd_flag, pmd, addr, next);
+		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+	}
+
+	return 0;
+}
+
+int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
+			      unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	int result;
+	int off = info->kernel_mapping ? pgd_index(__PAGE_OFFSET) : 0;
+
+	for (; addr < end; addr = next) {
+		pgd_t *pgd = pgd_page + pgd_index(addr) + off;
+		pud_t *pud;
+
+		next = (addr & PGDIR_MASK) + PGDIR_SIZE;
+		if (next > end)
+			next = end;
+
+		if (pgd_present(*pgd)) {
+			pud = pud_offset(pgd, 0);
+			result = ident_pud_init(info, pud, addr, next);
+			if (result)
+				return result;
+			continue;
+		}
+
+		pud = (pud_t *)info->alloc_pgt_page(info->context);
+		if (!pud)
+			return -ENOMEM;
+		result = ident_pud_init(info, pud, addr, next);
+		if (result)
+			return result;
+		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+	}
+
+	return 0;
+}
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 30eb05a..c30efb6 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -56,79 +56,7 @@
 
 #include "mm_internal.h"
 
-static void ident_pmd_init(unsigned long pmd_flag, pmd_t *pmd_page,
-			   unsigned long addr, unsigned long end)
-{
-	addr &= PMD_MASK;
-	for (; addr < end; addr += PMD_SIZE) {
-		pmd_t *pmd = pmd_page + pmd_index(addr);
-
-		if (!pmd_present(*pmd))
-			set_pmd(pmd, __pmd(addr | pmd_flag));
-	}
-}
-static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
-			  unsigned long addr, unsigned long end)
-{
-	unsigned long next;
-
-	for (; addr < end; addr = next) {
-		pud_t *pud = pud_page + pud_index(addr);
-		pmd_t *pmd;
-
-		next = (addr & PUD_MASK) + PUD_SIZE;
-		if (next > end)
-			next = end;
-
-		if (pud_present(*pud)) {
-			pmd = pmd_offset(pud, 0);
-			ident_pmd_init(info->pmd_flag, pmd, addr, next);
-			continue;
-		}
-		pmd = (pmd_t *)info->alloc_pgt_page(info->context);
-		if (!pmd)
-			return -ENOMEM;
-		ident_pmd_init(info->pmd_flag, pmd, addr, next);
-		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
-	}
-
-	return 0;
-}
-
-int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
-			      unsigned long addr, unsigned long end)
-{
-	unsigned long next;
-	int result;
-	int off = info->kernel_mapping ? pgd_index(__PAGE_OFFSET) : 0;
-
-	for (; addr < end; addr = next) {
-		pgd_t *pgd = pgd_page + pgd_index(addr) + off;
-		pud_t *pud;
-
-		next = (addr & PGDIR_MASK) + PGDIR_SIZE;
-		if (next > end)
-			next = end;
-
-		if (pgd_present(*pgd)) {
-			pud = pud_offset(pgd, 0);
-			result = ident_pud_init(info, pud, addr, next);
-			if (result)
-				return result;
-			continue;
-		}
-
-		pud = (pud_t *)info->alloc_pgt_page(info->context);
-		if (!pud)
-			return -ENOMEM;
-		result = ident_pud_init(info, pud, addr, next);
-		if (result)
-			return result;
-		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
-	}
-
-	return 0;
-}
+#include "ident_map.c"
 
 static int __init parse_direct_gbpages_off(char *arg)
 {
-- 
1.8.4.5



* [PATCH v3 7/7] x86, kaslr, 64bit: Set new or extra ident_mapping
  2015-03-07 22:07 [PATCH v3 0/7] x86, boot: clean up kasl Yinghai Lu
                   ` (5 preceding siblings ...)
  2015-03-07 22:07 ` [PATCH v3 6/7] x86, boot: Split kernel_ident_mapping_init to another file Yinghai Lu
@ 2015-03-07 22:07 ` Yinghai Lu
  2015-03-10  1:09   ` Kees Cook
  2015-03-10  0:39 ` [PATCH v3 0/7] x86, boot: clean up kasl Kees Cook
  7 siblings, 1 reply; 54+ messages in thread
From: Yinghai Lu @ 2015-03-07 22:07 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, Jiri Kosina, linux-kernel, linux-efi, Yinghai Lu

First, KASLR will support placing the randomized VO (vmlinux) above 4G,
so we must set up an ident mapping for that range even when we enter via
the startup_32 path.

Second, when booting from a 64-bit bootloader, the bootloader sets up
the ident mapping and enters ZO (arch/x86/boot/compressed/vmlinux) via
startup_64. The pages holding that pagetable must be avoided when we
select the new random VO base; otherwise the decompressor would
overwrite them while decompressing.

One solution: walk the pagetable and find every page it uses during each
mem_avoid check, but that would need extra code.

The other solution: create a new ident mapping instead. Its pagetable
pages sit in the _pgtable area of ZO and are therefore already covered
by the mem_avoid array. This way we can also reuse the existing
ident-mapping code.

The _pgtable area is shared between the 32-bit and 64-bit paths to keep
init_size down, since ZO's _rodata-to-_end range now contributes to
init_size.

The pgt buffer size needs to grow. When booting via startup_64 we must
cover the old VO, params, cmdline and the new VO; in the extreme case
each of them crosses a 512G boundary, requiring (2+2)*4 pages with 2M
mappings, plus 2 pages for the first 2M (VGA RAM) and one for the level4
table: 19 pages in total. When booting via startup_32, KASLR may move
the new VO above 4G and only that range needs an extra ident mapping;
its pgt buffer starts 6 pages into _pgtable and needs at most (2+2)
pages when the new VO crosses a 512G boundary. So 19 pages keeps both
paths happy.


-v3: add a mapping for the first 2M (video RAM) when X86_VERBOSE_BOOTUP
     is set.
     No mapping is needed for setup_data: by this point we are late in
     the boot/ZO stage and will not access it until the VO stage, which
     uses early_memremap() or a kernel address to reach it.

Cc: Kees Cook <keescook@chromium.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/aslr.c     | 21 ++++++++
 arch/x86/boot/compressed/head_64.S  |  4 +-
 arch/x86/boot/compressed/misc_pgt.c | 98 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/boot.h         | 19 +++++++
 4 files changed, 140 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/boot/compressed/misc_pgt.c

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index a279514..34eb652 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -1,3 +1,8 @@
+#ifdef CONFIG_X86_64
+#define __pa(x)  ((unsigned long)(x))
+#define __va(x)  ((void *)((unsigned long)(x)))
+#endif
+
 #include "misc.h"
 
 #include <asm/msr.h>
@@ -21,6 +26,8 @@ struct kaslr_setup_data {
 	__u8 data[1];
 } kaslr_setup_data;
 
+#include "misc_pgt.c"
+
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -160,6 +167,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	unsafe = (unsigned long)input + input_size;
 	mem_avoid[0].start = unsafe;
 	mem_avoid[0].size = unsafe_len;
+	fill_pagetable(output, init_size);
 
 	/* Avoid initrd. */
 	initrd_start  = (u64)real_mode->ext_ramdisk_image << 32;
@@ -168,6 +176,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	initrd_size |= real_mode->hdr.ramdisk_size;
 	mem_avoid[1].start = initrd_start;
 	mem_avoid[1].size = initrd_size;
+	/* don't need to set mapping for initrd */
 
 	/* Avoid kernel command line. */
 	cmd_line  = (u64)real_mode->ext_cmd_line_ptr << 32;
@@ -178,10 +187,19 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 		;
 	mem_avoid[2].start = cmd_line;
 	mem_avoid[2].size = cmd_line_size;
+	fill_pagetable(cmd_line, cmd_line_size);
 
 	/* Avoid params */
 	mem_avoid[3].start = (unsigned long)real_mode;
 	mem_avoid[3].size = sizeof(*real_mode);
+	fill_pagetable((unsigned long)real_mode, sizeof(*real_mode));
+
+	/* don't need to set mapping for setup_data */
+
+#ifdef CONFIG_X86_VERBOSE_BOOTUP
+	/* for video ram */
+	fill_pagetable(0, PMD_SIZE);
+#endif
 }
 
 /* Does this memory vector overlap a known avoided area? */
@@ -362,6 +380,9 @@ unsigned char *choose_kernel_location(struct boot_params *params,
 		goto out;
 
 	choice = random;
+
+	fill_pagetable(choice, init_size);
+	switch_pagetable();
 out:
 	return (unsigned char *)choice;
 }
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 69015b5..1b6e34a 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -125,7 +125,7 @@ ENTRY(startup_32)
 	/* Initialize Page tables to 0 */
 	leal	pgtable(%ebx), %edi
 	xorl	%eax, %eax
-	movl	$((4096*6)/4), %ecx
+	movl	$(BOOT_INIT_PGT_SIZE/4), %ecx
 	rep	stosl
 
 	/* Build Level 4 */
@@ -477,4 +477,4 @@ boot_stack_end:
 	.section ".pgtable","a",@nobits
 	.balign 4096
 pgtable:
-	.fill 6*4096, 1, 0
+	.fill BOOT_PGT_SIZE, 1, 0
diff --git a/arch/x86/boot/compressed/misc_pgt.c b/arch/x86/boot/compressed/misc_pgt.c
new file mode 100644
index 0000000..b55982c
--- /dev/null
+++ b/arch/x86/boot/compressed/misc_pgt.c
@@ -0,0 +1,98 @@
+
+#ifdef CONFIG_X86_64
+#include <asm/init.h>
+#include <asm/pgtable.h>
+
+#include "../../mm/ident_map.c"
+
+struct alloc_pgt_data {
+	unsigned char *pgt_buf;
+	unsigned long pgt_buf_size;
+	unsigned long pgt_buf_offset;
+};
+
+static void *alloc_pgt_page(void *context)
+{
+	struct alloc_pgt_data *d = (struct alloc_pgt_data *)context;
+	unsigned char *p = (unsigned char *)d->pgt_buf;
+
+	if (d->pgt_buf_offset >= d->pgt_buf_size) {
+		debug_putstr("out of pgt_buf in misc.c\n");
+		return NULL;
+	}
+
+	p += d->pgt_buf_offset;
+	d->pgt_buf_offset += PAGE_SIZE;
+
+	return p;
+}
+
+/*
+ * Use a normal definition of memset() from string.c. There are already
+ * included header files which expect a definition of memset() and by
+ * the time we define memset macro, it is too late.
+ */
+#undef memset
+#define memzero(s, n)   memset((s), 0, (n))
+
+unsigned long __force_order;
+static struct alloc_pgt_data pgt_data;
+static struct x86_mapping_info mapping_info;
+static pgd_t *level4p;
+
+extern unsigned char _pgtable[];
+static void fill_pagetable(unsigned long start, unsigned long size)
+{
+	unsigned long end = start + size;
+
+	if (!level4p) {
+		pgt_data.pgt_buf_offset = 0;
+		mapping_info.alloc_pgt_page = alloc_pgt_page;
+		mapping_info.context = &pgt_data;
+		mapping_info.pmd_flag = __PAGE_KERNEL_LARGE_EXEC;
+
+		/*
+		 * come from startup_32 ?
+		 * then cr3 is _pgtable, we can reuse it.
+		 */
+		level4p = (pgd_t *)read_cr3();
+		if ((unsigned long)level4p == (unsigned long)_pgtable) {
+			pgt_data.pgt_buf = (unsigned char *)_pgtable +
+						 BOOT_INIT_PGT_SIZE;
+			pgt_data.pgt_buf_size = BOOT_PGT_SIZE -
+						 BOOT_INIT_PGT_SIZE;
+
+			debug_putstr("boot via startup_32\n");
+		} else {
+			pgt_data.pgt_buf = (unsigned char *)_pgtable;
+			pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
+
+			debug_putstr("boot via startup_64\n");
+			level4p = (pgd_t *)alloc_pgt_page(&pgt_data);
+		}
+		memset((unsigned char *)pgt_data.pgt_buf, 0,
+			 pgt_data.pgt_buf_size);
+	}
+
+	/* align boundary to 2M */
+	start = round_down(start, PMD_SIZE);
+	end = round_up(end, PMD_SIZE);
+	if (start >= end)
+		return;
+
+	kernel_ident_mapping_init(&mapping_info, level4p, start, end);
+}
+
+static void switch_pagetable(void)
+{
+	write_cr3((unsigned long)level4p);
+}
+
+#else
+static void fill_pagetable(unsigned long start, unsigned long size)
+{
+}
+static void switch_pagetable(void)
+{
+}
+#endif
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 4fa687a..7b23908 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -32,7 +32,26 @@
 #endif /* !CONFIG_KERNEL_BZIP2 */
 
 #ifdef CONFIG_X86_64
+
 #define BOOT_STACK_SIZE	0x4000
+
+#define BOOT_INIT_PGT_SIZE (6*4096)
+#ifdef CONFIG_RANDOMIZE_BASE
+/*
+ * 1 page for level4, 2 pages for first 2M.
+ * (2+2)*4 pages for kernel, param, cmd_line, random kernel
+ * if all cross 512G boundary.
+ * So total will be 19 pages.
+ */
+#ifdef CONFIG_X86_VERBOSE_BOOTUP
+#define BOOT_PGT_SIZE (19*4096)
+#else
+#define BOOT_PGT_SIZE (17*4096)
+#endif
+#else
+#define BOOT_PGT_SIZE BOOT_INIT_PGT_SIZE
+#endif
+
 #else
 #define BOOT_STACK_SIZE	0x1000
 #endif
-- 
1.8.4.5



* Re: [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size
  2015-03-07 22:07 ` [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size Yinghai Lu
@ 2015-03-09 12:49   ` Borislav Petkov
  2015-03-09 15:58     ` Ingo Molnar
  2015-03-09 19:35     ` Yinghai Lu
  2015-03-10  0:42   ` Kees Cook
  2015-03-13 12:27   ` Ingo Molnar
  2 siblings, 2 replies; 54+ messages in thread
From: Borislav Petkov @ 2015-03-09 12:49 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook, Baoquan He,
	Thomas Gleixner, Jiri Kosina, linux-kernel, linux-efi,
	Josh Triplett, Andrew Morton, Ard Biesheuvel, Junjie Mao

On Sat, Mar 07, 2015 at 02:07:15PM -0800, Yinghai Lu wrote:

...

I ended up committing this. Anything I've missed?

---
From: Yinghai Lu <yinghai@kernel.org>
Date: Sat, 7 Mar 2015 14:07:15 -0800
Subject: [PATCH] x86/setup: Use init_size instead of run_size

Commit

  e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")

introduced run_size for KASLR to represent the size of kernel proper
(vmlinux).

However, we should use the actual runtime size (which provides for
copy/decompress), i.e. init_size, as it includes .bss and .brk.

Why, you ask?

Because init_size is the size needed for safe kernel decompression and
thus can be higher than run_size in case the decompressor needs a larger
buffer.

From arch/x86/boot/header.S:
  #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
  #define VO_INIT_SIZE    (VO__end - VO__text)
  #if ZO_INIT_SIZE > VO_INIT_SIZE
  #define INIT_SIZE ZO_INIT_SIZE
  #else
  #define INIT_SIZE VO_INIT_SIZE
  #endif
  init_size:              .long INIT_SIZE         # kernel initialization size

The boot loader allocates a buffer of size init_size which it
reads from the setup header and loads the compressed kernel
(arch/x86/boot/compressed/vmlinux) in it.

init_size initially comes from the kernel proper's (vmlinux) init size.
It includes the .bss and .brk area.

When the boot loader hands off to the compressed kernel, the latter
moves itself to z_extract_offset within the buffer to make sure that
the decompressor output does not overwrite input data before it gets
consumed.

However, z_extract_offset is the size difference
between the uncompressed and compressed kernel (see
arch/x86/boot/compressed/mkpiggy.c) and thus represents the additional
space needed for decompression but it doesn't factor in a bigger
ZO_INIT_SIZE.

During ASLR buffer searching, we need to make sure the new buffer is big
enough for decompression. So use init_size instead, and kill run_size
related code.

Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1425766041-6551-2-git-send-email-yinghai@kernel.org
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Seriously massage commit message. ]
Signed-off-by: 
---
 arch/x86/boot/compressed/Makefile  |  4 +---
 arch/x86/boot/compressed/head_32.S |  5 ++---
 arch/x86/boot/compressed/head_64.S |  5 +----
 arch/x86/boot/compressed/misc.c    | 15 +++++++-------
 arch/x86/boot/compressed/mkpiggy.c |  9 ++------
 arch/x86/tools/calc_run_size.sh    | 42 --------------------------------------
 6 files changed, 13 insertions(+), 67 deletions(-)
 delete mode 100644 arch/x86/tools/calc_run_size.sh

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 0a291cdfaf77..70cc92c8bcfb 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -92,10 +92,8 @@ suffix-$(CONFIG_KERNEL_XZ)	:= xz
 suffix-$(CONFIG_KERNEL_LZO) 	:= lzo
 suffix-$(CONFIG_KERNEL_LZ4) 	:= lz4
 
-RUN_SIZE = $(shell $(OBJDUMP) -h vmlinux | \
-	     $(CONFIG_SHELL) $(srctree)/arch/x86/tools/calc_run_size.sh)
 quiet_cmd_mkpiggy = MKPIGGY $@
-      cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )
+      cmd_mkpiggy = $(obj)/mkpiggy $< > $@ || ( rm -f $@ ; false )
 
 targets += piggy.S
 $(obj)/piggy.S: $(obj)/vmlinux.bin.$(suffix-y) $(obj)/mkpiggy FORCE
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 1d7fbbcc196d..cbed1407a5cd 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -207,8 +207,7 @@ relocated:
  * Do the decompression, and jump to the new kernel..
  */
 				/* push arguments for decompress_kernel: */
-	pushl	$z_run_size	/* size of kernel with .bss and .brk */
-	pushl	$z_output_len	/* decompressed length, end of relocs */
+	pushl	$z_output_len	/* decompressed length */
 	leal	z_extract_offset_negative(%ebx), %ebp
 	pushl	%ebp		/* output address */
 	pushl	$z_input_len	/* input_len */
@@ -218,7 +217,7 @@ relocated:
 	pushl	%eax		/* heap area */
 	pushl	%esi		/* real mode pointer */
 	call	decompress_kernel /* returns kernel location in %eax */
-	addl	$28, %esp
+	addl	$24, %esp
 
 /*
  * Jump to the decompressed kernel.
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 6b1766c6c082..2884e0c3e8a5 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -402,16 +402,13 @@ relocated:
  * Do the decompression, and jump to the new kernel..
  */
 	pushq	%rsi			/* Save the real mode argument */
-	movq	$z_run_size, %r9	/* size of kernel with .bss and .brk */
-	pushq	%r9
 	movq	%rsi, %rdi		/* real mode address */
 	leaq	boot_heap(%rip), %rsi	/* malloc area for uncompression */
 	leaq	input_data(%rip), %rdx  /* input_data */
 	movl	$z_input_len, %ecx	/* input_len */
 	movq	%rbp, %r8		/* output target address */
-	movq	$z_output_len, %r9	/* decompressed length, end of relocs */
+	movq	$z_output_len, %r9	/* decompressed length */
 	call	decompress_kernel	/* returns kernel location in %rax */
-	popq	%r9
 	popq	%rsi
 
 /*
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 5903089c818f..f12a97506951 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -370,10 +370,10 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 				  unsigned char *input_data,
 				  unsigned long input_len,
 				  unsigned char *output,
-				  unsigned long output_len,
-				  unsigned long run_size)
+				  unsigned long output_len)
 {
 	unsigned char *output_orig = output;
+	unsigned long init_size;
 
 	real_mode = rmode;
 
@@ -396,15 +396,14 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	free_mem_ptr     = heap;	/* Heap */
 	free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
 
+	init_size = real_mode->hdr.init_size;
+
 	/*
-	 * The memory hole needed for the kernel is the larger of either
-	 * the entire decompressed kernel plus relocation table, or the
-	 * entire decompressed kernel plus .bss and .brk sections.
+	 * The memory hole needed for the kernel is init_size for running
+	 * and init_size is always bigger than output_len.
 	 */
 	output = choose_kernel_location(real_mode, input_data, input_len,
-					output,
-					output_len > run_size ? output_len
-							      : run_size);
+					output, init_size);
 
 	/* Validate memory location choices. */
 	if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index d8222f213182..b669ab65bf6c 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -36,13 +36,11 @@ int main(int argc, char *argv[])
 	uint32_t olen;
 	long ilen;
 	unsigned long offs;
-	unsigned long run_size;
 	FILE *f = NULL;
 	int retval = 1;
 
-	if (argc < 3) {
-		fprintf(stderr, "Usage: %s compressed_file run_size\n",
-				argv[0]);
+	if (argc < 2) {
+		fprintf(stderr, "Usage: %s compressed_file\n", argv[0]);
 		goto bail;
 	}
 
@@ -76,7 +74,6 @@ int main(int argc, char *argv[])
 	offs += olen >> 12;	/* Add 8 bytes for each 32K block */
 	offs += 64*1024 + 128;	/* Add 64K + 128 bytes slack */
 	offs = (offs+4095) & ~4095; /* Round to a 4K boundary */
-	run_size = atoi(argv[2]);
 
 	printf(".section \".rodata..compressed\",\"a\",@progbits\n");
 	printf(".globl z_input_len\n");
@@ -88,8 +85,6 @@ int main(int argc, char *argv[])
 	/* z_extract_offset_negative allows simplification of head_32.S */
 	printf(".globl z_extract_offset_negative\n");
 	printf("z_extract_offset_negative = -0x%lx\n", offs);
-	printf(".globl z_run_size\n");
-	printf("z_run_size = %lu\n", run_size);
 
 	printf(".globl input_data, input_data_end\n");
 	printf("input_data:\n");
diff --git a/arch/x86/tools/calc_run_size.sh b/arch/x86/tools/calc_run_size.sh
deleted file mode 100644
index 1a4c17bb3910..000000000000
--- a/arch/x86/tools/calc_run_size.sh
+++ /dev/null
@@ -1,42 +0,0 @@
-#!/bin/sh
-#
-# Calculate the amount of space needed to run the kernel, including room for
-# the .bss and .brk sections.
-#
-# Usage:
-# objdump -h a.out | sh calc_run_size.sh
-
-NUM='\([0-9a-fA-F]*[ \t]*\)'
-OUT=$(sed -n 's/^[ \t0-9]*.b[sr][sk][ \t]*'"$NUM$NUM$NUM$NUM"'.*/\1\4/p')
-if [ -z "$OUT" ] ; then
-	echo "Never found .bss or .brk file offset" >&2
-	exit 1
-fi
-
-OUT=$(echo ${OUT# })
-sizeA=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-offsetA=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-sizeB=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-offsetB=$(printf "%d" 0x${OUT%% *})
-
-run_size=$(( $offsetA + $sizeA + $sizeB ))
-
-# BFD linker shows the same file offset in ELF.
-if [ "$offsetA" -ne "$offsetB" ] ; then
-	# Gold linker shows them as consecutive.
-	endB=$(( $offsetB + $sizeB ))
-	if [ "$endB" != "$run_size" ] ; then
-		printf "sizeA: 0x%x\n" $sizeA >&2
-		printf "offsetA: 0x%x\n" $offsetA >&2
-		printf "sizeB: 0x%x\n" $sizeB >&2
-		printf "offsetB: 0x%x\n" $offsetB >&2
-		echo ".bss and .brk are non-contiguous" >&2
-		exit 1
-	fi
-fi
-
-printf "%d\n" $run_size
-exit 0
-- 
2.2.0.33.gc18b867

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size
  2015-03-09 12:49   ` Borislav Petkov
@ 2015-03-09 15:58     ` Ingo Molnar
  2015-03-09 15:58       ` Borislav Petkov
  2015-03-09 19:35     ` Yinghai Lu
  1 sibling, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2015-03-09 15:58 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Yinghai Lu, Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Baoquan He, Thomas Gleixner, Jiri Kosina, linux-kernel,
	linux-efi, Josh Triplett, Andrew Morton, Ard Biesheuvel,
	Junjie Mao


* Borislav Petkov <bp@suse.de> wrote:

> On Sat, Mar 07, 2015 at 02:07:15PM -0800, Yinghai Lu wrote:
> 
> ...
> 
> I ended up committing this. Anything I've missed?

So this would be fine as-is for v4.1, but I think we want to attempt 
to fix this in x86/urgent so that we don't have to revert the kaslr 
changes in v4.0, so it would be awesome if you could split it into two 
parts: the fix, plus the orphaned run_size removal?

That would make it easier to bisect to, should anything break ...

Thanks,

	Ingo


* Re: [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size
  2015-03-09 15:58     ` Ingo Molnar
@ 2015-03-09 15:58       ` Borislav Petkov
  0 siblings, 0 replies; 54+ messages in thread
From: Borislav Petkov @ 2015-03-09 15:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Yinghai Lu, Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Baoquan He, Thomas Gleixner, Jiri Kosina, linux-kernel,
	linux-efi, Josh Triplett, Andrew Morton, Ard Biesheuvel,
	Junjie Mao

On Mon, Mar 09, 2015 at 04:58:13PM +0100, Ingo Molnar wrote:
> So this would be fine as-is for v4.1, but I think we want to attempt 
> to fix this in x86/urgent so that we don't have to revert the kaslr 
> changes in v4.0, so it would be awesome if you could split it into two 
> parts: the fix, plus the orphaned run_size removal?

Ok, lemme do that.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size
  2015-03-09 12:49   ` Borislav Petkov
  2015-03-09 15:58     ` Ingo Molnar
@ 2015-03-09 19:35     ` Yinghai Lu
  2015-03-09 20:00       ` Borislav Petkov
  1 sibling, 1 reply; 54+ messages in thread
From: Yinghai Lu @ 2015-03-09 19:35 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook, Baoquan He,
	Thomas Gleixner, Jiri Kosina, Linux Kernel Mailing List,
	linux-efi, Josh Triplett, Andrew Morton, Ard Biesheuvel,
	Junjie Mao

On Mon, Mar 9, 2015 at 5:49 AM, Borislav Petkov <bp@suse.de> wrote:
> I ended up committing this. Anything I've missed?
>
> ---
> From: Yinghai Lu <yinghai@kernel.org>
> Date: Sat, 7 Mar 2015 14:07:15 -0800
> Subject: [PATCH] x86/setup: Use init_size instead of run_size
>
> Commit
>
>   e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
>
> introduced run_size for KASLR to represent the size of kernel proper
> (vmlinux).
>
> However, we should use the actual runtime size (which provides for
> copy/decompress), i.e. init_size, as it includes .bss and .brk.
>
> Why, you ask?
>
> Because init_size is the size needed for safe kernel decompression and
> thus can be higher than run_size in case the decompressor needs a larger
> buffer.
>
> From arch/x86/boot/header.S:
>   #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
>   #define VO_INIT_SIZE    (VO__end - VO__text)
>   #if ZO_INIT_SIZE > VO_INIT_SIZE
>   #define INIT_SIZE ZO_INIT_SIZE
>   #else
>   #define INIT_SIZE VO_INIT_SIZE
>   #endif
>   init_size:              .long INIT_SIZE         # kernel initialization size
>
> The boot loader allocates a buffer of size init_size which it
> reads from the setup header and loads the compressed kernel
> (arch/x86/boot/compressed/vmlinux) in it.
>
> init_size initially comes from the kernel proper's (vmlinux) init size.
> It includes the .bss and .brk area.
>
> When the boot loader hands off to the compressed kernel, the latter
> moves itself to z_extract_offset within the buffer to make sure that
> the decompressor output does not overwrite input data before it gets
> consumed.
>
> However, z_extract_offset is the size difference
> between the uncompressed and compressed kernel (see
> arch/x86/boot/compressed/mkpiggy.c) and thus represents the additional
> space needed for decompression but it doesn't factor in a bigger
> ZO_INIT_SIZE.

Can you put back:
"
So need to make sure [z_extra_offset, init_size) will fit ZO, that means
init_size need to be adjusted according to ZO size.
That make init_size is always >= run_size.
"

>
> During ASLR buffer searching, we need to make sure the new buffer is big
> enough for decompression. So use init_size instead, and kill run_size
> related code.


* Re: [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size
  2015-03-09 19:35     ` Yinghai Lu
@ 2015-03-09 20:00       ` Borislav Petkov
  2015-03-09 20:06         ` Yinghai Lu
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2015-03-09 20:00 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook, Baoquan He,
	Thomas Gleixner, Jiri Kosina, Linux Kernel Mailing List,
	linux-efi, Josh Triplett, Andrew Morton, Ard Biesheuvel,
	Junjie Mao

On Mon, Mar 09, 2015 at 12:35:25PM -0700, Yinghai Lu wrote:
> Can you put back:
> "
> So need to make sure [z_extra_offset, init_size) will fit ZO, that means
> init_size need to be adjusted according to ZO size.
> That make init_size is always >= run_size.

Why?

We don't adjust init_size. We assign INIT_SIZE to it. Or do you want to
say that we indirectly adjust init_size through INIT_SIZE?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size
  2015-03-09 20:00       ` Borislav Petkov
@ 2015-03-09 20:06         ` Yinghai Lu
  2015-03-09 20:18           ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Yinghai Lu @ 2015-03-09 20:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook, Baoquan He,
	Thomas Gleixner, Jiri Kosina, Linux Kernel Mailing List,
	linux-efi, Josh Triplett, Andrew Morton, Ard Biesheuvel,
	Junjie Mao

On Mon, Mar 9, 2015 at 1:00 PM, Borislav Petkov <bp@suse.de> wrote:
> On Mon, Mar 09, 2015 at 12:35:25PM -0700, Yinghai Lu wrote:
>> Can you put back:
>> "
>> So need to make sure [z_extra_offset, init_size) will fit ZO, that means
>> init_size need to be adjusted according to ZO size.
>> That make init_size is always >= run_size.
>
> Why?
>
> We don't adjust init_size. We assign INIT_SIZE to it. Or do you want to
> say that we indirectly adjust init_size through INIT_SIZE?

Yes. Just to emphasize that  " We need to make sure [z_extra_offset,
init_size) will fit ZO"


* Re: [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size
  2015-03-09 20:06         ` Yinghai Lu
@ 2015-03-09 20:18           ` Borislav Petkov
  2015-03-09 21:28             ` Yinghai Lu
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2015-03-09 20:18 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook, Baoquan He,
	Thomas Gleixner, Jiri Kosina, Linux Kernel Mailing List,
	linux-efi, Josh Triplett, Andrew Morton, Ard Biesheuvel,
	Junjie Mao

On Mon, Mar 09, 2015 at 01:06:00PM -0700, Yinghai Lu wrote:
> Yes. Just to emphasize that  " We need to make sure [z_extra_offset,
> init_size) will fit ZO"

So you want to say:

"We need to make sure the compressed kernel fits in the interval
[z_extra_offset, z_extra_offset + init_size)"

?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size
  2015-03-09 20:18           ` Borislav Petkov
@ 2015-03-09 21:28             ` Yinghai Lu
  0 siblings, 0 replies; 54+ messages in thread
From: Yinghai Lu @ 2015-03-09 21:28 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook, Baoquan He,
	Thomas Gleixner, Jiri Kosina, Linux Kernel Mailing List,
	linux-efi, Josh Triplett, Andrew Morton, Ard Biesheuvel,
	Junjie Mao

On Mon, Mar 9, 2015 at 1:18 PM, Borislav Petkov <bp@suse.de> wrote:
> On Mon, Mar 09, 2015 at 01:06:00PM -0700, Yinghai Lu wrote:
>> Yes. Just to emphasize that  " We need to make sure [z_extra_offset,
>> init_size) will fit ZO"
>
> So you want to say:
>
> "We need to make sure the compressed kernel fits in the interval
> [z_extra_offset, z_extra_offset + init_size)"
>

"We need to make sure the compressed kernel fits in the interval
[z_extra_offset, init_size)"


* Re: [PATCH v3 0/7] x86, boot: clean up kasl
  2015-03-07 22:07 [PATCH v3 0/7] x86, boot: clean up kasl Yinghai Lu
                   ` (6 preceding siblings ...)
  2015-03-07 22:07 ` [PATCH v3 7/7] x86, kaslr, 64bit: Set new or extra ident_mapping Yinghai Lu
@ 2015-03-10  0:39 ` Kees Cook
  2015-03-10  0:54   ` Yinghai Lu
  7 siblings, 1 reply; 54+ messages in thread
From: Kees Cook @ 2015-03-10  0:39 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Baoquan He, Thomas Gleixner, Jiri Kosina, LKML, linux-efi

On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> First 3 patches make ZO (arch/x86/boot/compressed/vmlinux) data region is not
> overwritten by VO (vmlinux) after decompress.  So could pass data from ZO to VO.

Tiny nit-pick, in case there is a v5: the Subject on this (0/7) says
"kasl" instead of "kaslr".

-Kees

-- 
Kees Cook
Chrome OS Security


* Re: [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size
  2015-03-07 22:07 ` [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size Yinghai Lu
  2015-03-09 12:49   ` Borislav Petkov
@ 2015-03-10  0:42   ` Kees Cook
  2015-03-13 12:27   ` Ingo Molnar
  2 siblings, 0 replies; 54+ messages in thread
From: Kees Cook @ 2015-03-10  0:42 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Baoquan He, Thomas Gleixner, Jiri Kosina, LKML, linux-efi,
	Josh Triplett, Andrew Morton, Ard Biesheuvel, Junjie Mao

On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> commit e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
> introduced run_size for kaslr.
> We should use the real runtime size (including copy/decompress), aka
> init_size.
>
> run_size is the VO (vmlinux) init size, including .bss and .brk.
> init_size is the size needed for decompression, and it is bigger than
> run_size when the decompressor needs a larger buffer.
>
> According to arch/x86/boot/header.S:
> | #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
> | #define VO_INIT_SIZE    (VO__end - VO__text)
> | #if ZO_INIT_SIZE > VO_INIT_SIZE
> | #define INIT_SIZE ZO_INIT_SIZE
> | #else
> | #define INIT_SIZE VO_INIT_SIZE
> | #endif
> | init_size:              .long INIT_SIZE         # kernel initialization size
>
> The bootloader allocates a buffer according to init_size in hdr, and loads
> ZO (arch/x86/boot/compressed/vmlinux) at the start of that buffer.
> init_size initially comes from the VO (vmlinux) init size. That VO init
> size spans from VO _text to VO _end and includes the VO .bss and .brk areas.
>
> While running, ZO moves itself to the middle of the buffer, at
> z_extract_offset, to make sure that the decompressor does not overwrite
> input data with output before the input is consumed.
> But z_extract_offset is initially calculated only from the size of VO
> (vmlinux) and the size of the compressed VO.
> So need to make sure [z_extra_offset, init_size) will fit ZO, that means
> init_size need to be adjusted according to ZO size.
> That make init_size is always >= run_size.
>
> During ASLR buffer searching, we need to make sure the new buffer is big
> enough for decompression. So use init_size instead, and kill the no longer
> needed run_size related code.
>
> Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Matt Fleming <matt.fleming@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Junjie Mao <eternal.n08@gmail.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>

Acked-by: Kees Cook <keescook@chromium.org>

> ---
>  arch/x86/boot/compressed/Makefile  |  4 +---
>  arch/x86/boot/compressed/head_32.S |  5 ++---
>  arch/x86/boot/compressed/head_64.S |  5 +----
>  arch/x86/boot/compressed/misc.c    | 15 +++++++-------
>  arch/x86/boot/compressed/mkpiggy.c |  9 ++------
>  arch/x86/tools/calc_run_size.sh    | 42 --------------------------------------
>  6 files changed, 13 insertions(+), 67 deletions(-)
>  delete mode 100644 arch/x86/tools/calc_run_size.sh
>
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 0a291cd..70cc92c 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -92,10 +92,8 @@ suffix-$(CONFIG_KERNEL_XZ)   := xz
>  suffix-$(CONFIG_KERNEL_LZO)    := lzo
>  suffix-$(CONFIG_KERNEL_LZ4)    := lz4
>
> -RUN_SIZE = $(shell $(OBJDUMP) -h vmlinux | \
> -            $(CONFIG_SHELL) $(srctree)/arch/x86/tools/calc_run_size.sh)
>  quiet_cmd_mkpiggy = MKPIGGY $@
> -      cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )
> +      cmd_mkpiggy = $(obj)/mkpiggy $< > $@ || ( rm -f $@ ; false )
>
>  targets += piggy.S
>  $(obj)/piggy.S: $(obj)/vmlinux.bin.$(suffix-y) $(obj)/mkpiggy FORCE
> diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
> index 1d7fbbc..cbed140 100644
> --- a/arch/x86/boot/compressed/head_32.S
> +++ b/arch/x86/boot/compressed/head_32.S
> @@ -207,8 +207,7 @@ relocated:
>   * Do the decompression, and jump to the new kernel..
>   */
>                                 /* push arguments for decompress_kernel: */
> -       pushl   $z_run_size     /* size of kernel with .bss and .brk */
> -       pushl   $z_output_len   /* decompressed length, end of relocs */
> +       pushl   $z_output_len   /* decompressed length */
>         leal    z_extract_offset_negative(%ebx), %ebp
>         pushl   %ebp            /* output address */
>         pushl   $z_input_len    /* input_len */
> @@ -218,7 +217,7 @@ relocated:
>         pushl   %eax            /* heap area */
>         pushl   %esi            /* real mode pointer */
>         call    decompress_kernel /* returns kernel location in %eax */
> -       addl    $28, %esp
> +       addl    $24, %esp
>
>  /*
>   * Jump to the decompressed kernel.
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 6b1766c..2884e0c 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -402,16 +402,13 @@ relocated:
>   * Do the decompression, and jump to the new kernel..
>   */
>         pushq   %rsi                    /* Save the real mode argument */
> -       movq    $z_run_size, %r9        /* size of kernel with .bss and .brk */
> -       pushq   %r9
>         movq    %rsi, %rdi              /* real mode address */
>         leaq    boot_heap(%rip), %rsi   /* malloc area for uncompression */
>         leaq    input_data(%rip), %rdx  /* input_data */
>         movl    $z_input_len, %ecx      /* input_len */
>         movq    %rbp, %r8               /* output target address */
> -       movq    $z_output_len, %r9      /* decompressed length, end of relocs */
> +       movq    $z_output_len, %r9      /* decompressed length */
>         call    decompress_kernel       /* returns kernel location in %rax */
> -       popq    %r9
>         popq    %rsi
>
>  /*
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index 5903089..51e9e54 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -370,10 +370,10 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
>                                   unsigned char *input_data,
>                                   unsigned long input_len,
>                                   unsigned char *output,
> -                                 unsigned long output_len,
> -                                 unsigned long run_size)
> +                                 unsigned long output_len)
>  {
>         unsigned char *output_orig = output;
> +       unsigned long init_size;
>
>         real_mode = rmode;
>
> @@ -396,15 +396,14 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
>         free_mem_ptr     = heap;        /* Heap */
>         free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
>
> +       init_size = real_mode->hdr.init_size;
> +
>         /*
> -        * The memory hole needed for the kernel is the larger of either
> -        * the entire decompressed kernel plus relocation table, or the
> -        * entire decompressed kernel plus .bss and .brk sections.
> +        * The memory hole needed for the kernel is init_size for running
> +        * and init_size is bigger than output_len always.
>          */
>         output = choose_kernel_location(real_mode, input_data, input_len,
> -                                       output,
> -                                       output_len > run_size ? output_len
> -                                                             : run_size);
> +                                       output, init_size);
>
>         /* Validate memory location choices. */
>         if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
> diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
> index d8222f2..b669ab6 100644
> --- a/arch/x86/boot/compressed/mkpiggy.c
> +++ b/arch/x86/boot/compressed/mkpiggy.c
> @@ -36,13 +36,11 @@ int main(int argc, char *argv[])
>         uint32_t olen;
>         long ilen;
>         unsigned long offs;
> -       unsigned long run_size;
>         FILE *f = NULL;
>         int retval = 1;
>
> -       if (argc < 3) {
> -               fprintf(stderr, "Usage: %s compressed_file run_size\n",
> -                               argv[0]);
> +       if (argc < 2) {
> +               fprintf(stderr, "Usage: %s compressed_file\n", argv[0]);
>                 goto bail;
>         }
>
> @@ -76,7 +74,6 @@ int main(int argc, char *argv[])
>         offs += olen >> 12;     /* Add 8 bytes for each 32K block */
>         offs += 64*1024 + 128;  /* Add 64K + 128 bytes slack */
>         offs = (offs+4095) & ~4095; /* Round to a 4K boundary */
> -       run_size = atoi(argv[2]);
>
>         printf(".section \".rodata..compressed\",\"a\",@progbits\n");
>         printf(".globl z_input_len\n");
> @@ -88,8 +85,6 @@ int main(int argc, char *argv[])
>         /* z_extract_offset_negative allows simplification of head_32.S */
>         printf(".globl z_extract_offset_negative\n");
>         printf("z_extract_offset_negative = -0x%lx\n", offs);
> -       printf(".globl z_run_size\n");
> -       printf("z_run_size = %lu\n", run_size);
>
>         printf(".globl input_data, input_data_end\n");
>         printf("input_data:\n");
> diff --git a/arch/x86/tools/calc_run_size.sh b/arch/x86/tools/calc_run_size.sh
> deleted file mode 100644
> index 1a4c17b..0000000
> --- a/arch/x86/tools/calc_run_size.sh
> +++ /dev/null
> @@ -1,42 +0,0 @@
> -#!/bin/sh
> -#
> -# Calculate the amount of space needed to run the kernel, including room for
> -# the .bss and .brk sections.
> -#
> -# Usage:
> -# objdump -h a.out | sh calc_run_size.sh
> -
> -NUM='\([0-9a-fA-F]*[ \t]*\)'
> -OUT=$(sed -n 's/^[ \t0-9]*.b[sr][sk][ \t]*'"$NUM$NUM$NUM$NUM"'.*/\1\4/p')
> -if [ -z "$OUT" ] ; then
> -       echo "Never found .bss or .brk file offset" >&2
> -       exit 1
> -fi
> -
> -OUT=$(echo ${OUT# })
> -sizeA=$(printf "%d" 0x${OUT%% *})
> -OUT=${OUT#* }
> -offsetA=$(printf "%d" 0x${OUT%% *})
> -OUT=${OUT#* }
> -sizeB=$(printf "%d" 0x${OUT%% *})
> -OUT=${OUT#* }
> -offsetB=$(printf "%d" 0x${OUT%% *})
> -
> -run_size=$(( $offsetA + $sizeA + $sizeB ))
> -
> -# BFD linker shows the same file offset in ELF.
> -if [ "$offsetA" -ne "$offsetB" ] ; then
> -       # Gold linker shows them as consecutive.
> -       endB=$(( $offsetB + $sizeB ))
> -       if [ "$endB" != "$run_size" ] ; then
> -               printf "sizeA: 0x%x\n" $sizeA >&2
> -               printf "offsetA: 0x%x\n" $offsetA >&2
> -               printf "sizeB: 0x%x\n" $sizeB >&2
> -               printf "offsetB: 0x%x\n" $offsetB >&2
> -               echo ".bss and .brk are non-contiguous" >&2
> -               exit 1
> -       fi
> -fi
> -
> -printf "%d\n" $run_size
> -exit 0
> --
> 1.8.4.5
>



-- 
Kees Cook
Chrome OS Security


* Re: [PATCH v3 2/7] x86, boot: Move ZO to end of buffer
  2015-03-07 22:07 ` [PATCH v3 2/7] x86, boot: Move ZO to end of buffer Yinghai Lu
@ 2015-03-10  0:54   ` Kees Cook
  2015-03-10  1:04     ` Yinghai Lu
  2015-03-10  5:59     ` Borislav Petkov
  2015-03-10  8:00   ` Borislav Petkov
  1 sibling, 2 replies; 54+ messages in thread
From: Kees Cook @ 2015-03-10  0:54 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Baoquan He, Thomas Gleixner, Jiri Kosina, LKML, linux-efi

On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> Boris found data from boot stage can not be used kernel stage.

"... be used during kernel stage."

Also, can you give a specific example of this problem? (Which data, used how?)

> The bootloader allocates a buffer according to init_size in hdr, and loads
> ZO (arch/x86/boot/compressed/vmlinux) at the start of that buffer.
> While running, ZO moves itself to the middle of the buffer, at
> z_extract_offset, to make sure that the decompressor does not overwrite
> input data with output before the input is consumed.
> After the decompressor is called, VO (vmlinux) uses the whole buffer from
> the start, so ZO's code and data sections overlap with VO's .bss section.
> Later, VO's clear_bss() clears them before the code in
> arch/x86/kernel/setup.c can access them.
>
> To make the data survive, we should avoid the overlap.
> First, move ZO close to the end of the buffer instead of the middle;
> that moves the ZO data out of the VO .bss area.
>
> Also after that we can find out where is data section of copied ZO
> instead of guessing. That will aslr mem_avoid array filling for

"That will make aslr mem_avoid array ..."

> new buffer searching much simpler.
>
> Also rename z_extract_offset to z_min_extract_offset, as it is now
> actually the minimum offset for extracting.
>
> To keep the final real extract_offset page aligned like
> min_extract_offset, we need to make both VO _end and ZO _end
> page aligned, so that init_size is always page aligned.
>
> The next patch will add the ZO data size to init_size, which makes sure
> the ZO data is even out of the VO .brk area.

This seems like a reasonable idea, but I think the changes should be
noted/updated in misc.c since a lot of effort was made to make the
in-memory footprint as small as possible. These changes do expand the
size of the loaded kernel, IIUC. If not in this patch, maybe in 5/7?

-Kees

>
> Fixes: f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Matt Fleming <matt.fleming@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/boot/compressed/head_32.S     | 11 +++++++++--
>  arch/x86/boot/compressed/head_64.S     |  8 ++++++--
>  arch/x86/boot/compressed/mkpiggy.c     |  7 ++-----
>  arch/x86/boot/compressed/vmlinux.lds.S |  1 +
>  arch/x86/boot/header.S                 |  2 +-
>  arch/x86/kernel/asm-offsets.c          |  1 +
>  arch/x86/kernel/vmlinux.lds.S          |  1 +
>  7 files changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
> index cbed140..a9b56f1 100644
> --- a/arch/x86/boot/compressed/head_32.S
> +++ b/arch/x86/boot/compressed/head_32.S
> @@ -147,7 +147,9 @@ preferred_addr:
>  1:
>
>         /* Target address to relocate to for decompression */
> -       addl    $z_extract_offset, %ebx
> +       movl    BP_init_size(%esi), %eax
> +       subl    $_end, %eax
> +       addl    %eax, %ebx
>
>         /* Set up the stack */
>         leal    boot_stack_end(%ebx), %esp
> @@ -208,8 +210,13 @@ relocated:
>   */
>                                 /* push arguments for decompress_kernel: */
>         pushl   $z_output_len   /* decompressed length */
> -       leal    z_extract_offset_negative(%ebx), %ebp
> +
> +       movl    BP_init_size(%esi), %eax
> +       subl    $_end, %eax
> +       movl    %ebx, %ebp
> +       subl    %eax, %ebp
>         pushl   %ebp            /* output address */
> +
>         pushl   $z_input_len    /* input_len */
>         leal    input_data(%ebx), %eax
>         pushl   %eax            /* input_data */
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 2884e0c..69015b5 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -101,7 +101,9 @@ ENTRY(startup_32)
>  1:
>
>         /* Target address to relocate to for decompression */
> -       addl    $z_extract_offset, %ebx
> +       movl    BP_init_size(%esi), %eax
> +       subl    $_end, %eax
> +       addl    %eax, %ebx
>
>  /*
>   * Prepare for entering 64 bit mode
> @@ -329,7 +331,9 @@ preferred_addr:
>  1:
>
>         /* Target address to relocate to for decompression */
> -       leaq    z_extract_offset(%rbp), %rbx
> +       movl    BP_init_size(%rsi), %ebx
> +       subl    $_end, %ebx
> +       addq    %rbp, %rbx
>
>         /* Set up the stack */
>         leaq    boot_stack_end(%rbx), %rsp
> diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
> index b669ab6..c03b009 100644
> --- a/arch/x86/boot/compressed/mkpiggy.c
> +++ b/arch/x86/boot/compressed/mkpiggy.c
> @@ -80,11 +80,8 @@ int main(int argc, char *argv[])
>         printf("z_input_len = %lu\n", ilen);
>         printf(".globl z_output_len\n");
>         printf("z_output_len = %lu\n", (unsigned long)olen);
> -       printf(".globl z_extract_offset\n");
> -       printf("z_extract_offset = 0x%lx\n", offs);
> -       /* z_extract_offset_negative allows simplification of head_32.S */
> -       printf(".globl z_extract_offset_negative\n");
> -       printf("z_extract_offset_negative = -0x%lx\n", offs);
> +       printf(".globl z_min_extract_offset\n");
> +       printf("z_min_extract_offset = 0x%lx\n", offs);
>
>         printf(".globl input_data, input_data_end\n");
>         printf("input_data:\n");
> diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> index 34d047c..e24e0a0 100644
> --- a/arch/x86/boot/compressed/vmlinux.lds.S
> +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> @@ -70,5 +70,6 @@ SECTIONS
>                 _epgtable = . ;
>         }
>  #endif
> +       . = ALIGN(PAGE_SIZE);   /* keep ZO size page aligned */
>         _end = .;
>  }
> diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
> index 16ef025..9bfab22 100644
> --- a/arch/x86/boot/header.S
> +++ b/arch/x86/boot/header.S
> @@ -440,7 +440,7 @@ setup_data:         .quad 0                 # 64-bit physical pointer to
>
>  pref_address:          .quad LOAD_PHYSICAL_ADDR        # preferred load addr
>
> -#define ZO_INIT_SIZE   (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
> +#define ZO_INIT_SIZE   (ZO__end - ZO_startup_32 + ZO_z_min_extract_offset)
>  #define VO_INIT_SIZE   (VO__end - VO__text)
>  #if ZO_INIT_SIZE > VO_INIT_SIZE
>  #define INIT_SIZE ZO_INIT_SIZE
> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
> index 9f6b934..0e8e4f7 100644
> --- a/arch/x86/kernel/asm-offsets.c
> +++ b/arch/x86/kernel/asm-offsets.c
> @@ -66,6 +66,7 @@ void common(void) {
>         OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch);
>         OFFSET(BP_version, boot_params, hdr.version);
>         OFFSET(BP_kernel_alignment, boot_params, hdr.kernel_alignment);
> +       OFFSET(BP_init_size, boot_params, hdr.init_size);
>         OFFSET(BP_pref_address, boot_params, hdr.pref_address);
>         OFFSET(BP_code32_start, boot_params, hdr.code32_start);
>
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 00bf300..ac25c7f 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -325,6 +325,7 @@ SECTIONS
>                 __brk_limit = .;
>         }
>
> +       . = ALIGN(PAGE_SIZE);           /* keep VO init size page aligned */
>         _end = .;
>
>          STABS_DEBUG
> --
> 1.8.4.5
>



-- 
Kees Cook
Chrome OS Security


* Re: [PATCH v3 0/7] x86, boot: clean up kasl
  2015-03-10  0:39 ` [PATCH v3 0/7] x86, boot: clean up kasl Kees Cook
@ 2015-03-10  0:54   ` Yinghai Lu
  0 siblings, 0 replies; 54+ messages in thread
From: Yinghai Lu @ 2015-03-10  0:54 UTC (permalink / raw)
  To: Kees Cook
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Baoquan He, Thomas Gleixner, Jiri Kosina, LKML, linux-efi

On Mon, Mar 9, 2015 at 5:39 PM, Kees Cook <keescook@chromium.org> wrote:
> On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> First 3 patches make ZO (arch/x86/boot/compressed/vmlinux) data region is not
>> overwritten by VO (vmlinux) after decompress.  So could pass data from ZO to VO.
>
> Tiny nit-pick, in case there is a v5: the Subject on this (0/7) says
> "kasl" instead of "kaslr".

Sure. Sorry about that.

Thanks

Yinghai


* Re: [PATCH v3 4/7] x86, kaslr: Access the correct kaslr_enabled variable
  2015-03-07 22:07 ` [PATCH v3 4/7] x86, kaslr: Access the correct kaslr_enabled variable Yinghai Lu
@ 2015-03-10  0:55   ` Kees Cook
  0 siblings, 0 replies; 54+ messages in thread
From: Kees Cook @ 2015-03-10  0:55 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Baoquan He, Thomas Gleixner, Jiri Kosina, LKML, linux-efi

On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> Commit
>
>   f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")
>
> started passing KASLR status to the kernel proper, but it uses a physical
> address as the value, leading to the kernel proper parsing bogus KASLR
> status.
>
> The setup_data linked list and thus the element which contains
> kaslr_enabled is chained together using physical addresses. At the time
> when we access it in the kernel proper, we're already running with
> paging enabled and therefore must access it through its virtual address.
>
> This patch changes the code to use early_memremap() and access the value.
>
> -v3: check the return value of early_memremap(), per Boris.
> -v4: add Boris's description of setup_data accessing to the change log.

Nit for next version: revision change notes usually go below the "---"
in a separate section, and end with "---", above the "files changed"
report, and below the main changelog.

-Kees

>
> Fixes: f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")
> Cc: Matt Fleming <matt.fleming@intel.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Jiri Kosina <jkosina@suse.cz>
> Acked-by: Jiri Kosina <jkosina@suse.cz>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/kernel/setup.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 98dc931..912f124 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -429,7 +429,18 @@ static void __init reserve_initrd(void)
>
>  static void __init parse_kaslr_setup(u64 pa_data, u32 data_len)
>  {
> -       kaslr_enabled = (bool)(pa_data + sizeof(struct setup_data));
> +       /* kaslr_setup_data is defined in aslr.c */
> +       unsigned char *data;
> +       unsigned long offset = sizeof(struct setup_data);
> +
> +       data = early_memremap(pa_data, offset + 1);
> +       if (!data) {
> +               kaslr_enabled = true;
> +               return;
> +       }
> +
> +       kaslr_enabled = *(data + offset);
> +       early_memunmap(data, offset + 1);
>  }
>
>  static void __init parse_setup_data(void)
> --
> 1.8.4.5
>



-- 
Kees Cook
Chrome OS Security


* Re: [PATCH v3 5/7] x86, kaslr: Consolidate mem_avoid array filling
  2015-03-07 22:07 ` [PATCH v3 5/7] x86, kaslr: Consolidate mem_avoid array filling Yinghai Lu
@ 2015-03-10  1:00   ` Kees Cook
  2015-03-10  1:10     ` Yinghai Lu
  0 siblings, 1 reply; 54+ messages in thread
From: Kees Cook @ 2015-03-10  1:00 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Baoquan He, Thomas Gleixner, Jiri Kosina, LKML, linux-efi

On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> Now that ZO sits at the end of the buffer, we can find out where the ZO
> text and data/bss etc. are.
>
> [input, input+input_size) is the copied compressed kernel, not the whole ZO.
> [output, output+init_size) is the buffer for VO.
>
> [input+input_size, output+init_size) is [_text, _end) for ZO.
> That can be the first range in mem_avoid; we don't need to guess it anymore.
>
> That area already includes the heap and stack for running ZO, so we don't
> need to put them into the mem_avoid array separately.
>
> We also need to put boot_params into mem_avoid, as a 64-bit bootloader
> could put it anywhere.

This may be a stupid question, but are boot_params being used outside
of the compressed loader? If so, it might make sense to split that
change into a separate patch to go to stable, if it's causing
problems. (And document what problem is getting solved.)

-Kees

>
> Also rename the output_size parameter to init_size, as we pass init_size
> instead.
>
> Cc: Kees Cook <keescook@chromium.org>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/boot/compressed/aslr.c | 29 ++++++++++++++---------------
>  arch/x86/boot/compressed/misc.h |  4 ++--
>  2 files changed, 16 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
> index 7083c16..a279514 100644
> --- a/arch/x86/boot/compressed/aslr.c
> +++ b/arch/x86/boot/compressed/aslr.c
> @@ -116,7 +116,7 @@ struct mem_vector {
>         unsigned long size;
>  };
>
> -#define MEM_AVOID_MAX 5
> +#define MEM_AVOID_MAX 4
>  static struct mem_vector mem_avoid[MEM_AVOID_MAX];
>
>  static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
> @@ -142,7 +142,7 @@ static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
>  }
>
>  static void mem_avoid_init(unsigned long input, unsigned long input_size,
> -                          unsigned long output, unsigned long output_size)
> +                          unsigned long output, unsigned long init_size)
>  {
>         u64 initrd_start, initrd_size;
>         u64 cmd_line, cmd_line_size;
> @@ -151,10 +151,13 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
>
>         /*
>          * Avoid the region that is unsafe to overlap during
> -        * decompression (see calculations at top of misc.c).
> +        * decompression.
> +        * As we already move ZO (arch/x86/boot/compressed/vmlinux)
> +        * to the end of buffer, [input+input_size, output+init_size)
> +        * has [_text, _end) for ZO.
>          */
> -       unsafe_len = (output_size >> 12) + 32768 + 18;
> -       unsafe = (unsigned long)input + input_size - unsafe_len;
> +       unsafe_len = output + init_size - (input + input_size);
> +       unsafe = (unsigned long)input + input_size;
>         mem_avoid[0].start = unsafe;
>         mem_avoid[0].size = unsafe_len;
>
> @@ -176,13 +179,9 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
>         mem_avoid[2].start = cmd_line;
>         mem_avoid[2].size = cmd_line_size;
>
> -       /* Avoid heap memory. */
> -       mem_avoid[3].start = (unsigned long)free_mem_ptr;
> -       mem_avoid[3].size = BOOT_HEAP_SIZE;
> -
> -       /* Avoid stack memory. */
> -       mem_avoid[4].start = (unsigned long)free_mem_end_ptr;
> -       mem_avoid[4].size = BOOT_STACK_SIZE;
> +       /* Avoid params */
> +       mem_avoid[3].start = (unsigned long)real_mode;
> +       mem_avoid[3].size = sizeof(*real_mode);
>  }
>
>  /* Does this memory vector overlap a known avoided area? */
> @@ -327,7 +326,7 @@ unsigned char *choose_kernel_location(struct boot_params *params,
>                                       unsigned char *input,
>                                       unsigned long input_size,
>                                       unsigned char *output,
> -                                     unsigned long output_size)
> +                                     unsigned long init_size)
>  {
>         unsigned long choice = (unsigned long)output;
>         unsigned long random;
> @@ -349,10 +348,10 @@ unsigned char *choose_kernel_location(struct boot_params *params,
>
>         /* Record the various known unsafe memory ranges. */
>         mem_avoid_init((unsigned long)input, input_size,
> -                      (unsigned long)output, output_size);
> +                      (unsigned long)output, init_size);
>
>         /* Walk e820 and find a random address. */
> -       random = find_random_addr(choice, output_size);
> +       random = find_random_addr(choice, init_size);
>         if (!random) {
>                 debug_putstr("KASLR could not find suitable E820 region...\n");
>                 goto out;
> diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> index ee3576b..23156e7 100644
> --- a/arch/x86/boot/compressed/misc.h
> +++ b/arch/x86/boot/compressed/misc.h
> @@ -61,7 +61,7 @@ unsigned char *choose_kernel_location(struct boot_params *params,
>                                       unsigned char *input,
>                                       unsigned long input_size,
>                                       unsigned char *output,
> -                                     unsigned long output_size);
> +                                     unsigned long init_size);
>  /* cpuflags.c */
>  bool has_cpuflag(int flag);
>  #else
> @@ -70,7 +70,7 @@ unsigned char *choose_kernel_location(struct boot_params *params,
>                                       unsigned char *input,
>                                       unsigned long input_size,
>                                       unsigned char *output,
> -                                     unsigned long output_size)
> +                                     unsigned long init_size)
>  {
>         return output;
>  }
> --
> 1.8.4.5
>



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 6/7] x86, boot: Split kernel_ident_mapping_init to another file
  2015-03-07 22:07 ` [PATCH v3 6/7] x86, boot: Split kernel_ident_mapping_init to another file Yinghai Lu
@ 2015-03-10  1:03   ` Kees Cook
  0 siblings, 0 replies; 54+ messages in thread
From: Kees Cook @ 2015-03-10  1:03 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Baoquan He, Thomas Gleixner, Jiri Kosina, LKML, linux-efi

On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> We need to include that in boot::decompress_kernel stage to set new
> ident mapping.
>
> Also add checking for __pa/__va macro definition, as we need to override them
> in boot::decompress_kernel stage.
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>

This seems good. :)

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
>  arch/x86/include/asm/page.h |  5 +++
>  arch/x86/mm/ident_map.c     | 74 +++++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/mm/init_64.c       | 74 +--------------------------------------------
>  3 files changed, 80 insertions(+), 73 deletions(-)
>  create mode 100644 arch/x86/mm/ident_map.c
>
> diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
> index 802dde3..cf8f619 100644
> --- a/arch/x86/include/asm/page.h
> +++ b/arch/x86/include/asm/page.h
> @@ -37,7 +37,10 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
>         alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
>  #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
>
> +#ifndef __pa
>  #define __pa(x)                __phys_addr((unsigned long)(x))
> +#endif
> +
>  #define __pa_nodebug(x)        __phys_addr_nodebug((unsigned long)(x))
>  /* __pa_symbol should be used for C visible symbols.
>     This seems to be the official gcc blessed way to do such arithmetic. */
> @@ -51,7 +54,9 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
>  #define __pa_symbol(x) \
>         __phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
>
> +#ifndef __va
>  #define __va(x)                        ((void *)((unsigned long)(x)+PAGE_OFFSET))
> +#endif
>
>  #define __boot_va(x)           __va(x)
>  #define __boot_pa(x)           __pa(x)
> diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
> new file mode 100644
> index 0000000..751ca92
> --- /dev/null
> +++ b/arch/x86/mm/ident_map.c
> @@ -0,0 +1,74 @@
> +
> +static void ident_pmd_init(unsigned long pmd_flag, pmd_t *pmd_page,
> +                          unsigned long addr, unsigned long end)
> +{
> +       addr &= PMD_MASK;
> +       for (; addr < end; addr += PMD_SIZE) {
> +               pmd_t *pmd = pmd_page + pmd_index(addr);
> +
> +               if (!pmd_present(*pmd))
> +                       set_pmd(pmd, __pmd(addr | pmd_flag));
> +       }
> +}
> +static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
> +                         unsigned long addr, unsigned long end)
> +{
> +       unsigned long next;
> +
> +       for (; addr < end; addr = next) {
> +               pud_t *pud = pud_page + pud_index(addr);
> +               pmd_t *pmd;
> +
> +               next = (addr & PUD_MASK) + PUD_SIZE;
> +               if (next > end)
> +                       next = end;
> +
> +               if (pud_present(*pud)) {
> +                       pmd = pmd_offset(pud, 0);
> +                       ident_pmd_init(info->pmd_flag, pmd, addr, next);
> +                       continue;
> +               }
> +               pmd = (pmd_t *)info->alloc_pgt_page(info->context);
> +               if (!pmd)
> +                       return -ENOMEM;
> +               ident_pmd_init(info->pmd_flag, pmd, addr, next);
> +               set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
> +       }
> +
> +       return 0;
> +}
> +
> +int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
> +                             unsigned long addr, unsigned long end)
> +{
> +       unsigned long next;
> +       int result;
> +       int off = info->kernel_mapping ? pgd_index(__PAGE_OFFSET) : 0;
> +
> +       for (; addr < end; addr = next) {
> +               pgd_t *pgd = pgd_page + pgd_index(addr) + off;
> +               pud_t *pud;
> +
> +               next = (addr & PGDIR_MASK) + PGDIR_SIZE;
> +               if (next > end)
> +                       next = end;
> +
> +               if (pgd_present(*pgd)) {
> +                       pud = pud_offset(pgd, 0);
> +                       result = ident_pud_init(info, pud, addr, next);
> +                       if (result)
> +                               return result;
> +                       continue;
> +               }
> +
> +               pud = (pud_t *)info->alloc_pgt_page(info->context);
> +               if (!pud)
> +                       return -ENOMEM;
> +               result = ident_pud_init(info, pud, addr, next);
> +               if (result)
> +                       return result;
> +               set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
> +       }
> +
> +       return 0;
> +}
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 30eb05a..c30efb6 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -56,79 +56,7 @@
>
>  #include "mm_internal.h"
>
> -static void ident_pmd_init(unsigned long pmd_flag, pmd_t *pmd_page,
> -                          unsigned long addr, unsigned long end)
> -{
> -       addr &= PMD_MASK;
> -       for (; addr < end; addr += PMD_SIZE) {
> -               pmd_t *pmd = pmd_page + pmd_index(addr);
> -
> -               if (!pmd_present(*pmd))
> -                       set_pmd(pmd, __pmd(addr | pmd_flag));
> -       }
> -}
> -static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
> -                         unsigned long addr, unsigned long end)
> -{
> -       unsigned long next;
> -
> -       for (; addr < end; addr = next) {
> -               pud_t *pud = pud_page + pud_index(addr);
> -               pmd_t *pmd;
> -
> -               next = (addr & PUD_MASK) + PUD_SIZE;
> -               if (next > end)
> -                       next = end;
> -
> -               if (pud_present(*pud)) {
> -                       pmd = pmd_offset(pud, 0);
> -                       ident_pmd_init(info->pmd_flag, pmd, addr, next);
> -                       continue;
> -               }
> -               pmd = (pmd_t *)info->alloc_pgt_page(info->context);
> -               if (!pmd)
> -                       return -ENOMEM;
> -               ident_pmd_init(info->pmd_flag, pmd, addr, next);
> -               set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
> -       }
> -
> -       return 0;
> -}
> -
> -int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
> -                             unsigned long addr, unsigned long end)
> -{
> -       unsigned long next;
> -       int result;
> -       int off = info->kernel_mapping ? pgd_index(__PAGE_OFFSET) : 0;
> -
> -       for (; addr < end; addr = next) {
> -               pgd_t *pgd = pgd_page + pgd_index(addr) + off;
> -               pud_t *pud;
> -
> -               next = (addr & PGDIR_MASK) + PGDIR_SIZE;
> -               if (next > end)
> -                       next = end;
> -
> -               if (pgd_present(*pgd)) {
> -                       pud = pud_offset(pgd, 0);
> -                       result = ident_pud_init(info, pud, addr, next);
> -                       if (result)
> -                               return result;
> -                       continue;
> -               }
> -
> -               pud = (pud_t *)info->alloc_pgt_page(info->context);
> -               if (!pud)
> -                       return -ENOMEM;
> -               result = ident_pud_init(info, pud, addr, next);
> -               if (result)
> -                       return result;
> -               set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
> -       }
> -
> -       return 0;
> -}
> +#include "ident_map.c"
>
>  static int __init parse_direct_gbpages_off(char *arg)
>  {
> --
> 1.8.4.5
>



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 2/7] x86, boot: Move ZO to end of buffer
  2015-03-10  0:54   ` Kees Cook
@ 2015-03-10  1:04     ` Yinghai Lu
  2015-03-10  5:59     ` Borislav Petkov
  1 sibling, 0 replies; 54+ messages in thread
From: Yinghai Lu @ 2015-03-10  1:04 UTC (permalink / raw)
  To: Kees Cook
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Baoquan He, Thomas Gleixner, Jiri Kosina, LKML, linux-efi

On Mon, Mar 9, 2015 at 5:54 PM, Kees Cook <keescook@chromium.org> wrote:
> On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> Boris found data from boot stage can not be used kernel stage.
>
> "... be used during kernel stage."
>
> Also, can you give a specific example of this problem? (Which data, used how?)
>
>> Bootloader allocate buffer according to init_size in hdr, and load the
>> ZO (arch/x86/boot/compressed/vmlinux) from start of that buffer.
>> During running of ZO, ZO move itself to the middle of buffer at
>> z_extract_offset to make sure that decompressor would not have output
>> overwrite input data before input data get consumed.
>> After decompressor is called, VO (vmlinux) use whole buffer from start,
>> and ZO code and data section is overlapped with VO bss section.
>> And later VO/clear_bss() clear them before code in arch/x86/kernel/setup.c
>> access them.
>>
>> To make the data survive that later, we should avoid the overlapping.
>> At first move ZO close the end of buffer instead of middle of the buffer,
>> that will move out ZO data out of VO bss area.
>>
>> Also after that we can find out where is data section of copied ZO
>> instead of guessing. That will aslr mem_avoid array filling for
>
> "That will make aslr mem_avoid array ..."
>
>> new buffer seaching much simple.
>>
>> And rename z_extract_offset to z_min_extract_offset, as it is
>> actually the minimum offset for extracting now.
>>
>> To keep the final real extract_offset to be page aligned like
>> min_extract_offset, we need make VO _end and ZO _end both
>> page aligned to make sure init_size always page aligned.
>>
>> Next patch will add ZO data size to init_size, so it will make sure
>> ZO data is even out of VO brk area.
>
> This seems like a reasonable idea, but I think the changes should be
> noted/updated in misc.c since a lot of effort was made to make the
> in-memory foot print as small as possible.

Yes, that explanation in misc.c about the extra bytes is very good, and
it should still stand. Also need to make mkpiggy.c point to it.
We also need to add more info in head_32.S/head_64.S about why we
actually need to move ZO to the end of the buffer.

> These changes do expand the
> size of the loaded kernel, IIUC. If not in this patch, maybe in 5/7?

It extends init_size by about 256k; that should be noted in 3/7.
But that is only for decompression time; the final VO (vmlinux) run
size is not changed.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 7/7] x86, kaslr, 64bit: Set new or extra ident_mapping
  2015-03-07 22:07 ` [PATCH v3 7/7] x86, kaslr, 64bit: Set new or extra ident_mapping Yinghai Lu
@ 2015-03-10  1:09   ` Kees Cook
  2015-03-10  1:14     ` Yinghai Lu
  0 siblings, 1 reply; 54+ messages in thread
From: Kees Cook @ 2015-03-10  1:09 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Baoquan He, Thomas Gleixner, Jiri Kosina, LKML, linux-efi

On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> First, aslr will support to put random VO above 4G, so we must set ident
> mapping for the range even we come from startup_32 path.
>
> Second, when boot from 64bit bootloader, bootloader set ident mapping,
> and boot via ZO (arch/x86/boot/compressed/vmlinux) startup_64.
> Those pages for pagetable need to be avoided when we select new random
> VO (vmlinux) base. Otherwise decompressor would overwrite them during
> decompressing.
>
> One solution: go through pagetable and find out every page is used by
> pagetable for every mem_aovid checking but we will need extra code.
>
> Other solution: create new ident mapping instead, and pages for pagetable
> will sit in _pagetable section of ZO, and they are in mem_avoid array already.
> In this way, we can reuse the code for setting ident mapping.
>
> The _pgtable will be shared 32bit and 64bit path to reduce init_size,
> as now ZO _rodata to _end will contribute init_size.
>
> Need to increase pgt buffer size.
> When boot via startup_64, as we need to cover old VO, params, cmdline
> and new VO, in extreme case we could have them all cross 512G boundary,
> will need (2+2)*4 pages with 2M mapping. And need 2 for first 2M for vga ram.
> Plus one for level4. Total will be 19 pages.
> When boot via startup_32, aslr would move new VO above 4G, we need set extra
> ident mapping for new VO, pgt buffer come from _pgtable offset 6 pages.
> should only need (2+2) pages at most when it cross 512G boundary.
> So 19 pages could make both pathes happy.
>
>
> -v3: add mapping for first 2M with video ram when X86_VERBOSE_BOOTUP is set.
>      Don't need to set mapping for setup_data, as it is already late
>      in boot/ZO stage, will not access it until VO stage, and VO stage
>      will use early_memmap or kernel address to access them.
>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Jiri Kosina <jkosina@suse.cz>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Matt Fleming <matt.fleming@intel.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/boot/compressed/aslr.c     | 21 ++++++++
>  arch/x86/boot/compressed/head_64.S  |  4 +-
>  arch/x86/boot/compressed/misc_pgt.c | 98 +++++++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/boot.h         | 19 +++++++
>  4 files changed, 140 insertions(+), 2 deletions(-)
>  create mode 100644 arch/x86/boot/compressed/misc_pgt.c
>
> diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
> index a279514..34eb652 100644
> --- a/arch/x86/boot/compressed/aslr.c
> +++ b/arch/x86/boot/compressed/aslr.c
> @@ -1,3 +1,8 @@
> +#ifdef CONFIG_X86_64
> +#define __pa(x)  ((unsigned long)(x))
> +#define __va(x)  ((void *)((unsigned long)(x)))
> +#endif
> +
>  #include "misc.h"
>
>  #include <asm/msr.h>
> @@ -21,6 +26,8 @@ struct kaslr_setup_data {
>         __u8 data[1];
>  } kaslr_setup_data;
>
> +#include "misc_pgt.c"

Shouldn't this just be a normal built .o file that is linked together
in the Makefile, specifically tracking CONFIG_RANDOMIZE_BASE as aslr.o
already is?

-Kees

> +
>  #define I8254_PORT_CONTROL     0x43
>  #define I8254_PORT_COUNTER0    0x40
>  #define I8254_CMD_READBACK     0xC0
> @@ -160,6 +167,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
>         unsafe = (unsigned long)input + input_size;
>         mem_avoid[0].start = unsafe;
>         mem_avoid[0].size = unsafe_len;
> +       fill_pagetable(output, init_size);
>
>         /* Avoid initrd. */
>         initrd_start  = (u64)real_mode->ext_ramdisk_image << 32;
> @@ -168,6 +176,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
>         initrd_size |= real_mode->hdr.ramdisk_size;
>         mem_avoid[1].start = initrd_start;
>         mem_avoid[1].size = initrd_size;
> +       /* don't need to set mapping for initrd */
>
>         /* Avoid kernel command line. */
>         cmd_line  = (u64)real_mode->ext_cmd_line_ptr << 32;
> @@ -178,10 +187,19 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
>                 ;
>         mem_avoid[2].start = cmd_line;
>         mem_avoid[2].size = cmd_line_size;
> +       fill_pagetable(cmd_line, cmd_line_size);
>
>         /* Avoid params */
>         mem_avoid[3].start = (unsigned long)real_mode;
>         mem_avoid[3].size = sizeof(*real_mode);
> +       fill_pagetable((unsigned long)real_mode, sizeof(*real_mode));
> +
> +       /* don't need to set mapping for setup_data */
> +
> +#ifdef CONFIG_X86_VERBOSE_BOOTUP
> +       /* for video ram */
> +       fill_pagetable(0, PMD_SIZE);
> +#endif
>  }
>
>  /* Does this memory vector overlap a known avoided area? */
> @@ -362,6 +380,9 @@ unsigned char *choose_kernel_location(struct boot_params *params,
>                 goto out;
>
>         choice = random;
> +
> +       fill_pagetable(choice, init_size);
> +       switch_pagetable();
>  out:
>         return (unsigned char *)choice;
>  }
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 69015b5..1b6e34a 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -125,7 +125,7 @@ ENTRY(startup_32)
>         /* Initialize Page tables to 0 */
>         leal    pgtable(%ebx), %edi
>         xorl    %eax, %eax
> -       movl    $((4096*6)/4), %ecx
> +       movl    $(BOOT_INIT_PGT_SIZE/4), %ecx
>         rep     stosl
>
>         /* Build Level 4 */
> @@ -477,4 +477,4 @@ boot_stack_end:
>         .section ".pgtable","a",@nobits
>         .balign 4096
>  pgtable:
> -       .fill 6*4096, 1, 0
> +       .fill BOOT_PGT_SIZE, 1, 0
> diff --git a/arch/x86/boot/compressed/misc_pgt.c b/arch/x86/boot/compressed/misc_pgt.c
> new file mode 100644
> index 0000000..b55982c
> --- /dev/null
> +++ b/arch/x86/boot/compressed/misc_pgt.c
> @@ -0,0 +1,98 @@
> +
> +#ifdef CONFIG_X86_64
> +#include <asm/init.h>
> +#include <asm/pgtable.h>
> +
> +#include "../../mm/ident_map.c"
> +
> +struct alloc_pgt_data {
> +       unsigned char *pgt_buf;
> +       unsigned long pgt_buf_size;
> +       unsigned long pgt_buf_offset;
> +};
> +
> +static void *alloc_pgt_page(void *context)
> +{
> +       struct alloc_pgt_data *d = (struct alloc_pgt_data *)context;
> +       unsigned char *p = (unsigned char *)d->pgt_buf;
> +
> +       if (d->pgt_buf_offset >= d->pgt_buf_size) {
> +               debug_putstr("out of pgt_buf in misc.c\n");
> +               return NULL;
> +       }
> +
> +       p += d->pgt_buf_offset;
> +       d->pgt_buf_offset += PAGE_SIZE;
> +
> +       return p;
> +}
> +
> +/*
> + * Use a normal definition of memset() from string.c. There are already
> + * included header files which expect a definition of memset() and by
> + * the time we define memset macro, it is too late.
> + */
> +#undef memset
> +#define memzero(s, n)   memset((s), 0, (n))
> +
> +unsigned long __force_order;
> +static struct alloc_pgt_data pgt_data;
> +static struct x86_mapping_info mapping_info;
> +static pgd_t *level4p;
> +
> +extern unsigned char _pgtable[];
> +static void fill_pagetable(unsigned long start, unsigned long size)
> +{
> +       unsigned long end = start + size;
> +
> +       if (!level4p) {
> +               pgt_data.pgt_buf_offset = 0;
> +               mapping_info.alloc_pgt_page = alloc_pgt_page;
> +               mapping_info.context = &pgt_data;
> +               mapping_info.pmd_flag = __PAGE_KERNEL_LARGE_EXEC;
> +
> +               /*
> +                * come from startup_32 ?
> +                * then cr3 is _pgtable, we can reuse it.
> +                */
> +               level4p = (pgd_t *)read_cr3();
> +               if ((unsigned long)level4p == (unsigned long)_pgtable) {
> +                       pgt_data.pgt_buf = (unsigned char *)_pgtable +
> +                                                BOOT_INIT_PGT_SIZE;
> +                       pgt_data.pgt_buf_size = BOOT_PGT_SIZE -
> +                                                BOOT_INIT_PGT_SIZE;
> +
> +                       debug_putstr("boot via startup_32\n");
> +               } else {
> +                       pgt_data.pgt_buf = (unsigned char *)_pgtable;
> +                       pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
> +
> +                       debug_putstr("boot via startup_64\n");
> +                       level4p = (pgd_t *)alloc_pgt_page(&pgt_data);
> +               }
> +               memset((unsigned char *)pgt_data.pgt_buf, 0,
> +                        pgt_data.pgt_buf_size);
> +       }
> +
> +       /* align boundary to 2M */
> +       start = round_down(start, PMD_SIZE);
> +       end = round_up(end, PMD_SIZE);
> +       if (start >= end)
> +               return;
> +
> +       kernel_ident_mapping_init(&mapping_info, level4p, start, end);
> +}
> +
> +static void switch_pagetable(void)
> +{
> +       write_cr3((unsigned long)level4p);
> +}
> +
> +#else
> +static void fill_pagetable(unsigned long start, unsigned long size)
> +{
> +}
> +static void switch_pagetable(void)
> +{
> +}
> +#endif
> diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
> index 4fa687a..7b23908 100644
> --- a/arch/x86/include/asm/boot.h
> +++ b/arch/x86/include/asm/boot.h
> @@ -32,7 +32,26 @@
>  #endif /* !CONFIG_KERNEL_BZIP2 */
>
>  #ifdef CONFIG_X86_64
> +
>  #define BOOT_STACK_SIZE        0x4000
> +
> +#define BOOT_INIT_PGT_SIZE (6*4096)
> +#ifdef CONFIG_RANDOMIZE_BASE
> +/*
> + * 1 page for level4, 2 pages for first 2M.
> + * (2+2)*4 pages for kernel, param, cmd_line, random kernel
> + * if all cross 512G boundary.
> + * So total will be 19 pages.
> + */
> +#ifdef CONFIG_X86_VERBOSE_BOOTUP
> +#define BOOT_PGT_SIZE (19*4096)
> +#else
> +#define BOOT_PGT_SIZE (17*4096)
> +#endif
> +#else
> +#define BOOT_PGT_SIZE BOOT_INIT_PGT_SIZE
> +#endif
> +
>  #else
>  #define BOOT_STACK_SIZE        0x1000
>  #endif
> --
> 1.8.4.5
>



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 5/7] x86, kaslr: Consolidate mem_avoid array filling
  2015-03-10  1:00   ` Kees Cook
@ 2015-03-10  1:10     ` Yinghai Lu
  2015-03-10  1:26       ` Kees Cook
  0 siblings, 1 reply; 54+ messages in thread
From: Yinghai Lu @ 2015-03-10  1:10 UTC (permalink / raw)
  To: Kees Cook
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Baoquan He, Thomas Gleixner, Jiri Kosina, LKML, linux-efi

On Mon, Mar 9, 2015 at 6:00 PM, Kees Cook <keescook@chromium.org> wrote:
> On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>
> This may be a stupid question, but are boot_params being used outside
> of the compressed loader? If so, it might make sense to split that
> change into a separate patch to go to stable, if it's causing
> problems. (And document what problem is getting solved.)

boot_params stays the same until it is passed to x86_64_start_kernel
in vmlinux, and there it gets copied, same as cmdline.

But current kaslr only supports randomizing the base upward, it does
not support kexec, and the current boot loader (grub2) puts boot_params
really low (under 1M): the real real_mode. :)

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 7/7] x86, kaslr, 64bit: Set new or extra ident_mapping
  2015-03-10  1:09   ` Kees Cook
@ 2015-03-10  1:14     ` Yinghai Lu
  2015-03-10  6:54       ` Yinghai Lu
  0 siblings, 1 reply; 54+ messages in thread
From: Yinghai Lu @ 2015-03-10  1:14 UTC (permalink / raw)
  To: Kees Cook
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Baoquan He, Thomas Gleixner, Jiri Kosina, LKML, linux-efi

On Mon, Mar 9, 2015 at 6:09 PM, Kees Cook <keescook@chromium.org> wrote:
> On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> First, aslr will support to put random VO above 4G, so we must set ident
>> mapping for the range even we come from startup_32 path.
>>
>> Second, when boot from 64bit bootloader, bootloader set ident mapping,
>> and boot via ZO (arch/x86/boot/compressed/vmlinux) startup_64.
>> Those pages for pagetable need to be avoided when we select new random
>> VO (vmlinux) base. Otherwise decompressor would overwrite them during
>> decompressing.
>>
>> One solution: go through pagetable and find out every page is used by
>> pagetable for every mem_aovid checking but we will need extra code.
>>
>> Other solution: create new ident mapping instead, and pages for pagetable
>> will sit in _pagetable section of ZO, and they are in mem_avoid array already.
>> In this way, we can reuse the code for setting ident mapping.
>>
>> The _pgtable will be shared 32bit and 64bit path to reduce init_size,
>> as now ZO _rodata to _end will contribute init_size.
>>
>> Need to increase pgt buffer size.
>> When boot via startup_64, as we need to cover old VO, params, cmdline
>> and new VO, in extreme case we could have them all cross 512G boundary,
>> will need (2+2)*4 pages with 2M mapping. And need 2 for first 2M for vga ram.
>> Plus one for level4. Total will be 19 pages.
>> When boot via startup_32, aslr would move new VO above 4G, we need set extra
>> ident mapping for new VO, pgt buffer come from _pgtable offset 6 pages.
>> should only need (2+2) pages at most when it cross 512G boundary.
>> So 19 pages could make both pathes happy.
>>
>>
>> -v3: add mapping for first 2M with video ram when X86_VERBOSE_BOOTUP is set.
>>      Don't need to set mapping for setup_data, as it is already late
>>      in boot/ZO stage, will not access it until VO stage, and VO stage
>>      will use early_memmap or kernel address to access them.
>>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Jiri Kosina <jkosina@suse.cz>
>> Cc: Borislav Petkov <bp@suse.de>
>> Cc: Matt Fleming <matt.fleming@intel.com>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> ---
>>  arch/x86/boot/compressed/aslr.c     | 21 ++++++++
>>  arch/x86/boot/compressed/head_64.S  |  4 +-
>>  arch/x86/boot/compressed/misc_pgt.c | 98 +++++++++++++++++++++++++++++++++++++
>>  arch/x86/include/asm/boot.h         | 19 +++++++
>>  4 files changed, 140 insertions(+), 2 deletions(-)
>>  create mode 100644 arch/x86/boot/compressed/misc_pgt.c
>>
>> diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
>> index a279514..34eb652 100644
>> --- a/arch/x86/boot/compressed/aslr.c
>> +++ b/arch/x86/boot/compressed/aslr.c
>> @@ -1,3 +1,8 @@
>> +#ifdef CONFIG_X86_64
>> +#define __pa(x)  ((unsigned long)(x))
>> +#define __va(x)  ((void *)((unsigned long)(x)))
>> +#endif
>> +
>>  #include "misc.h"
>>
>>  #include <asm/msr.h>
>> @@ -21,6 +26,8 @@ struct kaslr_setup_data {
>>         __u8 data[1];
>>  } kaslr_setup_data;
>>
>> +#include "misc_pgt.c"
>
> Shouldn't this just be a normal built .o file that is linked together
> in the Makefile, specifically tracking CONFIG_RANDOMIZE_BASE as aslr.o
> already is?

Yes, we could go that way.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 5/7] x86, kaslr: Consolidate mem_avoid array filling
  2015-03-10  1:10     ` Yinghai Lu
@ 2015-03-10  1:26       ` Kees Cook
  0 siblings, 0 replies; 54+ messages in thread
From: Kees Cook @ 2015-03-10  1:26 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Baoquan He, Thomas Gleixner, Jiri Kosina, LKML, linux-efi

On Mon, Mar 9, 2015 at 6:10 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Mon, Mar 9, 2015 at 6:00 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>
>> This may be a stupid question, but are boot_params being used outside
>> of the compressed loader? If so, it might make sense to split that
>> change into a separate patch to go to stable, if it's causing
>> problems. (And document what problem is getting solved.)
>
> boot_params will keep the same and until it is passed
> x86_64_start_kernel in vmlinux.
> and there it will be copied, same as cmdline.
>
> but current kaslr only support random the base high, and it does not
> support kexec
> and current boot loader (grub2) put it really low. (under 1M). the
> real real_mode. :).

Oh right! Of course, thanks for the details. Yes, it was implicitly
ignored before, but now it needs to be explicitly ignored. Great,
thanks!

-Kees

>
> Thanks
>
> Yinghai



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 2/7] x86, boot: Move ZO to end of buffer
  2015-03-10  0:54   ` Kees Cook
  2015-03-10  1:04     ` Yinghai Lu
@ 2015-03-10  5:59     ` Borislav Petkov
  1 sibling, 0 replies; 54+ messages in thread
From: Borislav Petkov @ 2015-03-10  5:59 UTC (permalink / raw)
  To: Kees Cook
  Cc: Yinghai Lu, Matt Fleming, H. Peter Anvin, Ingo Molnar,
	Baoquan He, Thomas Gleixner, Jiri Kosina, LKML, linux-efi

On Mon, Mar 09, 2015 at 05:54:01PM -0700, Kees Cook wrote:
> On Sat, Mar 7, 2015 at 2:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > Boris found data from boot stage can not be used kernel stage.
> 
> "... be used during kernel stage."
> 
> Also, can you give a specific example of this problem? (Which data, used how?)

Yeah, I'm filling stuff in, no worries. It just takes twice as long, as
I'm deciphering early boot code *and* Yinghai. :-)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v3 7/7] x86, kaslr, 64bit: Set new or extra ident_mapping
  2015-03-10  1:14     ` Yinghai Lu
@ 2015-03-10  6:54       ` Yinghai Lu
  0 siblings, 0 replies; 54+ messages in thread
From: Yinghai Lu @ 2015-03-10  6:54 UTC (permalink / raw)
  To: Kees Cook
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Baoquan He, Thomas Gleixner, Jiri Kosina, LKML, linux-efi

[-- Attachment #1: Type: text/plain, Size: 781 bytes --]

On Mon, Mar 9, 2015 at 6:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>>
>>> +#include "misc_pgt.c"
>>
>> Shouldn't this just be a normal built .o file that is linked together
>> in the Makefile, specifically tracking CONFIG_RANDOMIZE_BASE as aslr.o
>> already is?
>
> Yes, we could go that way.

Please check the attached version, which links misc_pgt.o instead.

BTW, here is the size difference between including misc_pgt.c and
linking misc_pgt.o:

include misc_pgt.c:
    text    data     bss       dec     hex  filename
13934290     304   50064  13984658  d56392  arch/x86/boot/compressed/vmlinux

link misc_pgt.o:
    text    data     bss       dec     hex  filename
13934336     304   50064  13984704  d563c0  arch/x86/boot/compressed/vmlinux

Thanks

Yinghai

[-- Attachment #2: misc_fill_pgt_v4_2.patch --]
[-- Type: text/x-patch, Size: 9231 bytes --]

From: Yinghai Lu <yinghai@kernel.org>
Subject: [PATCH v4] x86, kaslr, 64bit: Set new or extra ident_mapping

First, aslr will support placing the randomized VO above 4G, so we must set
up an ident mapping for that range even when we come from the startup_32 path.

Second, when booting from a 64bit bootloader, the bootloader sets up the
ident mapping and boots via ZO (arch/x86/boot/compressed/vmlinux) startup_64.
The pages used for that pagetable need to be avoided when we select the new
random VO (vmlinux) base. Otherwise the decompressor would overwrite them
while decompressing.

One solution: walk the pagetable and find every page it uses during each
mem_avoid check, but that would need extra code.

Other solution: create a new ident mapping instead; the pages for that
pagetable sit in the _pgtable section of ZO and are in the mem_avoid array
already. This way we can reuse the code for setting up the ident mapping.

The _pgtable will be shared by the 32bit and 64bit paths to reduce init_size,
as ZO's _rodata-to-_end span now contributes to init_size.

We need to increase the pgt buffer size. When booting via startup_64 we need
to cover the old VO, params, cmdline and new VO; in the extreme case all of
them cross a 512G boundary, which needs (2+2)*4 pages with 2M mappings, plus
2 pages for the first 2M (vga ram) and one page for level4. The total is 19
pages. When booting via startup_32, aslr may move the new VO above 4G, so we
need an extra ident mapping for the new VO; the pgt buffer comes from
_pgtable at an offset of 6 pages and needs at most (2+2) pages when the new
VO crosses a 512G boundary. So 19 pages makes both paths happy.

Cc: Kees Cook <keescook@chromium.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
-v3: add mapping for the first 2M (video ram) when X86_VERBOSE_BOOTUP is set.
     No need to map setup_data: it is late in the boot/ZO stage, it will not
     be accessed until the VO stage, and the VO stage will use early_memremap
     or a kernel address to access it.
-v4: link misc_pgt.o instead of including misc_pgt.c in aslr.c, per request
     from Kees.

---
 arch/x86/boot/compressed/Makefile   |    3 +
 arch/x86/boot/compressed/aslr.c     |   14 +++++
 arch/x86/boot/compressed/head_64.S  |    4 -
 arch/x86/boot/compressed/misc.h     |   11 ++++
 arch/x86/boot/compressed/misc_pgt.c |   91 ++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/boot.h         |   19 +++++++
 6 files changed, 140 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/boot/compressed/misc_pgt.c
===================================================================
--- /dev/null
+++ linux-2.6/arch/x86/boot/compressed/misc_pgt.c
@@ -0,0 +1,91 @@
+#define __pa(x)  ((unsigned long)(x))
+#define __va(x)  ((void *)((unsigned long)(x)))
+
+#include "misc.h"
+
+#include <asm/init.h>
+#include <asm/pgtable.h>
+
+#include "../../mm/ident_map.c"
+
+struct alloc_pgt_data {
+	unsigned char *pgt_buf;
+	unsigned long pgt_buf_size;
+	unsigned long pgt_buf_offset;
+};
+
+static void *alloc_pgt_page(void *context)
+{
+	struct alloc_pgt_data *d = (struct alloc_pgt_data *)context;
+	unsigned char *p = (unsigned char *)d->pgt_buf;
+
+	if (d->pgt_buf_offset >= d->pgt_buf_size) {
+		debug_putstr("out of pgt_buf in misc_pgt.c\n");
+		return NULL;
+	}
+
+	p += d->pgt_buf_offset;
+	d->pgt_buf_offset += PAGE_SIZE;
+
+	return p;
+}
+
+/*
+ * Use a normal definition of memset() from string.c. There are already
+ * included header files which expect a definition of memset() and by
+ * the time we define memset macro, it is too late.
+ */
+#undef memset
+#define memzero(s, n)   memset((s), 0, (n))
+
+unsigned long __force_order;
+static struct alloc_pgt_data pgt_data;
+static struct x86_mapping_info mapping_info;
+static pgd_t *level4p;
+
+void fill_pagetable(unsigned long start, unsigned long size)
+{
+	unsigned long end = start + size;
+
+	if (!level4p) {
+		pgt_data.pgt_buf_offset = 0;
+		mapping_info.alloc_pgt_page = alloc_pgt_page;
+		mapping_info.context = &pgt_data;
+		mapping_info.pmd_flag = __PAGE_KERNEL_LARGE_EXEC;
+
+		/*
+		 * come from startup_32 ?
+		 * then cr3 is _pgtable, we can reuse it.
+		 */
+		level4p = (pgd_t *)read_cr3();
+		if ((unsigned long)level4p == (unsigned long)_pgtable) {
+			pgt_data.pgt_buf = (unsigned char *)_pgtable +
+						 BOOT_INIT_PGT_SIZE;
+			pgt_data.pgt_buf_size = BOOT_PGT_SIZE -
+						 BOOT_INIT_PGT_SIZE;
+
+			debug_putstr("boot via startup_32\n");
+		} else {
+			pgt_data.pgt_buf = (unsigned char *)_pgtable;
+			pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
+
+			debug_putstr("boot via startup_64\n");
+			level4p = (pgd_t *)alloc_pgt_page(&pgt_data);
+		}
+		memset((unsigned char *)pgt_data.pgt_buf, 0,
+			 pgt_data.pgt_buf_size);
+	}
+
+	/* align boundary to 2M */
+	start = round_down(start, PMD_SIZE);
+	end = round_up(end, PMD_SIZE);
+	if (start >= end)
+		return;
+
+	kernel_ident_mapping_init(&mapping_info, level4p, start, end);
+}
+
+void switch_pagetable(void)
+{
+	write_cr3((unsigned long)level4p);
+}
Index: linux-2.6/arch/x86/boot/compressed/aslr.c
===================================================================
--- linux-2.6.orig/arch/x86/boot/compressed/aslr.c
+++ linux-2.6/arch/x86/boot/compressed/aslr.c
@@ -160,6 +160,7 @@ static void mem_avoid_init(unsigned long
 	unsafe = (unsigned long)input + input_size;
 	mem_avoid[0].start = unsafe;
 	mem_avoid[0].size = unsafe_len;
+	fill_pagetable(output, init_size);
 
 	/* Avoid initrd. */
 	initrd_start  = (u64)real_mode->ext_ramdisk_image << 32;
@@ -168,6 +169,7 @@ static void mem_avoid_init(unsigned long
 	initrd_size |= real_mode->hdr.ramdisk_size;
 	mem_avoid[1].start = initrd_start;
 	mem_avoid[1].size = initrd_size;
+	/* don't need to set mapping for initrd */
 
 	/* Avoid kernel command line. */
 	cmd_line  = (u64)real_mode->ext_cmd_line_ptr << 32;
@@ -178,10 +180,19 @@ static void mem_avoid_init(unsigned long
 		;
 	mem_avoid[2].start = cmd_line;
 	mem_avoid[2].size = cmd_line_size;
+	fill_pagetable(cmd_line, cmd_line_size);
 
 	/* Avoid params */
 	mem_avoid[3].start = (unsigned long)real_mode;
 	mem_avoid[3].size = sizeof(*real_mode);
+	fill_pagetable((unsigned long)real_mode, sizeof(*real_mode));
+
+	/* don't need to set mapping for setup_data */
+
+#ifdef CONFIG_X86_VERBOSE_BOOTUP
+	/* for video ram */
+	fill_pagetable(0, PMD_SIZE);
+#endif
 }
 
 /* Does this memory vector overlap a known avoided area? */
@@ -362,6 +373,9 @@ unsigned char *choose_kernel_location(st
 		goto out;
 
 	choice = random;
+
+	fill_pagetable(choice, init_size);
+	switch_pagetable();
 out:
 	return (unsigned char *)choice;
 }
Index: linux-2.6/arch/x86/boot/compressed/head_64.S
===================================================================
--- linux-2.6.orig/arch/x86/boot/compressed/head_64.S
+++ linux-2.6/arch/x86/boot/compressed/head_64.S
@@ -125,7 +125,7 @@ ENTRY(startup_32)
 	/* Initialize Page tables to 0 */
 	leal	pgtable(%ebx), %edi
 	xorl	%eax, %eax
-	movl	$((4096*6)/4), %ecx
+	movl	$(BOOT_INIT_PGT_SIZE/4), %ecx
 	rep	stosl
 
 	/* Build Level 4 */
@@ -477,4 +477,4 @@ boot_stack_end:
 	.section ".pgtable","a",@nobits
 	.balign 4096
 pgtable:
-	.fill 6*4096, 1, 0
+	.fill BOOT_PGT_SIZE, 1, 0
Index: linux-2.6/arch/x86/include/asm/boot.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/boot.h
+++ linux-2.6/arch/x86/include/asm/boot.h
@@ -32,7 +32,26 @@
 #endif /* !CONFIG_KERNEL_BZIP2 */
 
 #ifdef CONFIG_X86_64
+
 #define BOOT_STACK_SIZE	0x4000
+
+#define BOOT_INIT_PGT_SIZE (6*4096)
+#ifdef CONFIG_RANDOMIZE_BASE
+/*
+ * 1 page for level4, 2 pages for first 2M.
+ * (2+2)*4 pages for kernel, param, cmd_line, random kernel
+ * if all cross 512G boundary.
+ * So total will be 19 pages.
+ */
+#ifdef CONFIG_X86_VERBOSE_BOOTUP
+#define BOOT_PGT_SIZE (19*4096)
+#else
+#define BOOT_PGT_SIZE (17*4096)
+#endif
+#else
+#define BOOT_PGT_SIZE BOOT_INIT_PGT_SIZE
+#endif
+
 #else
 #define BOOT_STACK_SIZE	0x1000
 #endif
Index: linux-2.6/arch/x86/boot/compressed/Makefile
===================================================================
--- linux-2.6.orig/arch/x86/boot/compressed/Makefile
+++ linux-2.6/arch/x86/boot/compressed/Makefile
@@ -46,6 +46,9 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(o
 
 vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/aslr.o
+ifdef CONFIG_X86_64
+	vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/misc_pgt.o
+endif
 
 $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
 
Index: linux-2.6/arch/x86/boot/compressed/misc.h
===================================================================
--- linux-2.6.orig/arch/x86/boot/compressed/misc.h
+++ linux-2.6/arch/x86/boot/compressed/misc.h
@@ -76,6 +76,17 @@ unsigned char *choose_kernel_location(st
 }
 #endif
 
+#ifdef CONFIG_X86_64
+void fill_pagetable(unsigned long start, unsigned long size);
+void switch_pagetable(void);
+extern unsigned char _pgtable[];
+#else
+static inline void fill_pagetable(unsigned long start, unsigned long size)
+{ }
+static inline void switch_pagetable(void)
+{ }
+#endif
+
 #ifdef CONFIG_EARLY_PRINTK
 /* early_serial_console.c */
 extern int early_serial_base;


* Re: [PATCH v3 2/7] x86, boot: Move ZO to end of buffer
  2015-03-07 22:07 ` [PATCH v3 2/7] x86, boot: Move ZO to end of buffer Yinghai Lu
  2015-03-10  0:54   ` Kees Cook
@ 2015-03-10  8:00   ` Borislav Petkov
  2015-03-10  9:34     ` Jiri Kosina
                       ` (2 more replies)
  1 sibling, 3 replies; 54+ messages in thread
From: Borislav Petkov @ 2015-03-10  8:00 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook, Baoquan He,
	Thomas Gleixner, Jiri Kosina, linux-kernel, linux-efi

Final patch:

---
From: Yinghai Lu <yinghai@kernel.org>
Date: Sat, 7 Mar 2015 14:07:16 -0800
Subject: [PATCH] x86/setup: Move compressed kernel to the end of the buffer

Boris found that the KASLR status passed through setup_data from the boot
stage cannot be used later in the kernel stage; see commit

  f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")

Here's some background:

The boot loader allocates a buffer of size init_size in concordance with
the value passed in the setup header and it loads the compressed, i.e.
first kernel (arch/x86/boot/compressed/vmlinux) in it.

First kernel then moves itself somewhere around the middle of the
buffer at z_extract_offset to make sure that the decompressor does not
overwrite input data.

After the decompressor is finished, kernel proper (vmlinux) uses the
whole buffer from the beginning, and the compressed kernel's code and
data sections overlap with the kernel proper's .bss section.

Later on, clear_bss() in kernel proper clears .bss before code in
arch/x86/kernel/setup.c can access setup_data passed in the first,
compressed kernel.

To make sure that data survives, we should avoid the overlapping.

As a first step, move the first kernel closer to the end of the buffer
instead of the middle. As a result, this will place first kernel's data
area out of kernel proper's .bss area.

This way we can find out where the data section of the copied first
kernel is instead of guessing. In addition, it will make the KASLR
mem_avoid array preparation for the search of a fitting buffer much
simpler.

While at it, rename z_extract_offset to z_min_extract_offset as it is
actually the minimum extract offset now.

In order to keep the final extract offset page-aligned we need to
make both kernels' _end markers page-aligned too so that init_size is
page-aligned as a result.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: linux-efi@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Fixes: f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")
Link: http://lkml.kernel.org/r/1425766041-6551-3-git-send-email-yinghai@kernel.org
[ Commit message massively rewritten ]
Signed-off-by: 
---
 arch/x86/boot/compressed/head_32.S     | 11 +++++++++--
 arch/x86/boot/compressed/head_64.S     |  8 ++++++--
 arch/x86/boot/compressed/mkpiggy.c     |  7 ++-----
 arch/x86/boot/compressed/vmlinux.lds.S |  1 +
 arch/x86/boot/header.S                 |  2 +-
 arch/x86/kernel/asm-offsets.c          |  1 +
 arch/x86/kernel/vmlinux.lds.S          |  1 +
 7 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index cbed1407a5cd..a9b56f1d8e75 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -147,7 +147,9 @@ preferred_addr:
 1:
 
 	/* Target address to relocate to for decompression */
-	addl	$z_extract_offset, %ebx
+	movl    BP_init_size(%esi), %eax
+	subl    $_end, %eax
+	addl    %eax, %ebx
 
 	/* Set up the stack */
 	leal	boot_stack_end(%ebx), %esp
@@ -208,8 +210,13 @@ relocated:
  */
 				/* push arguments for decompress_kernel: */
 	pushl	$z_output_len	/* decompressed length */
-	leal	z_extract_offset_negative(%ebx), %ebp
+
+	movl    BP_init_size(%esi), %eax
+	subl    $_end, %eax
+	movl    %ebx, %ebp
+	subl    %eax, %ebp
 	pushl	%ebp		/* output address */
+
 	pushl	$z_input_len	/* input_len */
 	leal	input_data(%ebx), %eax
 	pushl	%eax		/* input_data */
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 2884e0c3e8a5..69015b576cf6 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -101,7 +101,9 @@ ENTRY(startup_32)
 1:
 
 	/* Target address to relocate to for decompression */
-	addl	$z_extract_offset, %ebx
+	movl	BP_init_size(%esi), %eax
+	subl	$_end, %eax
+	addl	%eax, %ebx
 
 /*
  * Prepare for entering 64 bit mode
@@ -329,7 +331,9 @@ preferred_addr:
 1:
 
 	/* Target address to relocate to for decompression */
-	leaq	z_extract_offset(%rbp), %rbx
+	movl	BP_init_size(%rsi), %ebx
+	subl	$_end, %ebx
+	addq	%rbp, %rbx
 
 	/* Set up the stack */
 	leaq	boot_stack_end(%rbx), %rsp
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index b669ab65bf6c..c03b0097ce58 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -80,11 +80,8 @@ int main(int argc, char *argv[])
 	printf("z_input_len = %lu\n", ilen);
 	printf(".globl z_output_len\n");
 	printf("z_output_len = %lu\n", (unsigned long)olen);
-	printf(".globl z_extract_offset\n");
-	printf("z_extract_offset = 0x%lx\n", offs);
-	/* z_extract_offset_negative allows simplification of head_32.S */
-	printf(".globl z_extract_offset_negative\n");
-	printf("z_extract_offset_negative = -0x%lx\n", offs);
+	printf(".globl z_min_extract_offset\n");
+	printf("z_min_extract_offset = 0x%lx\n", offs);
 
 	printf(".globl input_data, input_data_end\n");
 	printf("input_data:\n");
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 34d047c98284..a80acabb80ec 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -70,5 +70,6 @@ SECTIONS
 		_epgtable = . ;
 	}
 #endif
+	. = ALIGN(PAGE_SIZE);	/* keep size page-aligned */
 	_end = .;
 }
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 16ef02596db2..9bfab22efdf7 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -440,7 +440,7 @@ setup_data:		.quad 0			# 64-bit physical pointer to
 
 pref_address:		.quad LOAD_PHYSICAL_ADDR	# preferred load addr
 
-#define ZO_INIT_SIZE	(ZO__end - ZO_startup_32 + ZO_z_extract_offset)
+#define ZO_INIT_SIZE	(ZO__end - ZO_startup_32 + ZO_z_min_extract_offset)
 #define VO_INIT_SIZE	(VO__end - VO__text)
 #if ZO_INIT_SIZE > VO_INIT_SIZE
 #define INIT_SIZE ZO_INIT_SIZE
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 9f6b9341950f..0e8e4f7a31ce 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -66,6 +66,7 @@ void common(void) {
 	OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch);
 	OFFSET(BP_version, boot_params, hdr.version);
 	OFFSET(BP_kernel_alignment, boot_params, hdr.kernel_alignment);
+	OFFSET(BP_init_size, boot_params, hdr.init_size);
 	OFFSET(BP_pref_address, boot_params, hdr.pref_address);
 	OFFSET(BP_code32_start, boot_params, hdr.code32_start);
 
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 00bf300fd846..a92d3dc2812a 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -325,6 +325,7 @@ SECTIONS
 		__brk_limit = .;
 	}
 
+	. = ALIGN(PAGE_SIZE);		/* keep init size page-aligned */
 	_end = .;
 
         STABS_DEBUG
-- 
2.2.0.33.gc18b867


-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v3 3/7] x86, boot: Don't overlap VO with ZO data
  2015-03-07 22:07 ` [PATCH v3 3/7] x86, boot: Don't overlap VO with ZO data Yinghai Lu
@ 2015-03-10  9:34   ` Borislav Petkov
  2015-03-10 15:05     ` Yinghai Lu
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2015-03-10  9:34 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook, Baoquan He,
	Thomas Gleixner, Jiri Kosina, linux-kernel, linux-efi

Final patch:

---
From: Yinghai Lu <yinghai@kernel.org>
Date: Sat, 7 Mar 2015 14:07:17 -0800
Subject: [PATCH] x86/setup: Don't overlap vmlinux's brk with compressed kernel's data

We already move the compressed kernel close to the end of the buffer.
However, there is still overlap between kernel proper's .brk/.bss and the
compressed kernel's data section.

Extend init_size so that kernel proper's .bss and .brk sections
do not overlap with compressed kernel's data section (see
arch/x86/boot/compressed/misc.c).

The increase equals the _rodata-to-_end span of the compressed kernel
(arch/x86/boot/compressed/vmlinux), which is around 90K on a kernel with
everything in arch/x86/boot/ enabled and linked into vmlinux.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: linux-efi@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Link: http://lkml.kernel.org/r/1425766041-6551-4-git-send-email-yinghai@kernel.org
Fixes: f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")
[ Rewrite commit message;
  rename ADDON_ZO_SIZE to INIT_SIZE_PAD;
  improve comments; measure size increase  ]
Signed-off-by: 
---
 arch/x86/boot/Makefile                 |  2 +-
 arch/x86/boot/compressed/vmlinux.lds.S |  1 +
 arch/x86/boot/header.S                 | 13 +++++++++++--
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 57bbf2fb21f6..863ef25dcf60 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -86,7 +86,7 @@ targets += voffset.h
 $(obj)/voffset.h: vmlinux FORCE
 	$(call if_changed,voffset)
 
-sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|z_.*\)$$/\#define ZO_\2 0x\1/p'
+sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_rodata\|z_.*\)$$/\#define ZO_\2 0x\1/p'
 
 quiet_cmd_zoffset = ZOFFSET $@
       cmd_zoffset = $(NM) $< | sed -n $(sed-zoffset) > $@
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index a80acabb80ec..45081235ce64 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -35,6 +35,7 @@ SECTIONS
 		*(.text.*)
 		_etext = . ;
 	}
+        . = ALIGN(PAGE_SIZE); /* keep INIT_SIZE_PAD in header.S page-aligned */
 	.rodata : {
 		_rodata = . ;
 		*(.rodata)	 /* read-only data */
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 9bfab22efdf7..db46aa45906d 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -440,13 +440,22 @@ setup_data:		.quad 0			# 64-bit physical pointer to
 
 pref_address:		.quad LOAD_PHYSICAL_ADDR	# preferred load addr
 
+
 #define ZO_INIT_SIZE	(ZO__end - ZO_startup_32 + ZO_z_min_extract_offset)
+
+/*
+ * Additional padding so that compressed kernel's .data section doesn't overlap
+ * with kernel proper's .bss/.brk sections.
+ */
+#define INIT_SIZE_PAD	(ZO__end - ZO__rodata)
 #define VO_INIT_SIZE	(VO__end - VO__text)
+
 #if ZO_INIT_SIZE > VO_INIT_SIZE
-#define INIT_SIZE ZO_INIT_SIZE
+#define INIT_SIZE (ZO_INIT_SIZE + INIT_SIZE_PAD)
 #else
-#define INIT_SIZE VO_INIT_SIZE
+#define INIT_SIZE (VO_INIT_SIZE + INIT_SIZE_PAD)
 #endif
+
 init_size:		.long INIT_SIZE		# kernel initialization size
 handover_offset:	.long 0			# Filled in by build.c
 
-- 
2.2.0.33.gc18b867

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v3 2/7] x86, boot: Move ZO to end of buffer
  2015-03-10  8:00   ` Borislav Petkov
@ 2015-03-10  9:34     ` Jiri Kosina
  2015-03-10  9:35       ` Borislav Petkov
  2015-03-10 15:11     ` Yinghai Lu
  2015-03-10 16:59     ` Kees Cook
  2 siblings, 1 reply; 54+ messages in thread
From: Jiri Kosina @ 2015-03-10  9:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Yinghai Lu, Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Baoquan He, Thomas Gleixner, linux-kernel, linux-efi

On Tue, 10 Mar 2015, Borislav Petkov wrote:

> Final patch:
> 
> ---
> From: Yinghai Lu <yinghai@kernel.org>
> Date: Sat, 7 Mar 2015 14:07:16 -0800
> Subject: [PATCH] x86/setup: Move compressed kernel to the end of the buffer
[ ... ]
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Matt Fleming <matt.fleming@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Jiri Kosina <jkosina@suse.cz>
> Cc: linux-efi@vger.kernel.org
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Baoquan He <bhe@redhat.com>
> Fixes: f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")

Thanks a lot for fixing my oversight.

Acked-by: Jiri Kosina <jkosina@suse.cz>

[ ... ]

-- 
Jiri Kosina
SUSE Labs


* Re: [PATCH v3 2/7] x86, boot: Move ZO to end of buffer
  2015-03-10  9:34     ` Jiri Kosina
@ 2015-03-10  9:35       ` Borislav Petkov
  0 siblings, 0 replies; 54+ messages in thread
From: Borislav Petkov @ 2015-03-10  9:35 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Yinghai Lu, Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Baoquan He, Thomas Gleixner, linux-kernel, linux-efi

On Tue, Mar 10, 2015 at 10:34:31AM +0100, Jiri Kosina wrote:
> Thanks a lot for fixing my oversight.

Bah, it was my suggestion to use setup_data in the first place, sorry
about that.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v3 3/7] x86, boot: Don't overlap VO with ZO data
  2015-03-10  9:34   ` Borislav Petkov
@ 2015-03-10 15:05     ` Yinghai Lu
  2015-03-10 15:10       ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Yinghai Lu @ 2015-03-10 15:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook, Baoquan He,
	Thomas Gleixner, Jiri Kosina, Linux Kernel Mailing List,
	linux-efi

On Tue, Mar 10, 2015 at 2:34 AM, Borislav Petkov <bp@suse.de> wrote:
> Final patch:
>
> ---
> From: Yinghai Lu <yinghai@kernel.org>
> Date: Sat, 7 Mar 2015 14:07:17 -0800
> Subject: [PATCH] x86/setup: Don't overlap vmlinux's brk with compressed kernel's data

We need to keep VO and ZO here...

Should use ZO to replace "compressed kernel"

compressed kernel is compressed VO (vmlinux) actually.

ZO is arch/x86/boot/compressed/vmlinux

Thanks

Yinghai


* Re: [PATCH v3 3/7] x86, boot: Don't overlap VO with ZO data
  2015-03-10 15:05     ` Yinghai Lu
@ 2015-03-10 15:10       ` Borislav Petkov
  2015-03-10 15:17         ` Yinghai Lu
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2015-03-10 15:10 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook, Baoquan He,
	Thomas Gleixner, Jiri Kosina, Linux Kernel Mailing List,
	linux-efi

On Tue, Mar 10, 2015 at 08:05:52AM -0700, Yinghai Lu wrote:
> We need to keep VO and ZO here...

Why?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v3 2/7] x86, boot: Move ZO to end of buffer
  2015-03-10  8:00   ` Borislav Petkov
  2015-03-10  9:34     ` Jiri Kosina
@ 2015-03-10 15:11     ` Yinghai Lu
  2015-03-10 15:13       ` Borislav Petkov
  2015-03-10 16:59     ` Kees Cook
  2 siblings, 1 reply; 54+ messages in thread
From: Yinghai Lu @ 2015-03-10 15:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook, Baoquan He,
	Thomas Gleixner, Jiri Kosina, Linux Kernel Mailing List,
	linux-efi

On Tue, Mar 10, 2015 at 1:00 AM, Borislav Petkov <bp@suse.de> wrote:
> Final patch:
>
> ---
> From: Yinghai Lu <yinghai@kernel.org>
> Date: Sat, 7 Mar 2015 14:07:16 -0800
> Subject: [PATCH] x86/setup: Move compressed kernel to the end of the buffer

"setup" should only be about things related to arch/x86/boot/setup.ld.

So please keep "x86, boot".

Also stop using "compressed kernel" please, that is confusing.

Just use

ZO: arch/x86/boot/compressed/vmlinux
VO: vmlinux

Thanks

Yinghai


* Re: [PATCH v3 2/7] x86, boot: Move ZO to end of buffer
  2015-03-10 15:11     ` Yinghai Lu
@ 2015-03-10 15:13       ` Borislav Petkov
  0 siblings, 0 replies; 54+ messages in thread
From: Borislav Petkov @ 2015-03-10 15:13 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook, Baoquan He,
	Thomas Gleixner, Jiri Kosina, Linux Kernel Mailing List,
	linux-efi

On Tue, Mar 10, 2015 at 08:11:03AM -0700, Yinghai Lu wrote:
> Also stop using "compressed kernel" please, that is confusing.

Why?

> Just use
> 
> ZO: arch/x86/boot/compressed/vmlinux
> VO: vmlinux

and this is not confusing?

Yeah, right.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v3 3/7] x86, boot: Don't overlap VO with ZO data
  2015-03-10 15:10       ` Borislav Petkov
@ 2015-03-10 15:17         ` Yinghai Lu
  2015-03-10 15:21           ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Yinghai Lu @ 2015-03-10 15:17 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook, Baoquan He,
	Thomas Gleixner, Jiri Kosina, Linux Kernel Mailing List,
	linux-efi

On Tue, Mar 10, 2015 at 8:10 AM, Borislav Petkov <bp@suse.de> wrote:
> On Tue, Mar 10, 2015 at 08:05:52AM -0700, Yinghai Lu wrote:
>> We need to keep VO and ZO here...
>
> Why?

Make it not confusing.

ZO: arch/x86/boot/compressed/vmlinux
VO: vmlinux

setup + ZO ==> bzImage.

The compressed kernel is just the compressed VO.

Thanks

Yinghai


* Re: [PATCH v3 3/7] x86, boot: Don't overlap VO with ZO data
  2015-03-10 15:17         ` Yinghai Lu
@ 2015-03-10 15:21           ` Borislav Petkov
  2015-03-10 15:42             ` Yinghai Lu
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2015-03-10 15:21 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook, Baoquan He,
	Thomas Gleixner, Jiri Kosina, Linux Kernel Mailing List,
	linux-efi

On Tue, Mar 10, 2015 at 08:17:01AM -0700, Yinghai Lu wrote:
> Make it not confusing.
> 
> ZO: arch/x86/boot/compressed/vmlinux
> VO: vmlinux
> 
> setup + ZO ==> bzImage.
> 
> The compressed kernel is just the compressed VO.

So the two end up being the "compressed kernel" and "kernel proper".

And then the subject turns into

"Don't overlap kernel proper's brk with the compressed kernel's data section"

Does that still confuse you?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v3 3/7] x86, boot: Don't overlap VO with ZO data
  2015-03-10 15:21           ` Borislav Petkov
@ 2015-03-10 15:42             ` Yinghai Lu
  2015-03-10 15:48               ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Yinghai Lu @ 2015-03-10 15:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook, Baoquan He,
	Thomas Gleixner, Jiri Kosina, Linux Kernel Mailing List,
	linux-efi

On Tue, Mar 10, 2015 at 8:21 AM, Borislav Petkov <bp@suse.de> wrote:
> On Tue, Mar 10, 2015 at 08:17:01AM -0700, Yinghai Lu wrote:
>> Make it not confusing.
>>
>> ZO: arch/x86/boot/compressed/vmlinux
>> VO: vmlinux
>>
>> setup + ZO ==> bzImage.
>>
>> The compressed kernel is just the compressed VO.
>
> So the two end up being the "compressed kernel" and "kernel proper".
>
> And then the subject turns into
>
> "Don't overlap kernel proper's brk with the compressed kernel's data section"
>
> Does that still confuse you?

In arch/x86/boot/header.S, we already use VO and ZO.
So please keep on using them, and don't introduce "kernel proper" etc.

Thanks

Yinghai Lu


* Re: [PATCH v3 3/7] x86, boot: Don't overlap VO with ZO data
  2015-03-10 15:42             ` Yinghai Lu
@ 2015-03-10 15:48               ` Borislav Petkov
  2015-03-10 19:29                 ` Yinghai Lu
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2015-03-10 15:48 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook, Baoquan He,
	Thomas Gleixner, Jiri Kosina, Linux Kernel Mailing List,
	linux-efi

On Tue, Mar 10, 2015 at 08:42:40AM -0700, Yinghai Lu wrote:
> In arch/x86/boot/header.S, we already use VO and ZO.
> So please keep on using them, and don't introduce "kernel proper" etc.

So you're suggesting commit messages should use variable names and
prefixes from the code instead of being human-readable?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v3 2/7] x86, boot: Move ZO to end of buffer
  2015-03-10  8:00   ` Borislav Petkov
  2015-03-10  9:34     ` Jiri Kosina
  2015-03-10 15:11     ` Yinghai Lu
@ 2015-03-10 16:59     ` Kees Cook
  2 siblings, 0 replies; 54+ messages in thread
From: Kees Cook @ 2015-03-10 16:59 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Yinghai Lu, Matt Fleming, H. Peter Anvin, Ingo Molnar,
	Baoquan He, Thomas Gleixner, Jiri Kosina, LKML, linux-efi

On Tue, Mar 10, 2015 at 1:00 AM, Borislav Petkov <bp@suse.de> wrote:
> Final patch:
>
> ---
> From: Yinghai Lu <yinghai@kernel.org>
> Date: Sat, 7 Mar 2015 14:07:16 -0800
> Subject: [PATCH] x86/setup: Move compressed kernel to the end of the buffer
>
> Boris found that the KASLR status passed through setup_data from the boot
> stage cannot be used later in the kernel stage; see commit
>
>   f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")
>
> Here's some background:
>
> The boot loader allocates a buffer of size init_size in concordance with
> the value passed in the setup header and it loads the compressed, i.e.
> first kernel (arch/x86/boot/compressed/vmlinux) in it.
>
> First kernel then moves itself somewhere around the middle of the
> buffer at z_extract_offset to make sure that the decompressor does not
> overwrite input data.
>
> After the decompressor is finished, kernel proper (vmlinux) uses the
> whole buffer from the beginning and the compressed kernel's code and
> data section is overlapped with the kernel proper's bss section.
>
> Later on, clear_bss() in kernel proper clears .bss before code in
> arch/x86/kernel/setup.c can access setup_data passed in the first,
> compressed kernel.
>
> To make sure that data survives, we should avoid the overlapping.
>
> As a first step, move the first kernel closer to the end of the buffer
> instead of the middle. As a result, this will place the first kernel's data
> area out of kernel proper's .bss area.
>
> This way we can find out where the data section of the copied first
> kernel is instead of guessing. In addition, it will make the KASLR
> mem_avoid array preparation for the search of a fitting buffer much
> simpler.
>
> While at it, rename z_extract_offset to z_min_extract_offset as it is
> actually the minimum extract offset now.
>
> In order to keep the final extract offset page-aligned we need to
> make both kernels' _end markers page-aligned too so that init_size is
> page-aligned as a result.
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Matt Fleming <matt.fleming@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Jiri Kosina <jkosina@suse.cz>
> Cc: linux-efi@vger.kernel.org
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Baoquan He <bhe@redhat.com>
> Fixes: f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")
> Link: http://lkml.kernel.org/r/1425766041-6551-3-git-send-email-yinghai@kernel.org
> [ Commit message massively rewritten ]

Acked-by: Kees Cook <keescook@chromium.org>

Thanks!

-Kees

> Signed-off-by:
> ---
>  arch/x86/boot/compressed/head_32.S     | 11 +++++++++--
>  arch/x86/boot/compressed/head_64.S     |  8 ++++++--
>  arch/x86/boot/compressed/mkpiggy.c     |  7 ++-----
>  arch/x86/boot/compressed/vmlinux.lds.S |  1 +
>  arch/x86/boot/header.S                 |  2 +-
>  arch/x86/kernel/asm-offsets.c          |  1 +
>  arch/x86/kernel/vmlinux.lds.S          |  1 +
>  7 files changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
> index cbed1407a5cd..a9b56f1d8e75 100644
> --- a/arch/x86/boot/compressed/head_32.S
> +++ b/arch/x86/boot/compressed/head_32.S
> @@ -147,7 +147,9 @@ preferred_addr:
>  1:
>
>         /* Target address to relocate to for decompression */
> -       addl    $z_extract_offset, %ebx
> +       movl    BP_init_size(%esi), %eax
> +       subl    $_end, %eax
> +       addl    %eax, %ebx
>
>         /* Set up the stack */
>         leal    boot_stack_end(%ebx), %esp
> @@ -208,8 +210,13 @@ relocated:
>   */
>                                 /* push arguments for decompress_kernel: */
>         pushl   $z_output_len   /* decompressed length */
> -       leal    z_extract_offset_negative(%ebx), %ebp
> +
> +       movl    BP_init_size(%esi), %eax
> +       subl    $_end, %eax
> +       movl    %ebx, %ebp
> +       subl    %eax, %ebp
>         pushl   %ebp            /* output address */
> +
>         pushl   $z_input_len    /* input_len */
>         leal    input_data(%ebx), %eax
>         pushl   %eax            /* input_data */
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 2884e0c3e8a5..69015b576cf6 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -101,7 +101,9 @@ ENTRY(startup_32)
>  1:
>
>         /* Target address to relocate to for decompression */
> -       addl    $z_extract_offset, %ebx
> +       movl    BP_init_size(%esi), %eax
> +       subl    $_end, %eax
> +       addl    %eax, %ebx
>
>  /*
>   * Prepare for entering 64 bit mode
> @@ -329,7 +331,9 @@ preferred_addr:
>  1:
>
>         /* Target address to relocate to for decompression */
> -       leaq    z_extract_offset(%rbp), %rbx
> +       movl    BP_init_size(%rsi), %ebx
> +       subl    $_end, %ebx
> +       addq    %rbp, %rbx
>
>         /* Set up the stack */
>         leaq    boot_stack_end(%rbx), %rsp
> diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
> index b669ab65bf6c..c03b0097ce58 100644
> --- a/arch/x86/boot/compressed/mkpiggy.c
> +++ b/arch/x86/boot/compressed/mkpiggy.c
> @@ -80,11 +80,8 @@ int main(int argc, char *argv[])
>         printf("z_input_len = %lu\n", ilen);
>         printf(".globl z_output_len\n");
>         printf("z_output_len = %lu\n", (unsigned long)olen);
> -       printf(".globl z_extract_offset\n");
> -       printf("z_extract_offset = 0x%lx\n", offs);
> -       /* z_extract_offset_negative allows simplification of head_32.S */
> -       printf(".globl z_extract_offset_negative\n");
> -       printf("z_extract_offset_negative = -0x%lx\n", offs);
> +       printf(".globl z_min_extract_offset\n");
> +       printf("z_min_extract_offset = 0x%lx\n", offs);
>
>         printf(".globl input_data, input_data_end\n");
>         printf("input_data:\n");
> diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> index 34d047c98284..a80acabb80ec 100644
> --- a/arch/x86/boot/compressed/vmlinux.lds.S
> +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> @@ -70,5 +70,6 @@ SECTIONS
>                 _epgtable = . ;
>         }
>  #endif
> +       . = ALIGN(PAGE_SIZE);   /* keep size page-aligned */
>         _end = .;
>  }
> diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
> index 16ef02596db2..9bfab22efdf7 100644
> --- a/arch/x86/boot/header.S
> +++ b/arch/x86/boot/header.S
> @@ -440,7 +440,7 @@ setup_data:         .quad 0                 # 64-bit physical pointer to
>
>  pref_address:          .quad LOAD_PHYSICAL_ADDR        # preferred load addr
>
> -#define ZO_INIT_SIZE   (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
> +#define ZO_INIT_SIZE   (ZO__end - ZO_startup_32 + ZO_z_min_extract_offset)
>  #define VO_INIT_SIZE   (VO__end - VO__text)
>  #if ZO_INIT_SIZE > VO_INIT_SIZE
>  #define INIT_SIZE ZO_INIT_SIZE
> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
> index 9f6b9341950f..0e8e4f7a31ce 100644
> --- a/arch/x86/kernel/asm-offsets.c
> +++ b/arch/x86/kernel/asm-offsets.c
> @@ -66,6 +66,7 @@ void common(void) {
>         OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch);
>         OFFSET(BP_version, boot_params, hdr.version);
>         OFFSET(BP_kernel_alignment, boot_params, hdr.kernel_alignment);
> +       OFFSET(BP_init_size, boot_params, hdr.init_size);
>         OFFSET(BP_pref_address, boot_params, hdr.pref_address);
>         OFFSET(BP_code32_start, boot_params, hdr.code32_start);
>
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 00bf300fd846..a92d3dc2812a 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -325,6 +325,7 @@ SECTIONS
>                 __brk_limit = .;
>         }
>
> +       . = ALIGN(PAGE_SIZE);           /* keep init size page-aligned */
>         _end = .;
>
>          STABS_DEBUG
> --
> 2.2.0.33.gc18b867
>
>
> --
> Regards/Gruss,
>     Boris.
>
> ECO tip #101: Trim your mails when you reply.
> --



-- 
Kees Cook
Chrome OS Security


* Re: [PATCH v3 3/7] x86, boot: Don't overlap VO with ZO data
  2015-03-10 15:48               ` Borislav Petkov
@ 2015-03-10 19:29                 ` Yinghai Lu
  0 siblings, 0 replies; 54+ messages in thread
From: Yinghai Lu @ 2015-03-10 19:29 UTC (permalink / raw)
  To: Borislav Petkov, H. Peter Anvin, Ingo Molnar
  Cc: Matt Fleming, Kees Cook, Baoquan He, Thomas Gleixner,
	Jiri Kosina, Linux Kernel Mailing List, linux-efi

On Tue, Mar 10, 2015 at 8:48 AM, Borislav Petkov <bp@suse.de> wrote:
> On Tue, Mar 10, 2015 at 08:42:40AM -0700, Yinghai Lu wrote:
>> In arch/x86/boot/header.S, we already use VO and ZO.
>> So please keep on using them, and don't introduce "kernel proper" etc.
>
> So you're suggesting commit messages should use variable names and
> prefixes from the code instead of being human-readable?

Now we have:
vmlinux ==> VO, or the kernel; that is output/init_size in misc.c.
arch/x86/boot/compressed/vmlinux.bin.xz ==> the compressed kernel;
that is input/input_len in misc.c.
arch/x86/boot/compressed/vmlinux ==> ZO.
arch/x86/boot/setup + ZO ==> bzImage.

I really think we should not use "compressed kernel" for
arch/x86/boot/compressed/vmlinux; that term should stand for the
compressed vmlinux.

Wouldn't VO/ZO be better than "kernel proper"/"compressed kernel"
for vmlinux and arch/x86/boot/compressed/vmlinux?

Peter/Ingo, what do you think about that?

Thanks

Yinghai


* Re: [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size
  2015-03-07 22:07 ` [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size Yinghai Lu
  2015-03-09 12:49   ` Borislav Petkov
  2015-03-10  0:42   ` Kees Cook
@ 2015-03-13 12:27   ` Ingo Molnar
  2015-03-14  2:47     ` Yinghai Lu
  2 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2015-03-13 12:27 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Borislav Petkov, Baoquan He, Thomas Gleixner, Jiri Kosina,
	linux-kernel, linux-efi, Josh Triplett, Andrew Morton,
	Ard Biesheuvel, Junjie Mao


* Yinghai Lu <yinghai@kernel.org> wrote:

> commit e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
> introduced a run_size for kaslr.
> We should use the real runtime size (including copy/decompress), aka init_size.

Why, what happens if we don't change this?

What's the purpose of your change?

> run_size is the VO (vmlinux) init size, including bss and brk.
> init_size is the size needed for decompression, and it is bigger than run_size
> when decompression needs more buff.

What happens if we don't have enough 'buff'?

What's the purpose of your change?

> According to arch/x86/boot/header.S:
> | #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
> | #define VO_INIT_SIZE    (VO__end - VO__text)
> | #if ZO_INIT_SIZE > VO_INIT_SIZE
> | #define INIT_SIZE ZO_INIT_SIZE
> | #else
> | #define INIT_SIZE VO_INIT_SIZE
> | #endif
> | init_size:              .long INIT_SIZE         # kernel initialization size
> 
> The bootloader allocates a buffer according to init_size in the hdr, and loads
> ZO (arch/x86/boot/compressed/vmlinux) at the start of that buffer.
> init_size first comes from the VO (vmlinux) init size. That VO init size
> spans from VO _text to VO _end and includes the VO bss and brk areas.
> 
> While running, ZO moves itself to the middle of the buffer, at
> z_extract_offset, to make sure that the decompressor's output does not
> overwrite the input data before it is consumed.
> But the z_extract_offset calculation is at first based only on the size of
> VO (vmlinux) and the size of the compressed VO.
> So we need to make sure [z_extract_offset, init_size) will fit ZO, which means
> init_size needs to be adjusted according to the ZO size.
> That makes init_size always >= run_size.
> 
> During the aslr buffer search, we first need to make sure the new buffer is
> big enough for decompression. So use init_size instead, and kill the no
> longer needed run_size-related code.

Why, what happens if it's not big enough?

What's the purpose of your change?

> Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")

What does your patch fix in that commit?

What's the purpose of your change?

After so many words you still haven't explained _the_ most basic thing 
a Linux kernel changelog needs to include: why a change is necessary, 
i.e. what bad effects did the bug have...

Why are you so badly ignoring the kernel technology of explaining things 
to others so that they can easily understand?

'Code well explained to others' is just as important a technology to 
the Linux kernel as 'correct code', 'clean code' or 'fast code'.

Do you perhaps understand now why I have to ignore most of your 
patches after just a sad glance at the changelog's quality?

Thanks,

	Ingo


* Re: [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size
  2015-03-13 12:27   ` Ingo Molnar
@ 2015-03-14  2:47     ` Yinghai Lu
  2015-03-14  7:53       ` Ingo Molnar
  0 siblings, 1 reply; 54+ messages in thread
From: Yinghai Lu @ 2015-03-14  2:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Borislav Petkov, Baoquan He, Thomas Gleixner, Jiri Kosina,
	Linux Kernel Mailing List, linux-efi, Josh Triplett,
	Andrew Morton, Ard Biesheuvel, Junjie Mao

On Fri, Mar 13, 2015 at 5:27 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Yinghai Lu <yinghai@kernel.org> wrote:
>
>> commit e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
>> introduced one run_size for kaslr.
>> We should use real runtime size (include copy/decompress) aka init_size.
>
> Why, what happens if we don't change this?

While trying to update the changelog, I found we still need to keep this
run_size; otherwise kaslr will look for an unnecessarily big area of
init_size instead of max(output_len, run_size).

will refresh the patchset.

Thanks

Yinghai


* Re: [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size
  2015-03-14  2:47     ` Yinghai Lu
@ 2015-03-14  7:53       ` Ingo Molnar
  2015-03-14  9:59         ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2015-03-14  7:53 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Borislav Petkov, Baoquan He, Thomas Gleixner, Jiri Kosina,
	Linux Kernel Mailing List, linux-efi, Josh Triplett,
	Andrew Morton, Ard Biesheuvel, Junjie Mao


* Yinghai Lu <yinghai@kernel.org> wrote:

> On Fri, Mar 13, 2015 at 5:27 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Yinghai Lu <yinghai@kernel.org> wrote:
> >
> >> commit e6023367d779 ("x86, kaslr: Prevent .bss from overlaping 
> >> initrd") introduced a run_size for kaslr. We should use the real 
> >> runtime size (including copy/decompress), aka init_size.
> >
> > Why, what happens if we don't change this?
> 
> While trying to update the changelog, I found we still need to keep this 
> run_size; otherwise kaslr will look for an unnecessarily big area of 
> init_size instead of max(output_len, run_size).

You are still talking only about the low level code and about low 
level symptoms, while in contrast to that the primary question with 
_any_ change to the kernel is always:

   WHY ARE WE CHANGING THE KERNEL, WHAT BAD HIGH LEVEL BEHAVIOR CAN A 
   USER OBSERVE IF THE BUG IS NOT FIXED?

Your descriptions totally ignore the high level effects of the bug on 
the system and on users of the machine, and you fail to describe them 
properly. You totally concentrate on the low level: your descriptions 
are missing the forest for the trees.

That makes 90% of your patch descriptions useless.

In fact, 90% of your patches had that deficiency and had it for the 
past 4 years, non-stop, and maintainers were complaining to you about 
that, non-stop as well. Do you think maintainers enjoy complaining 
about deficiencies? Do you wonder why maintainers were forced to 
simply stop looking at any complex series of yours after you refused 
to change?

> will refresh the patchset.

So let me try this again, one very last time.

That sentence demonstrates your problem: it's not a 'refresh' your 
patches need, but a 'hard reboot', a totally new viewpoint that 
concentrates on what matters: that zooms out of the small details of 
the patch!

For any serious change to the Linux kernel, analyzing small details is 
fine and required as well, AFTER the high level has been discussed 
properly:

  - What happened, what high level concern motivates you to change the 
    Linux kernel?

       And no, starting a changelog with:

          commit e6023367d779 ("x86, kaslr: Prevent .bss from 
          overlaping initrd") introduced one run_size for kaslr.

       is not 'high level' in any way, it talks about code in the 
       first sentence! Talking about code, not talking about high 
       level concerns is a BUG().

  - What was the previous (often bad) high level behavior of the
    kernel?

       And no, 'KASLR will not find a proper area' is NOT a high level
       description, it's a very low level description! Not discussing 
       high level behavior of the kernel in a changelog is a BUG().

  - What new high level behavior do we want to happen instead?

       And no, saying that 'KASLR should be passed init_size instead
       of run_size' is not a description of desired new high level
       behavior! Not discussing the desired high level behavior of the 
       kernel in a changelog is a BUG().

  - What was the high level design of the old code, was that design
    fine, should it be changed, and if yes, in what way?

       Note that we describe the high level design even if we don't
       change it, partly to give context to the reader, partly to 
       prove that you know what you are doing, to build trust in your 
       patch! Not discussing the old (and new) design of that area of 
       the kernel is a BUG().

and only then do we:

  - Describe the old implementation, and describe how the new
    implementation works in all that context.

       Here is where 99.9% of your existing changelogs fit in.
       Put differently: your changelogs are missing the first 3 
       essential components of a changelog.

       And note, mentioning 'run_size' in a low level description is 
       fine. Mentioning it in a high level description is a BUG(): 
       that is why Boris was trying to change your changelogs into 
       spoken English, to recover some of that missing high level 
       context. Maintainers like Boris should not be forced to do 
       that, you are supposed to offer this with any significant 
       patch, as a passport to prove that your changes to the code are 
       worth looking at.

And yes, we do it in that order. Take a look at a recent single line 
change Linus made in 53da3bc2ba9e48, attached to this mail.

Check how the changelog is structured: it discusses high level 
concepts first. It's a _ONE LINER FIX_ from Linus, yet it's spread 
over 8 paragraphs and 50 lines, because the change justified that kind 
of description.

And observe the moment when Linus, in his 8-paragraph, 50-line 
description, starts talking about low level implementation details, 
when he mentions lines of code and function names such as 
do_numa_page(), 'pte_write()' or 'pte_dirty()'.

He doesn't!

It's not needed for a one-liner most of the time: but the high level 
concepts around that code are very much necessary to convey.

Now contrast that with your changelogs: your changelogs are chock-full 
of non-English phrases resembling code more than a fluid description; 
they are chock-full of low level details, mentioning function 
names, variables and fields with no high level context conveyed 
whatsoever.

Let me try to put it to you in a way that might come across: when I 
run maintainer code over your changelogs it runs into an instant BUG() 
most of the time, forcing me to drop your patches.

High level discussion of old behavior, new behavior, old design and 
new design isn't just some detail to be slapped on a change after the 
fact, this is a serious and required technological aspect of the Linux 
kernel, and 90% of your patches are buggy in that respect.

In fact, I noticed that both your descriptions and your followups to 
them are totally lacking the high level viewpoint!

Either you:

   - refuse to think in high level concepts when you are writing 
     patches, in which case we need to keep your patches away from
     the Linux kernel,

   - or you are unwilling to document such high level thinking 
     processes, in which case we need to keep your patches away from 
     the Linux kernel as well.

If your approach to writing kernel patches does not change then I will 
be forced to take action, currently you are this -->.<-- close to 
being filtered out from my default inbox for your total refusal to 
change the technology of how you are writing patches.

Thanks,

	Ingo

[ Sample 'good' changelog from Linus: ]

======================>
From 53da3bc2ba9e4899f32707b5cd7d18421b943687 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu, 12 Mar 2015 08:45:46 -0700
Subject: [PATCH] mm: fix up numa read-only thread grouping logic

Dave Chinner reported that commit 4d9424669946 ("mm: convert
p[te|md]_mknonnuma and remaining page table manipulations") slowed down
his xfsrepair test enormously.  In particular, it was using more system
time due to extra TLB flushing.

The ultimate reason turns out to be how the change to use the regular
page table accessor functions broke the NUMA grouping logic.  The old
special mknuma/mknonnuma code accessed the page table present bit and
the magic NUMA bit directly, while the new code just changes the page
protections using PROT_NONE and the regular vma protections.

That sounds equivalent, and from a fault standpoint it really is, but a
subtle side effect is that the *other* protection bits of the page table
entries also change.  And the code to decide how to group the NUMA
entries together used the writable bit to decide whether a particular
page was likely to be shared read-only or not.

And with the change to make the NUMA handling use the regular permission
setting functions, that writable bit was basically always cleared for
private mappings due to COW.  So even if the page actually ends up being
written to in the end, the NUMA balancing would act as if it was always
shared RO.

This code is a heuristic anyway, so the fix - at least for now - is to
instead check whether the page is dirty rather than writable.  The bit
doesn't change with protection changes.

NOTE! This also adds a FIXME comment to revisit this issue,

Not only should we probably re-visit the whole "is this a shared
read-only page" heuristic (we might want to take the vma permissions
into account and base this more on those than the per-page ones, and
also look at whether the particular access that triggers it is a write
or not), but the whole COW issue shows that we should think about the
NUMA fault handling some more.

For example, maybe we should do the early-COW thing that a regular fault
does.  Or maybe we should accept that while using the same bits as
PROTNONE was a good thing (and got rid of the special NUMA bit), we
might still want to just preserve the other protection bits across NUMA
faulting.

Those are bigger questions, left for later.  This just fixes up the
heuristic so that it at least approximates working again.  More analysis
and work needed.

Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>,
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/memory.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 8068893697bb..411144f977b1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3072,8 +3072,13 @@ static int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * Avoid grouping on DSO/COW pages in specific and RO pages
 	 * in general, RO pages shouldn't hurt as much anyway since
 	 * they can be in shared cache state.
+	 *
+	 * FIXME! This checks "pmd_dirty()" as an approximation of
+	 * "is this a read-only page", since checking "pmd_write()"
+	 * is even more broken. We haven't actually turned this into
+	 * a writable page, so pmd_write() will always be false.
 	 */
-	if (!pte_write(pte))
+	if (!pte_dirty(pte))
 		flags |= TNF_NO_GROUP;
 
 	/*


* Re: [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size
  2015-03-14  7:53       ` Ingo Molnar
@ 2015-03-14  9:59         ` Borislav Petkov
  2015-03-16 10:06           ` [PATCH] Revert "x86/mm/ASLR: Propagate base load address calculation" Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2015-03-14  9:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Yinghai Lu, Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Baoquan He, Thomas Gleixner, Jiri Kosina,
	Linux Kernel Mailing List, linux-efi, Josh Triplett,
	Andrew Morton, Ard Biesheuvel, Junjie Mao

On Sat, Mar 14, 2015 at 08:53:57AM +0100, Ingo Molnar wrote:
> You are still talking only about the low level code and about low
> level symptoms, while in contrast to that the primary question with...

<snip text which will be ignored again, unfortunately>

Thanks for doing that!

I really doubt it'll bring anything this time either but I don't care -
certainly I've wasted enough time trying to improve things as far as I
can so hell yeah, this bullshit has to stop!

In addition to what you've said, I have only one proposal: we revert
the whole crap and do it again, this time nice and clean in tip. It'll
stay there as long as it has to. No half-arsed commit messages, no
misunderstood crap, no lala code f*ckery. Whatever it takes, as long as
it takes.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* [PATCH] Revert "x86/mm/ASLR: Propagate base load address calculation"
  2015-03-14  9:59         ` Borislav Petkov
@ 2015-03-16 10:06           ` Borislav Petkov
  2015-03-16 12:11             ` [tip:x86/urgent] " tip-bot for Borislav Petkov
  2015-03-16 13:56             ` [PATCH] " Jiri Kosina
  0 siblings, 2 replies; 54+ messages in thread
From: Borislav Petkov @ 2015-03-16 10:06 UTC (permalink / raw)
  To: Ingo Molnar, Jiri Kosina
  Cc: Yinghai Lu, Matt Fleming, H. Peter Anvin, Ingo Molnar, Kees Cook,
	Baoquan He, Thomas Gleixner, Linux Kernel Mailing List,
	linux-efi, Josh Triplett, Andrew Morton, Ard Biesheuvel,
	Junjie Mao

On Sat, Mar 14, 2015 at 10:59:23AM +0100, Borislav Petkov wrote:
> In addition to what you've said, I have only one proposal: we revert
> the whole crap and do it again, this time nice and clean in tip. It'll
> stay there as long as it has to. No half-arsed commit messages, no
> misunderstood crap, no lala code f*ckery. Whatever it takes, as long as
> it takes.

And here it is:

---
From: Borislav Petkov <bp@suse.de>
Date: Mon, 16 Mar 2015 10:57:56 +0100
Subject: [PATCH] Revert "x86/mm/ASLR: Propagate base load address calculation"

This reverts commit f47233c2d34f243ecdaac179c3408a39ff9216a7.

The main reason for the revert is that in order to make this work, we
need non-trivial changes to the x86 boot code which we didn't manage to
get done in time for merging.

And even if we did, they would've been too risky, so instead of rushing
things and breaking booting of 4.1 on boxes left and right, we will be very
strict and conservative and will take our time with this to fix and test
it properly.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kees Cook <keescook@chromium.org>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
---
 arch/x86/boot/compressed/aslr.c       | 34 +---------------------------------
 arch/x86/boot/compressed/misc.c       |  3 +--
 arch/x86/boot/compressed/misc.h       |  6 ++----
 arch/x86/include/asm/page_types.h     |  2 --
 arch/x86/include/uapi/asm/bootparam.h |  1 -
 arch/x86/kernel/module.c              | 10 +++++++++-
 arch/x86/kernel/setup.c               | 22 ++++------------------
 7 files changed, 17 insertions(+), 61 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 7083c16cccba..bb1376381985 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -14,13 +14,6 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
-struct kaslr_setup_data {
-	__u64 next;
-	__u32 type;
-	__u32 len;
-	__u8 data[1];
-} kaslr_setup_data;
-
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -302,29 +295,7 @@ static unsigned long find_random_addr(unsigned long minimum,
 	return slots_fetch_random();
 }
 
-static void add_kaslr_setup_data(struct boot_params *params, __u8 enabled)
-{
-	struct setup_data *data;
-
-	kaslr_setup_data.type = SETUP_KASLR;
-	kaslr_setup_data.len = 1;
-	kaslr_setup_data.next = 0;
-	kaslr_setup_data.data[0] = enabled;
-
-	data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
-
-	while (data && data->next)
-		data = (struct setup_data *)(unsigned long)data->next;
-
-	if (data)
-		data->next = (unsigned long)&kaslr_setup_data;
-	else
-		params->hdr.setup_data = (unsigned long)&kaslr_setup_data;
-
-}
-
-unsigned char *choose_kernel_location(struct boot_params *params,
-				      unsigned char *input,
+unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
 				      unsigned long output_size)
@@ -335,17 +306,14 @@ unsigned char *choose_kernel_location(struct boot_params *params,
 #ifdef CONFIG_HIBERNATION
 	if (!cmdline_find_option_bool("kaslr")) {
 		debug_putstr("KASLR disabled by default...\n");
-		add_kaslr_setup_data(params, 0);
 		goto out;
 	}
 #else
 	if (cmdline_find_option_bool("nokaslr")) {
 		debug_putstr("KASLR disabled by cmdline...\n");
-		add_kaslr_setup_data(params, 0);
 		goto out;
 	}
 #endif
-	add_kaslr_setup_data(params, 1);
 
 	/* Record the various known unsafe memory ranges. */
 	mem_avoid_init((unsigned long)input, input_size,
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 5903089c818f..a950864a64da 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -401,8 +401,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	 * the entire decompressed kernel plus relocation table, or the
 	 * entire decompressed kernel plus .bss and .brk sections.
 	 */
-	output = choose_kernel_location(real_mode, input_data, input_len,
-					output,
+	output = choose_kernel_location(input_data, input_len, output,
 					output_len > run_size ? output_len
 							      : run_size);
 
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index ee3576b2666b..04477d68403f 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -57,8 +57,7 @@ int cmdline_find_option_bool(const char *option);
 
 #if CONFIG_RANDOMIZE_BASE
 /* aslr.c */
-unsigned char *choose_kernel_location(struct boot_params *params,
-				      unsigned char *input,
+unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
 				      unsigned long output_size);
@@ -66,8 +65,7 @@ unsigned char *choose_kernel_location(struct boot_params *params,
 bool has_cpuflag(int flag);
 #else
 static inline
-unsigned char *choose_kernel_location(struct boot_params *params,
-				      unsigned char *input,
+unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
 				      unsigned long output_size)
diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 95e11f79f123..f97fbe3abb67 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -51,8 +51,6 @@ extern int devmem_is_allowed(unsigned long pagenr);
 extern unsigned long max_low_pfn_mapped;
 extern unsigned long max_pfn_mapped;
 
-extern bool kaslr_enabled;
-
 static inline phys_addr_t get_max_mapped(void)
 {
 	return (phys_addr_t)max_pfn_mapped << PAGE_SHIFT;
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index 44e6dd7e36a2..225b0988043a 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -7,7 +7,6 @@
 #define SETUP_DTB			2
 #define SETUP_PCI			3
 #define SETUP_EFI			4
-#define SETUP_KASLR			5
 
 /* ram_size flags */
 #define RAMDISK_IMAGE_START_MASK	0x07FF
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 9bbb9b35c144..d1ac80b72c72 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -47,13 +47,21 @@ do {							\
 
 #ifdef CONFIG_RANDOMIZE_BASE
 static unsigned long module_load_offset;
+static int randomize_modules = 1;
 
 /* Mutex protects the module_load_offset. */
 static DEFINE_MUTEX(module_kaslr_mutex);
 
+static int __init parse_nokaslr(char *p)
+{
+	randomize_modules = 0;
+	return 0;
+}
+early_param("nokaslr", parse_nokaslr);
+
 static unsigned long int get_module_load_offset(void)
 {
-	if (kaslr_enabled) {
+	if (randomize_modules) {
 		mutex_lock(&module_kaslr_mutex);
 		/*
 		 * Calculate the module_load_offset the first time this
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 98dc9317286e..0a2421cca01f 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -122,8 +122,6 @@
 unsigned long max_low_pfn_mapped;
 unsigned long max_pfn_mapped;
 
-bool __read_mostly kaslr_enabled = false;
-
 #ifdef CONFIG_DMI
 RESERVE_BRK(dmi_alloc, 65536);
 #endif
@@ -427,11 +425,6 @@ static void __init reserve_initrd(void)
 }
 #endif /* CONFIG_BLK_DEV_INITRD */
 
-static void __init parse_kaslr_setup(u64 pa_data, u32 data_len)
-{
-	kaslr_enabled = (bool)(pa_data + sizeof(struct setup_data));
-}
-
 static void __init parse_setup_data(void)
 {
 	struct setup_data *data;
@@ -457,9 +450,6 @@ static void __init parse_setup_data(void)
 		case SETUP_EFI:
 			parse_efi_setup(pa_data, data_len);
 			break;
-		case SETUP_KASLR:
-			parse_kaslr_setup(pa_data, data_len);
-			break;
 		default:
 			break;
 		}
@@ -842,14 +832,10 @@ static void __init trim_low_memory_range(void)
 static int
 dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
 {
-	if (kaslr_enabled)
-		pr_emerg("Kernel Offset: 0x%lx from 0x%lx (relocation range: 0x%lx-0x%lx)\n",
-			 (unsigned long)&_text - __START_KERNEL,
-			 __START_KERNEL,
-			 __START_KERNEL_map,
-			 MODULES_VADDR-1);
-	else
-		pr_emerg("Kernel Offset: disabled\n");
+	pr_emerg("Kernel Offset: 0x%lx from 0x%lx "
+		 "(relocation range: 0x%lx-0x%lx)\n",
+		 (unsigned long)&_text - __START_KERNEL, __START_KERNEL,
+		 __START_KERNEL_map, MODULES_VADDR-1);
 
 	return 0;
 }
-- 
2.3.3


-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [tip:x86/urgent] Revert "x86/mm/ASLR: Propagate base load address calculation"
  2015-03-16 10:06           ` [PATCH] Revert "x86/mm/ASLR: Propagate base load address calculation" Borislav Petkov
@ 2015-03-16 12:11             ` " tip-bot for Borislav Petkov
  2015-03-16 19:32               ` Yinghai Lu
  2015-03-16 13:56             ` [PATCH] " Jiri Kosina
  1 sibling, 1 reply; 54+ messages in thread
From: tip-bot for Borislav Petkov @ 2015-03-16 12:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, mingo, josh, linux-kernel, jkosina, matt.fleming, bhe,
	torvalds, hpa, keescook, yinghai, eternal.n08, bp,
	ard.biesheuvel

Commit-ID:  69797dafe35541bfff1989c0b37c66ed785faf0e
Gitweb:     http://git.kernel.org/tip/69797dafe35541bfff1989c0b37c66ed785faf0e
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Mon, 16 Mar 2015 11:06:28 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 16 Mar 2015 11:18:21 +0100

Revert "x86/mm/ASLR: Propagate base load address calculation"

This reverts commit:

  f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")

The main reason for the revert is that the new boot flag does not work
at all currently, and in order to make this work, we need non-trivial
changes to the x86 boot code which we didn't manage to get done in
time for merging.

And even if we did, they would've been too risky, so instead of
rushing things and breaking booting of 4.1 on boxes left and right, we
will be very strict and conservative and will take our time with
this to fix and test it properly.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/20150316100628.GD22995@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/boot/compressed/aslr.c       | 34 +---------------------------------
 arch/x86/boot/compressed/misc.c       |  3 +--
 arch/x86/boot/compressed/misc.h       |  6 ++----
 arch/x86/include/asm/page_types.h     |  2 --
 arch/x86/include/uapi/asm/bootparam.h |  1 -
 arch/x86/kernel/module.c              | 10 +++++++++-
 arch/x86/kernel/setup.c               | 22 ++++------------------
 7 files changed, 17 insertions(+), 61 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 7083c16..bb13763 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -14,13 +14,6 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
-struct kaslr_setup_data {
-	__u64 next;
-	__u32 type;
-	__u32 len;
-	__u8 data[1];
-} kaslr_setup_data;
-
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -302,29 +295,7 @@ static unsigned long find_random_addr(unsigned long minimum,
 	return slots_fetch_random();
 }
 
-static void add_kaslr_setup_data(struct boot_params *params, __u8 enabled)
-{
-	struct setup_data *data;
-
-	kaslr_setup_data.type = SETUP_KASLR;
-	kaslr_setup_data.len = 1;
-	kaslr_setup_data.next = 0;
-	kaslr_setup_data.data[0] = enabled;
-
-	data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
-
-	while (data && data->next)
-		data = (struct setup_data *)(unsigned long)data->next;
-
-	if (data)
-		data->next = (unsigned long)&kaslr_setup_data;
-	else
-		params->hdr.setup_data = (unsigned long)&kaslr_setup_data;
-
-}
-
-unsigned char *choose_kernel_location(struct boot_params *params,
-				      unsigned char *input,
+unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
 				      unsigned long output_size)
@@ -335,17 +306,14 @@ unsigned char *choose_kernel_location(struct boot_params *params,
 #ifdef CONFIG_HIBERNATION
 	if (!cmdline_find_option_bool("kaslr")) {
 		debug_putstr("KASLR disabled by default...\n");
-		add_kaslr_setup_data(params, 0);
 		goto out;
 	}
 #else
 	if (cmdline_find_option_bool("nokaslr")) {
 		debug_putstr("KASLR disabled by cmdline...\n");
-		add_kaslr_setup_data(params, 0);
 		goto out;
 	}
 #endif
-	add_kaslr_setup_data(params, 1);
 
 	/* Record the various known unsafe memory ranges. */
 	mem_avoid_init((unsigned long)input, input_size,
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 5903089..a950864 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -401,8 +401,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	 * the entire decompressed kernel plus relocation table, or the
 	 * entire decompressed kernel plus .bss and .brk sections.
 	 */
-	output = choose_kernel_location(real_mode, input_data, input_len,
-					output,
+	output = choose_kernel_location(input_data, input_len, output,
 					output_len > run_size ? output_len
 							      : run_size);
 
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index ee3576b..04477d6 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -57,8 +57,7 @@ int cmdline_find_option_bool(const char *option);
 
 #if CONFIG_RANDOMIZE_BASE
 /* aslr.c */
-unsigned char *choose_kernel_location(struct boot_params *params,
-				      unsigned char *input,
+unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
 				      unsigned long output_size);
@@ -66,8 +65,7 @@ unsigned char *choose_kernel_location(struct boot_params *params,
 bool has_cpuflag(int flag);
 #else
 static inline
-unsigned char *choose_kernel_location(struct boot_params *params,
-				      unsigned char *input,
+unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
 				      unsigned long output_size)
diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 95e11f7..f97fbe3 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -51,8 +51,6 @@ extern int devmem_is_allowed(unsigned long pagenr);
 extern unsigned long max_low_pfn_mapped;
 extern unsigned long max_pfn_mapped;
 
-extern bool kaslr_enabled;
-
 static inline phys_addr_t get_max_mapped(void)
 {
 	return (phys_addr_t)max_pfn_mapped << PAGE_SHIFT;
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index 44e6dd7..225b098 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -7,7 +7,6 @@
 #define SETUP_DTB			2
 #define SETUP_PCI			3
 #define SETUP_EFI			4
-#define SETUP_KASLR			5
 
 /* ram_size flags */
 #define RAMDISK_IMAGE_START_MASK	0x07FF
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 9bbb9b3..d1ac80b 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -47,13 +47,21 @@ do {							\
 
 #ifdef CONFIG_RANDOMIZE_BASE
 static unsigned long module_load_offset;
+static int randomize_modules = 1;
 
 /* Mutex protects the module_load_offset. */
 static DEFINE_MUTEX(module_kaslr_mutex);
 
+static int __init parse_nokaslr(char *p)
+{
+	randomize_modules = 0;
+	return 0;
+}
+early_param("nokaslr", parse_nokaslr);
+
 static unsigned long int get_module_load_offset(void)
 {
-	if (kaslr_enabled) {
+	if (randomize_modules) {
 		mutex_lock(&module_kaslr_mutex);
 		/*
 		 * Calculate the module_load_offset the first time this
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 98dc931..0a2421c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -122,8 +122,6 @@
 unsigned long max_low_pfn_mapped;
 unsigned long max_pfn_mapped;
 
-bool __read_mostly kaslr_enabled = false;
-
 #ifdef CONFIG_DMI
 RESERVE_BRK(dmi_alloc, 65536);
 #endif
@@ -427,11 +425,6 @@ static void __init reserve_initrd(void)
 }
 #endif /* CONFIG_BLK_DEV_INITRD */
 
-static void __init parse_kaslr_setup(u64 pa_data, u32 data_len)
-{
-	kaslr_enabled = (bool)(pa_data + sizeof(struct setup_data));
-}
-
 static void __init parse_setup_data(void)
 {
 	struct setup_data *data;
@@ -457,9 +450,6 @@ static void __init parse_setup_data(void)
 		case SETUP_EFI:
 			parse_efi_setup(pa_data, data_len);
 			break;
-		case SETUP_KASLR:
-			parse_kaslr_setup(pa_data, data_len);
-			break;
 		default:
 			break;
 		}
@@ -842,14 +832,10 @@ static void __init trim_low_memory_range(void)
 static int
 dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
 {
-	if (kaslr_enabled)
-		pr_emerg("Kernel Offset: 0x%lx from 0x%lx (relocation range: 0x%lx-0x%lx)\n",
-			 (unsigned long)&_text - __START_KERNEL,
-			 __START_KERNEL,
-			 __START_KERNEL_map,
-			 MODULES_VADDR-1);
-	else
-		pr_emerg("Kernel Offset: disabled\n");
+	pr_emerg("Kernel Offset: 0x%lx from 0x%lx "
+		 "(relocation range: 0x%lx-0x%lx)\n",
+		 (unsigned long)&_text - __START_KERNEL, __START_KERNEL,
+		 __START_KERNEL_map, MODULES_VADDR-1);
 
 	return 0;
 }

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] Revert "x86/mm/ASLR: Propagate base load address calculation"
  2015-03-16 10:06           ` [PATCH] Revert "x86/mm/ASLR: Propagate base load address calculation" Borislav Petkov
  2015-03-16 12:11             ` [tip:x86/urgent] " tip-bot for Borislav Petkov
@ 2015-03-16 13:56             ` " Jiri Kosina
  2015-03-16 19:15               ` Yinghai Lu
  1 sibling, 1 reply; 54+ messages in thread
From: Jiri Kosina @ 2015-03-16 13:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Yinghai Lu, Matt Fleming, H. Peter Anvin,
	Ingo Molnar, Kees Cook, Baoquan He, Thomas Gleixner,
	Linux Kernel Mailing List, linux-efi, Josh Triplett,
	Andrew Morton, Ard Biesheuvel, Junjie Mao

On Mon, 16 Mar 2015, Borislav Petkov wrote:

> On Sat, Mar 14, 2015 at 10:59:23AM +0100, Borislav Petkov wrote:
> > In addition to what you've said, I have only one proposal: we revert
> > the whole crap and do it again, this time nice and clean in tip. It'll
> > stay there as long as it has to. No half-arsed commit messages, no
> > misunderstood crap, no lala code f*ckery. Whatever it takes, as long as
> > it takes.
> 
> And here it is:
> 
> ---
> From: Borislav Petkov <bp@suse.de>
> Date: Mon, 16 Mar 2015 10:57:56 +0100
> Subject: [PATCH] Revert "x86/mm/ASLR: Propagate base load address calculation"
> 
> This reverts commit f47233c2d34f243ecdaac179c3408a39ff9216a7.
> 
> The main reason for the revert is that in order to make this work, we
> need non-trivial changes to the x86 boot code which we didn't manage to
> get done in time for merging.
> 
> And even if we did, they would've been too risky, so instead of rushing
> things and breaking booting of 4.1 on boxes left and right, we will be very
> strict and conservative and will take our time with this to fix and test
> it properly.
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Cc: Jiri Kosina <jkosina@suse.cz>

Agreed. Let's work on a better refresh for 4.1+. The fix attempts were 
quite chaotic.

Acked-by: Jiri Kosina <jkosina@suse.cz>

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] Revert "x86/mm/ASLR: Propagate base load address calculation"
  2015-03-16 13:56             ` [PATCH] " Jiri Kosina
@ 2015-03-16 19:15               ` Yinghai Lu
  2015-03-17  8:14                 ` Ingo Molnar
  0 siblings, 1 reply; 54+ messages in thread
From: Yinghai Lu @ 2015-03-16 19:15 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Borislav Petkov, Ingo Molnar, Matt Fleming, H. Peter Anvin,
	Ingo Molnar, Kees Cook, Baoquan He, Thomas Gleixner,
	Linux Kernel Mailing List, linux-efi, Josh Triplett,
	Andrew Morton, Ard Biesheuvel, Junjie Mao

On Mon, Mar 16, 2015 at 6:56 AM, Jiri Kosina <jkosina@suse.cz> wrote:
> On Mon, 16 Mar 2015, Borislav Petkov wrote:
>> From: Borislav Petkov <bp@suse.de>
>> Date: Mon, 16 Mar 2015 10:57:56 +0100
>> Subject: [PATCH] Revert "x86/mm/ASLR: Propagate base load address calculation"
>>
>> This reverts commit f47233c2d34f243ecdaac179c3408a39ff9216a7.
...
>
> Agreed. Let's work on a better refresh for 4.1+. The fix attempts were
> quite chaotic.

Then what is the plan for 4.1+?

Are you still going to use kaslr_setup_data?
If the answer is yes, then we should have:
patch1: https://lkml.org/lkml/2015/3/15/7
patch2:
f47233c2d34f243ecdaac179c3408a39ff9216a7 + the fix from
https://lkml.org/lkml/2015/3/15/8

If you are not going to bring setup_data and
f47233c2d34f243ecdaac179c3408a39ff9216a7
back, please let me know.
In that case I would need to change the patch in https://lkml.org/lkml/2015/3/15/7
so that it only has ADDON_ZO_SIZE for the pgtable, and
we can go on with kaslr support above 4G on 64-bit.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [tip:x86/urgent] Revert "x86/mm/ASLR: Propagate base load address calculation"
  2015-03-16 12:11             ` [tip:x86/urgent] " tip-bot for Borislav Petkov
@ 2015-03-16 19:32               ` Yinghai Lu
  0 siblings, 0 replies; 54+ messages in thread
From: Yinghai Lu @ 2015-03-16 19:32 UTC (permalink / raw)
  To: Ard Biesheuvel, Borislav Petkov, Junjie Mao, Yinghai Lu,
	Kees Cook, H. Peter Anvin, Linus Torvalds, Baoquan He,
	Matt Fleming, Jiri Kosina, Josh Triplett,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner
  Cc: linux-tip-commits

On Mon, Mar 16, 2015 at 5:11 AM, tip-bot for Borislav Petkov
<tipbot@zytor.com> wrote:
> Commit-ID:  69797dafe35541bfff1989c0b37c66ed785faf0e
> Gitweb:     http://git.kernel.org/tip/69797dafe35541bfff1989c0b37c66ed785faf0e
> Author:     Borislav Petkov <bp@suse.de>
> AuthorDate: Mon, 16 Mar 2015 11:06:28 +0100
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Mon, 16 Mar 2015 11:18:21 +0100
>
> Revert "x86/mm/ASLR: Propagate base load address calculation"
>
> This reverts commit:
>
>   f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")
>
> The main reason for the revert is that the new boot flag does not work
> at all currently, and in order to make this work, we need non-trivial
> changes to the x86 boot code which we didn't manage to get done in
> time for merging.
>
> And even if we did, they would've been too risky, so instead of
> rushing things and breaking booting of 4.1 on boxes left and right, we
> will be very strict and conservative and will take our time with
> this to fix and test it properly.
>
> Reported-by: Yinghai Lu <yinghai@kernel.org>

Please also add

Reported-by: Ying Huang <ying.huang@intel.com>

according to

http://marc.info/?l=linux-kernel&m=142492905425130&w=2
...
> Link: http://lkml.kernel.org/r/20150316100628.GD22995@pd.tnic

The URL does not work.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] Revert "x86/mm/ASLR: Propagate base load address calculation"
  2015-03-16 19:15               ` Yinghai Lu
@ 2015-03-17  8:14                 ` Ingo Molnar
  0 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2015-03-17  8:14 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jiri Kosina, Borislav Petkov, Matt Fleming, H. Peter Anvin,
	Ingo Molnar, Kees Cook, Baoquan He, Thomas Gleixner,
	Linux Kernel Mailing List, linux-efi, Josh Triplett,
	Andrew Morton, Ard Biesheuvel, Junjie Mao


* Yinghai Lu <yinghai@kernel.org> wrote:

> On Mon, Mar 16, 2015 at 6:56 AM, Jiri Kosina <jkosina@suse.cz> wrote:
> > On Mon, 16 Mar 2015, Borislav Petkov wrote:
> >> From: Borislav Petkov <bp@suse.de>
> >> Date: Mon, 16 Mar 2015 10:57:56 +0100
> >> Subject: [PATCH] Revert "x86/mm/ASLR: Propagate base load address calculation"
> >>
> >> This reverts commit f47233c2d34f243ecdaac179c3408a39ff9216a7.
> ...
> >
> > Agreed. Let's work on a better refresh for 4.1+. The fix attempts were
> > quite chaotic.
> 
> Then what is the plan for 4.1+ ?

My plan is to apply correct, clean, properly granular and properly 
changelogged patches.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread


Thread overview: 54+ messages
-- links below jump to the message on this page --
2015-03-07 22:07 [PATCH v3 0/7] x86, boot: clean up kasl Yinghai Lu
2015-03-07 22:07 ` [PATCH v3 1/7] x86, kaslr: Use init_size instead of run_size Yinghai Lu
2015-03-09 12:49   ` Borislav Petkov
2015-03-09 15:58     ` Ingo Molnar
2015-03-09 15:58       ` Borislav Petkov
2015-03-09 19:35     ` Yinghai Lu
2015-03-09 20:00       ` Borislav Petkov
2015-03-09 20:06         ` Yinghai Lu
2015-03-09 20:18           ` Borislav Petkov
2015-03-09 21:28             ` Yinghai Lu
2015-03-10  0:42   ` Kees Cook
2015-03-13 12:27   ` Ingo Molnar
2015-03-14  2:47     ` Yinghai Lu
2015-03-14  7:53       ` Ingo Molnar
2015-03-14  9:59         ` Borislav Petkov
2015-03-16 10:06           ` [PATCH] Revert "x86/mm/ASLR: Propagate base load address calculation" Borislav Petkov
2015-03-16 12:11             ` [tip:x86/urgent] " tip-bot for Borislav Petkov
2015-03-16 19:32               ` Yinghai Lu
2015-03-16 13:56             ` [PATCH] " Jiri Kosina
2015-03-16 19:15               ` Yinghai Lu
2015-03-17  8:14                 ` Ingo Molnar
2015-03-07 22:07 ` [PATCH v3 2/7] x86, boot: Move ZO to end of buffer Yinghai Lu
2015-03-10  0:54   ` Kees Cook
2015-03-10  1:04     ` Yinghai Lu
2015-03-10  5:59     ` Borislav Petkov
2015-03-10  8:00   ` Borislav Petkov
2015-03-10  9:34     ` Jiri Kosina
2015-03-10  9:35       ` Borislav Petkov
2015-03-10 15:11     ` Yinghai Lu
2015-03-10 15:13       ` Borislav Petkov
2015-03-10 16:59     ` Kees Cook
2015-03-07 22:07 ` [PATCH v3 3/7] x86, boot: Don't overlap VO with ZO data Yinghai Lu
2015-03-10  9:34   ` Borislav Petkov
2015-03-10 15:05     ` Yinghai Lu
2015-03-10 15:10       ` Borislav Petkov
2015-03-10 15:17         ` Yinghai Lu
2015-03-10 15:21           ` Borislav Petkov
2015-03-10 15:42             ` Yinghai Lu
2015-03-10 15:48               ` Borislav Petkov
2015-03-10 19:29                 ` Yinghai Lu
2015-03-07 22:07 ` [PATCH v3 4/7] x86, kaslr: Access the correct kaslr_enabled variable Yinghai Lu
2015-03-10  0:55   ` Kees Cook
2015-03-07 22:07 ` [PATCH v3 5/7] x86, kaslr: Consolidate mem_avoid array filling Yinghai Lu
2015-03-10  1:00   ` Kees Cook
2015-03-10  1:10     ` Yinghai Lu
2015-03-10  1:26       ` Kees Cook
2015-03-07 22:07 ` [PATCH v3 6/7] x86, boot: Split kernel_ident_mapping_init to another file Yinghai Lu
2015-03-10  1:03   ` Kees Cook
2015-03-07 22:07 ` [PATCH v3 7/7] x86, kaslr, 64bit: Set new or extra ident_mapping Yinghai Lu
2015-03-10  1:09   ` Kees Cook
2015-03-10  1:14     ` Yinghai Lu
2015-03-10  6:54       ` Yinghai Lu
2015-03-10  0:39 ` [PATCH v3 0/7] x86, boot: clean up kasl Kees Cook
2015-03-10  0:54   ` Yinghai Lu
