* [PATCH v5 00/19] x86, boot: kaslr cleanup and 64bit kaslr support
@ 2015-03-18  7:28 ` Yinghai Lu
  0 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi, Yinghai Lu

First, make sure the data region of ZO (arch/x86/boot/compressed/vmlinux)
is not overwritten by VO (vmlinux) after decompression, so that data can
be passed from ZO to VO.

Second, this is a second attempt at kaslr_setup_data support.

Patches 3-11 are KASLR cleanups plus the ident-mapping groundwork for
He's patches:
  kill the run_size calculation shell script;
  create a new ident mapping for 64-bit KASLR, so we can cover a random
   kernel base above 4G and no longer need to track the page table set
   up by a 64-bit bootloader (patched grub2 or kexec), which keeps the
   mem_avoid handling simple.

Also included are 7 patches from He that extend the randomization
(separate physical/virtual randomization, physical addresses above 4G);
I already used his patches to test the ident mapping code, and carrying
them here saves him some rebase work.

The series is also available at:
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-4.0-rc5-aslr

Thanks

Yinghai


Baoquan He (7):
  x86, kaslr: Fix a bug that relocation can not be handled when kernel is loaded above 2G
  x86, kaslr: Introduce struct slot_area to manage randomization slot info
  x86, kaslr: Add two functions which will be used later
  x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel text mapping address
  x86, kaslr: Randomize physical and virtual address of kernel separately
  x86, kaslr: Add support of kernel physical address randomization above 4G
  x86, kaslr: Remove useless codes

Jiri Kosina (1):
  x86, kaslr: Propagate base load address calculation v2

Yinghai Lu (11):
  x86, boot: Make data from decompress_kernel stage live longer
  x86, boot: Simplify run_size calculation
  x86, kaslr: Kill not used run_size related code.
  x86, kaslr: Use output_run_size
  x86, kaslr: Consolidate mem_avoid array filling
  x86, boot: Move z_extract_offset calculation to header.S
  x86, kaslr: Get correct max_addr for relocs pointer
  x86, boot: Split kernel_ident_mapping_init to another file
  x86, 64bit: Set ident_mapping for kaslr
  x86, boot: Add checking for memcpy
  x86, kaslr: Allow random address could be below loaded address

 arch/x86/boot/Makefile                 |  13 +-
 arch/x86/boot/compressed/Makefile      |  19 ++-
 arch/x86/boot/compressed/aslr.c        | 281 ++++++++++++++++++++++++---------
 arch/x86/boot/compressed/head_32.S     |  14 +-
 arch/x86/boot/compressed/head_64.S     |  15 +-
 arch/x86/boot/compressed/misc.c        |  71 +++++----
 arch/x86/boot/compressed/misc.h        |  32 ++--
 arch/x86/boot/compressed/misc_pgt.c    |  91 +++++++++++
 arch/x86/boot/compressed/mkpiggy.c     |  28 +---
 arch/x86/boot/compressed/string.c      |  28 +++-
 arch/x86/boot/compressed/vmlinux.lds.S |   2 +
 arch/x86/boot/header.S                 |  43 ++++-
 arch/x86/include/asm/aslr.h            |  10 ++
 arch/x86/include/asm/boot.h            |  19 +++
 arch/x86/include/asm/page.h            |   5 +
 arch/x86/include/asm/page_types.h      |   2 +
 arch/x86/include/uapi/asm/bootparam.h  |   1 +
 arch/x86/kernel/asm-offsets.c          |   1 +
 arch/x86/kernel/module.c               |  10 +-
 arch/x86/kernel/setup.c                |  27 +++-
 arch/x86/kernel/vmlinux.lds.S          |   1 +
 arch/x86/mm/ident_map.c                |  74 +++++++++
 arch/x86/mm/init_64.c                  |  74 +--------
 arch/x86/tools/calc_run_size.sh        |  42 -----
 24 files changed, 610 insertions(+), 293 deletions(-)
 create mode 100644 arch/x86/boot/compressed/misc_pgt.c
 create mode 100644 arch/x86/include/asm/aslr.h
 create mode 100644 arch/x86/mm/ident_map.c
 delete mode 100644 arch/x86/tools/calc_run_size.sh

-- 
1.8.4.5


* [PATCH v5 01/19] x86, boot: Make data from decompress_kernel stage live longer
  2015-03-18  7:28 ` Yinghai Lu
@ 2015-03-18  7:28 ` Yinghai Lu
  -1 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi, Yinghai Lu, Ying Huang

Ying Huang found that commit f47233c2d34f ("x86/mm/ASLR: Propagate base
load address calculation") causes a warning from ioremap:

[    0.499891] ------------[ cut here ]------------
[    0.500021] WARNING: CPU: 0 PID: 1 at arch/x86/mm/ioremap.c:63 __ioremap_check_ram+0x445/0x4a0()
[    0.501015] ioremap on RAM pfn 0x3416
[    0.502013] Modules linked in:
[    0.503017] CPU: 0 PID: 1 Comm: swapper Not tainted 3.19.0-04793-g2c303f7 #3
[    0.504013] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.505014]  0000000000000009 ffff880012d8bb88 ffffffff81cedc16 ffff880012d8bbc8
[    0.507424]  ffffffff810dd1a0 0000000000000000 0000000000000001 0000000000000001
[    0.509420]  0000000000003416 0000000000000001 0000000000000001 ffff880012d8bc28
[    0.511415] Call Trace:
[    0.512028]  [<ffffffff81cedc16>] dump_stack+0x2e/0x3e
[    0.513023]  [<ffffffff810dd1a0>] warn_slowpath_common+0xe0/0x160
[    0.514021]  [<ffffffff810dd316>] warn_slowpath_fmt+0x56/0x60
[    0.515022]  [<ffffffff8107f4a5>] __ioremap_check_ram+0x445/0x4a0
[    0.516022]  [<ffffffff8107f060>] ? trace_do_page_fault+0x9b0/0x9b0
[    0.517020]  [<ffffffff810ec948>] walk_system_ram_range+0x128/0x140
[    0.518022]  [<ffffffff82d9081f>] ? create_setup_data_nodes+0xd1/0x488
[    0.519019]  [<ffffffff82d9081f>] ? create_setup_data_nodes+0xd1/0x488
[    0.520021]  [<ffffffff8107f882>] __ioremap_caller+0x172/0x850
[    0.521021]  [<ffffffff81080064>] ioremap_cache+0x24/0x30
[    0.522019]  [<ffffffff82d9081f>] create_setup_data_nodes+0xd1/0x488
[    0.523023]  [<ffffffff81493c9c>] ? internal_create_group+0x4ac/0x830
[    0.524020]  [<ffffffff82d90c76>] boot_params_ksysfs_init+0xa0/0xf9
[    0.525020]  [<ffffffff810005f1>] do_one_initcall+0x371/0x4c0
[    0.526019]  [<ffffffff82d90bd6>] ? create_setup_data_nodes+0x488/0x488
[    0.527024]  [<ffffffff82d8afd5>] kernel_init_freeable+0x368/0x4ba
[    0.528022]  [<ffffffff81ce15d0>] ? rest_init+0x260/0x260
[    0.529020]  [<ffffffff81ce15e6>] kernel_init+0x16/0x240
[    0.530023]  [<ffffffff81d0253a>] ret_from_fork+0x7a/0xb0
[    0.531021]  [<ffffffff81ce15d0>] ? rest_init+0x260/0x260
[    0.532033] ---[ end trace b6a2b7ddc92922e5 ]---

Boris later found that the kaslr setup_data passed from the boot stage
becomes all zeros by the time the kernel stage reads it.

Currently the data sections of the decompress code overlap the bss/brk
area of the final running kernel. We need to avoid that overlap so the
data stays live until the kernel accesses it.

The current code uses extract_offset to control the position of the
copied kernel; when the kernel run size is bigger than the buffer size
needed for decompression, the copied kernel ends up in the middle of the
buffer. That causes the overlap.

Detailed flow in the current code:
The bootloader allocates a buffer according to init_size in the setup
header and loads ZO (arch/x86/boot/compressed/vmlinux) at the start of
that buffer.
While running, ZO moves itself to z_extract_offset within the buffer, to
make sure the decompressor output does not overwrite the input data
before the input is consumed.
After the decompressor is done, VO uses most of the buffer from the
start, so the ZO code and data sections overlap the VO bss section.
Later, VO's clear_bss() clears them before the code in
arch/x86/kernel/setup.c tries to access them.

Current layout:
when init_size is the same as kernel run_size:
                                        run_size
0              extract_offset          init_size
|------------------|------------------------|
   VO text/data                   VO bss/brk
                   input ZO text ZO data

This patch does two things.

First, move ZO to the end of the buffer instead of the middle. When
init_size is bigger than the kernel run size, we have:

0                            run_size    init_size
|--------------------------------|----------|
   VO text/data        VO bss/brk
                       input ZO text ZO data

We already have the buffer size in init_size, so we can find the buffer
end easily when copying ZO before decompressing.

Second, add extra size (the ZO data size) to init_size, so that even
when the old init_size equals the kernel run size, we have:

                                         run_size
0                                   old init_size init_size
|------------------------------------------|--------|
   VO text/data                  VO bss/brk
                               input ZO text ZO data

Here is how the size changes when the old init_size equals the kernel
run_size.
# size arch/x86/boot/compressed/vmlinux
   text	   data	    bss	    dec	    hex	filename
13247288    264	  49248	13296800 cae4a0	arch/x86/boot/compressed/vmlinux
# bootloader reported init_size
kernel: [13cc00000, 13ff8efff]

After patch:
# size arch/x86/boot/compressed/vmlinux
   text	   data	    bss	    dec	    hex	filename
13247289    264	  49248	13296801 cae4a1	arch/x86/boot/compressed/vmlinux
# bootloader reported init_size
kernel: [13cc00000, 13ffa2fff]

So init_size increases by 20 pages (0x14000 = 80 KiB).
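
The header.S hunk below encodes this choice with preprocessor
arithmetic; as a rough C rendering of the same decision (a sketch only,
the macro block in the diff is authoritative):

/*
 * Mirrors the INIT_SIZE selection in header.S: guarantee at least
 * ADDON_ZO_SIZE (ZO__end - ZO__rodata) bytes beyond what VO needs,
 * so the ZO data area never lands inside VO bss/brk.
 */
static unsigned long pick_init_size(unsigned long zo_init_size,
				    unsigned long vo_init_size,
				    unsigned long addon_zo_size)
{
	if (zo_init_size <= vo_init_size)
		return vo_init_size + addon_zo_size;
	/* only add the part of ADDON_ZO not covered by the gap */
	if (zo_init_size - vo_init_size < addon_zo_size)
		return zo_init_size +
		       (addon_zo_size - (zo_init_size - vo_init_size));
	return zo_init_size;
}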

Fixes: f47233c2d34f ("x86/mm/ASLR: Propagate base load address calculation")
Link: http://marc.info/?l=linux-kernel&m=142492905425130&w=2
Reported-by: Ying Huang <ying.huang@intel.com>
Cc: Ying Huang <ying.huang@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/Makefile                 |  2 +-
 arch/x86/boot/compressed/head_32.S     | 11 +++++++++--
 arch/x86/boot/compressed/head_64.S     |  8 ++++++--
 arch/x86/boot/compressed/mkpiggy.c     |  7 ++-----
 arch/x86/boot/compressed/vmlinux.lds.S |  2 ++
 arch/x86/boot/header.S                 | 14 ++++++++++++--
 arch/x86/kernel/asm-offsets.c          |  1 +
 arch/x86/kernel/vmlinux.lds.S          |  1 +
 8 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 57bbf2f..863ef25 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -86,7 +86,7 @@ targets += voffset.h
 $(obj)/voffset.h: vmlinux FORCE
 	$(call if_changed,voffset)
 
-sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|z_.*\)$$/\#define ZO_\2 0x\1/p'
+sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_rodata\|z_.*\)$$/\#define ZO_\2 0x\1/p'
 
 quiet_cmd_zoffset = ZOFFSET $@
       cmd_zoffset = $(NM) $< | sed -n $(sed-zoffset) > $@
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 1d7fbbc..1410c42 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -147,7 +147,9 @@ preferred_addr:
 1:
 
 	/* Target address to relocate to for decompression */
-	addl	$z_extract_offset, %ebx
+	movl    BP_init_size(%esi), %eax
+	subl    $_end, %eax
+	addl    %eax, %ebx
 
 	/* Set up the stack */
 	leal	boot_stack_end(%ebx), %esp
@@ -209,8 +211,13 @@ relocated:
 				/* push arguments for decompress_kernel: */
 	pushl	$z_run_size	/* size of kernel with .bss and .brk */
 	pushl	$z_output_len	/* decompressed length, end of relocs */
-	leal	z_extract_offset_negative(%ebx), %ebp
+
+	movl    BP_init_size(%esi), %eax
+	subl    $_end, %eax
+	movl    %ebx, %ebp
+	subl    %eax, %ebp
 	pushl	%ebp		/* output address */
+
 	pushl	$z_input_len	/* input_len */
 	leal	input_data(%ebx), %eax
 	pushl	%eax		/* input_data */
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 6b1766c..4e30ee3 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -101,7 +101,9 @@ ENTRY(startup_32)
 1:
 
 	/* Target address to relocate to for decompression */
-	addl	$z_extract_offset, %ebx
+	movl	BP_init_size(%esi), %eax
+	subl	$_end, %eax
+	addl	%eax, %ebx
 
 /*
  * Prepare for entering 64 bit mode
@@ -329,7 +331,9 @@ preferred_addr:
 1:
 
 	/* Target address to relocate to for decompression */
-	leaq	z_extract_offset(%rbp), %rbx
+	movl	BP_init_size(%rsi), %ebx
+	subl	$_end, %ebx
+	addq	%rbp, %rbx
 
 	/* Set up the stack */
 	leaq	boot_stack_end(%rbx), %rsp
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index d8222f2..5faad09 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -83,11 +83,8 @@ int main(int argc, char *argv[])
 	printf("z_input_len = %lu\n", ilen);
 	printf(".globl z_output_len\n");
 	printf("z_output_len = %lu\n", (unsigned long)olen);
-	printf(".globl z_extract_offset\n");
-	printf("z_extract_offset = 0x%lx\n", offs);
-	/* z_extract_offset_negative allows simplification of head_32.S */
-	printf(".globl z_extract_offset_negative\n");
-	printf("z_extract_offset_negative = -0x%lx\n", offs);
+	printf(".globl z_min_extract_offset\n");
+	printf("z_min_extract_offset = 0x%lx\n", offs);
 	printf(".globl z_run_size\n");
 	printf("z_run_size = %lu\n", run_size);
 
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 34d047c..6d6158e 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -35,6 +35,7 @@ SECTIONS
 		*(.text.*)
 		_etext = . ;
 	}
+        . = ALIGN(PAGE_SIZE); /* keep ADDON_ZO_SIZE page aligned */
 	.rodata : {
 		_rodata = . ;
 		*(.rodata)	 /* read-only data */
@@ -70,5 +71,6 @@ SECTIONS
 		_epgtable = . ;
 	}
 #endif
+	. = ALIGN(PAGE_SIZE);	/* keep ZO size page aligned */
 	_end = .;
 }
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 16ef025..226d166 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -440,12 +440,22 @@ setup_data:		.quad 0			# 64-bit physical pointer to
 
 pref_address:		.quad LOAD_PHYSICAL_ADDR	# preferred load addr
 
-#define ZO_INIT_SIZE	(ZO__end - ZO_startup_32 + ZO_z_extract_offset)
+# don't overlap data area of ZO with VO bss
+#define ADDON_ZO_SIZE (ZO__end - ZO__rodata)
+
+#define ZO_INIT_SIZE	(ZO__end - ZO_startup_32 + ZO_z_min_extract_offset)
 #define VO_INIT_SIZE	(VO__end - VO__text)
 #if ZO_INIT_SIZE > VO_INIT_SIZE
+
+/* only add the difference to cover ADDON_ZO */
+#if (ZO_INIT_SIZE - VO_INIT_SIZE) < ADDON_ZO_SIZE
+#define INIT_SIZE (ZO_INIT_SIZE + (ADDON_ZO_SIZE-(ZO_INIT_SIZE - VO_INIT_SIZE)))
+#else
 #define INIT_SIZE ZO_INIT_SIZE
+#endif
+
 #else
-#define INIT_SIZE VO_INIT_SIZE
+#define INIT_SIZE (VO_INIT_SIZE + ADDON_ZO_SIZE)
 #endif
 init_size:		.long INIT_SIZE		# kernel initialization size
 handover_offset:	.long 0			# Filled in by build.c
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 9f6b934..0e8e4f7 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -66,6 +66,7 @@ void common(void) {
 	OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch);
 	OFFSET(BP_version, boot_params, hdr.version);
 	OFFSET(BP_kernel_alignment, boot_params, hdr.kernel_alignment);
+	OFFSET(BP_init_size, boot_params, hdr.init_size);
 	OFFSET(BP_pref_address, boot_params, hdr.pref_address);
 	OFFSET(BP_code32_start, boot_params, hdr.code32_start);
 
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 00bf300..5816920 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -325,6 +325,7 @@ SECTIONS
 		__brk_limit = .;
 	}
 
+	. = ALIGN(PAGE_SIZE);		/* keep VO_INIT_SIZE page aligned */
 	_end = .;
 
         STABS_DEBUG
-- 
1.8.4.5


* [PATCH v5 02/19] x86, kaslr: Propagate base load address calculation v2
  2015-03-18  7:28 ` Yinghai Lu
@ 2015-03-18  7:28 ` Yinghai Lu
  -1 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi, Yinghai Lu

From: Jiri Kosina <jkosina@suse.cz>

commit f47233c2d34f ("x86/mm/ASLR: Propagate base load address
calculation") was reverted for v4.0. This is a second attempt, on top of
the patch (x86, boot: Make data from decompress_kernel stage live
longer) that keeps the data around for later access. The changelog below
is from the v1 commit.

Commit:

      e2b32e678513 ("x86, kaslr: randomize module base load address")

makes the base address for modules unconditionally randomized when
CONFIG_RANDOMIZE_BASE is defined and the "nokaslr" option isn't present
on the command line.

This is not consistent with how choose_kernel_location() decides whether
it will randomize the kernel load base.

Namely, CONFIG_HIBERNATION disables kASLR (unless the "kaslr" option is
explicitly specified on the kernel command line), which makes the state
space larger than what the module loader is looking at. IOW,
CONFIG_HIBERNATION && CONFIG_RANDOMIZE_BASE is a valid configuration;
kASLR wouldn't be applied by default in that case, but the module loader
is not aware of that.

Instead of fixing the logic in module.c, this patch takes a more generic
approach. It introduces a new bootparam setup_data type, SETUP_KASLR,
and uses it to pass along whether kaslr was applied during kernel
decompression, setting a global 'kaslr_enabled' variable accordingly, so
that any kernel code (module loading, livepatching, ...) can make
decisions based on its value.

The x86 module loader is converted to make use of this flag.
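
For illustration only (not part of the patch), consumers then reduce to
a simple test on the flag instead of reparsing the command line
themselves; module_base_should_randomize() is a made-up helper name:

#include <linux/kernel.h>
#include <asm/page_types.h>	/* declares kaslr_enabled */

/* hypothetical helper: randomize module bases only when the
 * decompressor reported that kASLR was actually applied */
static bool module_base_should_randomize(void)
{
	return IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_enabled;
}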

[ Always dump the correct kaslr status when panicking -- from Boris ]

-v2: fold in fix for using physical address as value  -- Yinghai
     split struct kaslr_setup_data definition to another file -- Yinghai
     use real_mode directly instead of passing it around -- Yinghai

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/aslr.c       | 26 ++++++++++++++++++++++++++
 arch/x86/include/asm/aslr.h           | 10 ++++++++++
 arch/x86/include/asm/page_types.h     |  2 ++
 arch/x86/include/uapi/asm/bootparam.h |  1 +
 arch/x86/kernel/module.c              | 10 +---------
 arch/x86/kernel/setup.c               | 27 +++++++++++++++++++++++----
 6 files changed, 63 insertions(+), 13 deletions(-)
 create mode 100644 arch/x86/include/asm/aslr.h

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index bb13763..da01c78 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -3,6 +3,7 @@
 #include <asm/msr.h>
 #include <asm/archrandom.h>
 #include <asm/e820.h>
+#include <asm/aslr.h>
 
 #include <generated/compile.h>
 #include <linux/module.h>
@@ -14,6 +15,8 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
+static struct kaslr_setup_data kaslr_setup_data;
+
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -295,6 +298,26 @@ static unsigned long find_random_addr(unsigned long minimum,
 	return slots_fetch_random();
 }
 
+static void add_kaslr_setup_data(__u8 enabled)
+{
+	struct setup_data *data;
+
+	kaslr_setup_data.type = SETUP_KASLR;
+	kaslr_setup_data.len = 1;
+	kaslr_setup_data.next = 0;
+	kaslr_setup_data.data[0] = enabled;
+
+	data = (struct setup_data *)(unsigned long)real_mode->hdr.setup_data;
+
+	while (data && data->next)
+		data = (struct setup_data *)(unsigned long)data->next;
+
+	if (data)
+		data->next = (unsigned long)&kaslr_setup_data;
+	else
+		real_mode->hdr.setup_data = (unsigned long)&kaslr_setup_data;
+}
+
 unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
@@ -306,14 +329,17 @@ unsigned char *choose_kernel_location(unsigned char *input,
 #ifdef CONFIG_HIBERNATION
 	if (!cmdline_find_option_bool("kaslr")) {
 		debug_putstr("KASLR disabled by default...\n");
+		add_kaslr_setup_data(0);
 		goto out;
 	}
 #else
 	if (cmdline_find_option_bool("nokaslr")) {
 		debug_putstr("KASLR disabled by cmdline...\n");
+		add_kaslr_setup_data(0);
 		goto out;
 	}
 #endif
+	add_kaslr_setup_data(1);
 
 	/* Record the various known unsafe memory ranges. */
 	mem_avoid_init((unsigned long)input, input_size,
diff --git a/arch/x86/include/asm/aslr.h b/arch/x86/include/asm/aslr.h
new file mode 100644
index 0000000..461a819
--- /dev/null
+++ b/arch/x86/include/asm/aslr.h
@@ -0,0 +1,10 @@
+#ifndef _ASM_X86_ASLR_H
+
+struct kaslr_setup_data {
+	__u64 next;
+	__u32 type;
+	__u32 len;
+	__u8 data[1];
+};
+
+#endif
diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index f97fbe3..95e11f7 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -51,6 +51,8 @@ extern int devmem_is_allowed(unsigned long pagenr);
 extern unsigned long max_low_pfn_mapped;
 extern unsigned long max_pfn_mapped;
 
+extern bool kaslr_enabled;
+
 static inline phys_addr_t get_max_mapped(void)
 {
 	return (phys_addr_t)max_pfn_mapped << PAGE_SHIFT;
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index 225b098..44e6dd7 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -7,6 +7,7 @@
 #define SETUP_DTB			2
 #define SETUP_PCI			3
 #define SETUP_EFI			4
+#define SETUP_KASLR			5
 
 /* ram_size flags */
 #define RAMDISK_IMAGE_START_MASK	0x07FF
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index d1ac80b..9bbb9b3 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -47,21 +47,13 @@ do {							\
 
 #ifdef CONFIG_RANDOMIZE_BASE
 static unsigned long module_load_offset;
-static int randomize_modules = 1;
 
 /* Mutex protects the module_load_offset. */
 static DEFINE_MUTEX(module_kaslr_mutex);
 
-static int __init parse_nokaslr(char *p)
-{
-	randomize_modules = 0;
-	return 0;
-}
-early_param("nokaslr", parse_nokaslr);
-
 static unsigned long int get_module_load_offset(void)
 {
-	if (randomize_modules) {
+	if (kaslr_enabled) {
 		mutex_lock(&module_kaslr_mutex);
 		/*
 		 * Calculate the module_load_offset the first time this
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 0a2421c..3b3f54c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -78,6 +78,7 @@
 #include <asm/e820.h>
 #include <asm/mpspec.h>
 #include <asm/setup.h>
+#include <asm/aslr.h>
 #include <asm/efi.h>
 #include <asm/timer.h>
 #include <asm/i8259.h>
@@ -122,6 +123,8 @@
 unsigned long max_low_pfn_mapped;
 unsigned long max_pfn_mapped;
 
+bool __read_mostly kaslr_enabled = false;
+
 #ifdef CONFIG_DMI
 RESERVE_BRK(dmi_alloc, 65536);
 #endif
@@ -425,6 +428,15 @@ static void __init reserve_initrd(void)
 }
 #endif /* CONFIG_BLK_DEV_INITRD */
 
+static void __init parse_kaslr_setup(u64 pa_data, u32 data_len)
+{
+	struct kaslr_setup_data *data;
+
+	data = early_memremap(pa_data, sizeof(*data));
+	kaslr_enabled = data->data[0];
+	early_memunmap(data, sizeof(*data));
+}
+
 static void __init parse_setup_data(void)
 {
 	struct setup_data *data;
@@ -450,6 +462,9 @@ static void __init parse_setup_data(void)
 		case SETUP_EFI:
 			parse_efi_setup(pa_data, data_len);
 			break;
+		case SETUP_KASLR:
+			parse_kaslr_setup(pa_data, data_len);
+			break;
 		default:
 			break;
 		}
@@ -832,10 +847,14 @@ static void __init trim_low_memory_range(void)
 static int
 dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
 {
-	pr_emerg("Kernel Offset: 0x%lx from 0x%lx "
-		 "(relocation range: 0x%lx-0x%lx)\n",
-		 (unsigned long)&_text - __START_KERNEL, __START_KERNEL,
-		 __START_KERNEL_map, MODULES_VADDR-1);
+	if (kaslr_enabled)
+		pr_emerg("Kernel Offset: 0x%lx from 0x%lx (relocation range: 0x%lx-0x%lx)\n",
+			 (unsigned long)&_text - __START_KERNEL,
+			 __START_KERNEL,
+			 __START_KERNEL_map,
+			 MODULES_VADDR-1);
+	else
+		pr_emerg("Kernel Offset: disabled\n");
 
 	return 0;
 }
-- 
1.8.4.5


* [PATCH v5 03/19] x86, boot: Simplify run_size calculation
  2015-03-18  7:28 ` Yinghai Lu
@ 2015-03-18  7:28 ` Yinghai Lu
  2015-03-23  3:25     ` Baoquan He
  -1 siblings, 1 reply; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi, Yinghai Lu, Junjie Mao,
	Josh Triplett, Andrew Morton

While looking at the boot code in order to add memory mapping for KASLR
with 64-bit above-4G support, I found that e6023367d779 ("x86, kaslr:
Prevent .bss from overlaping initrd") and its follow-ups introduced a
way to compute the kernel run_size and pass it around -- first via perl,
later changed to a shell script.

That is not necessary: run_size is a simple constant, so we don't need
to pass it around, and we already have voffset.h for this purpose.

We can share voffset.h between misc.c and header.S instead of adding
another way to obtain run_size.

This patch moves the voffset.h creation code to
boot/compressed/Makefile.

The dependency chain was:
boot/header.S ==> boot/voffset.h ==> vmlinux
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c
and now becomes:
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c ==> boot/voffset.h ==> vmlinux

It also uses a macro in misc.c to replace the passed-in run_size.
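
With the new sed-voffset rule, the generated boot/voffset.h has this
shape (the addresses below are made-up example values):

#define VO__text _AC(0xffffffff81000000,UL)
#define VO__end _AC(0xffffffff82000000,UL)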

Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/Makefile            | 11 +----------
 arch/x86/boot/compressed/Makefile | 12 ++++++++++++
 arch/x86/boot/compressed/misc.c   |  3 +++
 3 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 863ef25..e7ee9cd 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -77,15 +77,6 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE
 
 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
 
-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 0x\1/p'
-
-quiet_cmd_voffset = VOFFSET $@
-      cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
-
-targets += voffset.h
-$(obj)/voffset.h: vmlinux FORCE
-	$(call if_changed,voffset)
-
 sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_rodata\|z_.*\)$$/\#define ZO_\2 0x\1/p'
 
 quiet_cmd_zoffset = ZOFFSET $@
@@ -97,7 +88,7 @@ $(obj)/zoffset.h: $(obj)/compressed/vmlinux FORCE
 
 
 AFLAGS_header.o += -I$(obj)
-$(obj)/header.o: $(obj)/voffset.h $(obj)/zoffset.h
+$(obj)/header.o: $(obj)/zoffset.h
 
 LDFLAGS_setup.elf	:= -T
 $(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 0a291cd..d9fee82 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -40,6 +40,18 @@ LDFLAGS_vmlinux := -T
 hostprogs-y	:= mkpiggy
 HOST_EXTRACFLAGS += -I$(srctree)/tools/include
 
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+
+quiet_cmd_voffset = VOFFSET $@
+      cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
+
+targets += ../voffset.h
+
+$(obj)/../voffset.h: vmlinux FORCE
+	$(call if_changed,voffset)
+
+$(obj)/misc.o: $(obj)/../voffset.h
+
 vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
 	$(obj)/string.o $(obj)/cmdline.o \
 	$(obj)/piggy.o $(obj)/cpuflags.o
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index a950864..4785c23 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -11,6 +11,7 @@
 
 #include "misc.h"
 #include "../string.h"
+#include "../voffset.h"
 
 /* WARNING!!
  * This code is compiled with -fPIC and it is relocated dynamically
@@ -390,6 +391,8 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	lines = real_mode->screen_info.orig_video_lines;
 	cols = real_mode->screen_info.orig_video_cols;
 
+	run_size = VO__end - VO__text;
+
 	console_init();
 	debug_putstr("early console in decompress_kernel\n");
 
-- 
1.8.4.5


* [PATCH v5 04/19] x86, kaslr: Kill not used run_size related code.
  2015-03-18  7:28 ` Yinghai Lu
@ 2015-03-18  7:28 ` Yinghai Lu
  -1 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi, Yinghai Lu,
	Josh Triplett, Andrew Morton, Ard Biesheuvel, Junjie Mao

Now that run_size is obtained the simple way, remove the old
run_size-related code.

Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/Makefile  |  4 +---
 arch/x86/boot/compressed/head_32.S |  3 +--
 arch/x86/boot/compressed/head_64.S |  3 ---
 arch/x86/boot/compressed/misc.c    |  6 ++----
 arch/x86/boot/compressed/mkpiggy.c |  9 ++------
 arch/x86/tools/calc_run_size.sh    | 42 --------------------------------------
 6 files changed, 6 insertions(+), 61 deletions(-)
 delete mode 100644 arch/x86/tools/calc_run_size.sh

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index d9fee82..50daea7 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -104,10 +104,8 @@ suffix-$(CONFIG_KERNEL_XZ)	:= xz
 suffix-$(CONFIG_KERNEL_LZO) 	:= lzo
 suffix-$(CONFIG_KERNEL_LZ4) 	:= lz4
 
-RUN_SIZE = $(shell $(OBJDUMP) -h vmlinux | \
-	     $(CONFIG_SHELL) $(srctree)/arch/x86/tools/calc_run_size.sh)
 quiet_cmd_mkpiggy = MKPIGGY $@
-      cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )
+      cmd_mkpiggy = $(obj)/mkpiggy $< > $@ || ( rm -f $@ ; false )
 
 targets += piggy.S
 $(obj)/piggy.S: $(obj)/vmlinux.bin.$(suffix-y) $(obj)/mkpiggy FORCE
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 1410c42..673bfcf 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -209,7 +209,6 @@ relocated:
  * Do the decompression, and jump to the new kernel..
  */
 				/* push arguments for decompress_kernel: */
-	pushl	$z_run_size	/* size of kernel with .bss and .brk */
 	pushl	$z_output_len	/* decompressed length, end of relocs */
 
 	movl    BP_init_size(%esi), %eax
@@ -225,7 +224,7 @@ relocated:
 	pushl	%eax		/* heap area */
 	pushl	%esi		/* real mode pointer */
 	call	decompress_kernel /* returns kernel location in %eax */
-	addl	$28, %esp
+	addl	$24, %esp
 
 /*
  * Jump to the decompressed kernel.
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 4e30ee3..2678f03 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -406,8 +406,6 @@ relocated:
  * Do the decompression, and jump to the new kernel..
  */
 	pushq	%rsi			/* Save the real mode argument */
-	movq	$z_run_size, %r9	/* size of kernel with .bss and .brk */
-	pushq	%r9
 	movq	%rsi, %rdi		/* real mode address */
 	leaq	boot_heap(%rip), %rsi	/* malloc area for uncompression */
 	leaq	input_data(%rip), %rdx  /* input_data */
@@ -415,7 +413,6 @@ relocated:
 	movq	%rbp, %r8		/* output target address */
 	movq	$z_output_len, %r9	/* decompressed length, end of relocs */
 	call	decompress_kernel	/* returns kernel location in %rax */
-	popq	%r9
 	popq	%rsi
 
 /*
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 4785c23..1f290cc 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -371,9 +371,9 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 				  unsigned char *input_data,
 				  unsigned long input_len,
 				  unsigned char *output,
-				  unsigned long output_len,
-				  unsigned long run_size)
+				  unsigned long output_len)
 {
+	unsigned long run_size = VO__end - VO__text;
 	unsigned char *output_orig = output;
 
 	real_mode = rmode;
@@ -391,8 +391,6 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	lines = real_mode->screen_info.orig_video_lines;
 	cols = real_mode->screen_info.orig_video_cols;
 
-	run_size = VO__end - VO__text;
-
 	console_init();
 	debug_putstr("early console in decompress_kernel\n");
 
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index 5faad09..c03b009 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -36,13 +36,11 @@ int main(int argc, char *argv[])
 	uint32_t olen;
 	long ilen;
 	unsigned long offs;
-	unsigned long run_size;
 	FILE *f = NULL;
 	int retval = 1;
 
-	if (argc < 3) {
-		fprintf(stderr, "Usage: %s compressed_file run_size\n",
-				argv[0]);
+	if (argc < 2) {
+		fprintf(stderr, "Usage: %s compressed_file\n", argv[0]);
 		goto bail;
 	}
 
@@ -76,7 +74,6 @@ int main(int argc, char *argv[])
 	offs += olen >> 12;	/* Add 8 bytes for each 32K block */
 	offs += 64*1024 + 128;	/* Add 64K + 128 bytes slack */
 	offs = (offs+4095) & ~4095; /* Round to a 4K boundary */
-	run_size = atoi(argv[2]);
 
 	printf(".section \".rodata..compressed\",\"a\",@progbits\n");
 	printf(".globl z_input_len\n");
@@ -85,8 +82,6 @@ int main(int argc, char *argv[])
 	printf("z_output_len = %lu\n", (unsigned long)olen);
 	printf(".globl z_min_extract_offset\n");
 	printf("z_min_extract_offset = 0x%lx\n", offs);
-	printf(".globl z_run_size\n");
-	printf("z_run_size = %lu\n", run_size);
 
 	printf(".globl input_data, input_data_end\n");
 	printf("input_data:\n");
diff --git a/arch/x86/tools/calc_run_size.sh b/arch/x86/tools/calc_run_size.sh
deleted file mode 100644
index 1a4c17b..0000000
--- a/arch/x86/tools/calc_run_size.sh
+++ /dev/null
@@ -1,42 +0,0 @@
-#!/bin/sh
-#
-# Calculate the amount of space needed to run the kernel, including room for
-# the .bss and .brk sections.
-#
-# Usage:
-# objdump -h a.out | sh calc_run_size.sh
-
-NUM='\([0-9a-fA-F]*[ \t]*\)'
-OUT=$(sed -n 's/^[ \t0-9]*.b[sr][sk][ \t]*'"$NUM$NUM$NUM$NUM"'.*/\1\4/p')
-if [ -z "$OUT" ] ; then
-	echo "Never found .bss or .brk file offset" >&2
-	exit 1
-fi
-
-OUT=$(echo ${OUT# })
-sizeA=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-offsetA=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-sizeB=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-offsetB=$(printf "%d" 0x${OUT%% *})
-
-run_size=$(( $offsetA + $sizeA + $sizeB ))
-
-# BFD linker shows the same file offset in ELF.
-if [ "$offsetA" -ne "$offsetB" ] ; then
-	# Gold linker shows them as consecutive.
-	endB=$(( $offsetB + $sizeB ))
-	if [ "$endB" != "$run_size" ] ; then
-		printf "sizeA: 0x%x\n" $sizeA >&2
-		printf "offsetA: 0x%x\n" $offsetA >&2
-		printf "sizeB: 0x%x\n" $sizeB >&2
-		printf "offsetB: 0x%x\n" $offsetB >&2
-		echo ".bss and .brk are non-contiguous" >&2
-		exit 1
-	fi
-fi
-
-printf "%d\n" $run_size
-exit 0
-- 
1.8.4.5


* [PATCH v5 05/19] x86, kaslr: Use output_run_size
  2015-03-18  7:28 ` Yinghai Lu
@ 2015-03-18  7:28 ` Yinghai Lu
  -1 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi, Yinghai Lu

The parameter is currently named output_size, but what we actually pass
is max(output_len, run_size).

Rename it to output_run_size to make this less confusing.

Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/aslr.c | 10 +++++-----
 arch/x86/boot/compressed/misc.c |  6 ++++--
 arch/x86/boot/compressed/misc.h |  4 ++--
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index da01c78..5dc1a65 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -138,7 +138,7 @@ static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
 }
 
 static void mem_avoid_init(unsigned long input, unsigned long input_size,
-			   unsigned long output, unsigned long output_size)
+			   unsigned long output, unsigned long output_run_size)
 {
 	u64 initrd_start, initrd_size;
 	u64 cmd_line, cmd_line_size;
@@ -149,7 +149,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	 * Avoid the region that is unsafe to overlap during
 	 * decompression (see calculations at top of misc.c).
 	 */
-	unsafe_len = (output_size >> 12) + 32768 + 18;
+	unsafe_len = (output_run_size >> 12) + 32768 + 18;
 	unsafe = (unsigned long)input + input_size - unsafe_len;
 	mem_avoid[0].start = unsafe;
 	mem_avoid[0].size = unsafe_len;
@@ -321,7 +321,7 @@ static void add_kaslr_setup_data(__u8 enabled)
 unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
-				      unsigned long output_size)
+				      unsigned long output_run_size)
 {
 	unsigned long choice = (unsigned long)output;
 	unsigned long random;
@@ -343,10 +343,10 @@ unsigned char *choose_kernel_location(unsigned char *input,
 
 	/* Record the various known unsafe memory ranges. */
 	mem_avoid_init((unsigned long)input, input_size,
-		       (unsigned long)output, output_size);
+		       (unsigned long)output, output_run_size);
 
 	/* Walk e820 and find a random address. */
-	random = find_random_addr(choice, output_size);
+	random = find_random_addr(choice, output_run_size);
 	if (!random) {
 		debug_putstr("KASLR could not find suitable E820 region...\n");
 		goto out;
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 1f290cc..7ef0eed 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -375,6 +375,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 {
 	unsigned long run_size = VO__end - VO__text;
 	unsigned char *output_orig = output;
+	unsigned long output_run_size;
 
 	real_mode = rmode;
 
@@ -397,14 +398,15 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	free_mem_ptr     = heap;	/* Heap */
 	free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
 
+	output_run_size = output_len > run_size ? output_len : run_size;
+
 	/*
 	 * The memory hole needed for the kernel is the larger of either
 	 * the entire decompressed kernel plus relocation table, or the
 	 * entire decompressed kernel plus .bss and .brk sections.
 	 */
 	output = choose_kernel_location(input_data, input_len, output,
-					output_len > run_size ? output_len
-							      : run_size);
+					output_run_size);
 
 	/* Validate memory location choices. */
 	if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 04477d6..dec1663 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -60,7 +60,7 @@ int cmdline_find_option_bool(const char *option);
 unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
-				      unsigned long output_size);
+				      unsigned long output_run_size);
 /* cpuflags.c */
 bool has_cpuflag(int flag);
 #else
@@ -68,7 +68,7 @@ static inline
 unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
-				      unsigned long output_size)
+				      unsigned long output_run_size)
 {
 	return output;
 }
-- 
1.8.4.5


* [PATCH v5 06/19] x86, kaslr: Consolidate mem_avoid array filling
  2015-03-18  7:28 ` Yinghai Lu
@ 2015-03-18  7:28 ` Yinghai Lu
  -1 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi, Yinghai Lu

We are going to support KASLR on 64-bit above 4G, where the new random
output buffer could be anywhere.

The mem_avoid array is used by KASLR to search for the new output
buffer. The current code only tracks the range after
output+output_run_size; we need to track all relevant ranges, not just
that one.

In the current code the first entry is the extra bytes after
input+input_size, sized according to output_run_size. The other entries
are for the initrd, the command line, and the heap/stack of the running
ZO.

First, reconsider what the first entry in the mem_avoid array should be.

Since ZO now always sits at the end of the buffer, we can work out where
the ZO text, data/bss, etc. are:
                                                output+run_size
                                                      |
0   output               input      input+input_size  |     output+init_size
|     |                    |               |          |          |
|-----|-----------------|--|---------------|------|---|----------|
                        |                         |
               output+init_size-ZO_SIZE   output+output_size

[output, output+init_size) is the buffer for decompression.

[output, output+run_size) is the VO run size.
[output, output+output_size) is VO (vmlinux after objcopy) plus relocs.

[output+init_size-ZO_SIZE, output+init_size) is the copied ZO.
[input, input+input_size) is the copied compressed payload (VO after
objcopy, plus relocs), not the ZO.

[input+input_size, output+init_size) is [_text, _end) of ZO; that can be
the first range in mem_avoid.

That new first entry already includes the heap and stack of the running
ZO, so we don't need to put them separately into the mem_avoid array.

We also need to put [input, input+input_size) into the mem_avoid array;
it is adjacent to the first range, so merge the two.

Finally, put boot_params into mem_avoid as well, since a 64-bit
bootloader could place it anywhere.

After these changes, the mem_avoid array holds every range that needs to
be avoided.
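
Condensed, a self-contained sketch (simplified types) of the four ranges
the new mem_avoid_init() records; the hunk below is the real code:

struct mem_vector { unsigned long start, size; };

static void fill_mem_avoid(struct mem_vector v[4], unsigned long input,
			   unsigned long output, unsigned long init_size,
			   struct mem_vector initrd,
			   struct mem_vector cmdline,
			   unsigned long bp, unsigned long bp_size)
{
	/* copied input + ZO text/data/bss + its heap/stack, as one range */
	v[0] = (struct mem_vector){ input, (output + init_size) - input };
	v[1] = initrd;				/* initrd */
	v[2] = cmdline;				/* kernel command line */
	v[3] = (struct mem_vector){ bp, bp_size };	/* boot_params */
}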

Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/aslr.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 5dc1a65..9dab0d6 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -112,7 +112,7 @@ struct mem_vector {
 	unsigned long size;
 };
 
-#define MEM_AVOID_MAX 5
+#define MEM_AVOID_MAX 4
 static struct mem_vector mem_avoid[MEM_AVOID_MAX];
 
 static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
@@ -138,21 +138,22 @@ static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
 }
 
 static void mem_avoid_init(unsigned long input, unsigned long input_size,
-			   unsigned long output, unsigned long output_run_size)
+			   unsigned long output)
 {
+	unsigned long init_size = real_mode->hdr.init_size;
 	u64 initrd_start, initrd_size;
 	u64 cmd_line, cmd_line_size;
-	unsigned long unsafe, unsafe_len;
 	char *ptr;
 
 	/*
 	 * Avoid the region that is unsafe to overlap during
-	 * decompression (see calculations at top of misc.c).
+	 * decompression.
+	 * As we already move ZO (arch/x86/boot/compressed/vmlinux)
+	 * to the end of buffer, [input+input_size, output+init_size)
+	 * has [_text, _end) for ZO.
 	 */
-	unsafe_len = (output_run_size >> 12) + 32768 + 18;
-	unsafe = (unsigned long)input + input_size - unsafe_len;
-	mem_avoid[0].start = unsafe;
-	mem_avoid[0].size = unsafe_len;
+	mem_avoid[0].start = input;
+	mem_avoid[0].size = (output + init_size) - input;
 
 	/* Avoid initrd. */
 	initrd_start  = (u64)real_mode->ext_ramdisk_image << 32;
@@ -172,13 +173,9 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	mem_avoid[2].start = cmd_line;
 	mem_avoid[2].size = cmd_line_size;
 
-	/* Avoid heap memory. */
-	mem_avoid[3].start = (unsigned long)free_mem_ptr;
-	mem_avoid[3].size = BOOT_HEAP_SIZE;
-
-	/* Avoid stack memory. */
-	mem_avoid[4].start = (unsigned long)free_mem_end_ptr;
-	mem_avoid[4].size = BOOT_STACK_SIZE;
+	/* Avoid params */
+	mem_avoid[3].start = (unsigned long)real_mode;
+	mem_avoid[3].size = sizeof(*real_mode);
 }
 
 /* Does this memory vector overlap a known avoided area? */
@@ -343,7 +340,7 @@ unsigned char *choose_kernel_location(unsigned char *input,
 
 	/* Record the various known unsafe memory ranges. */
 	mem_avoid_init((unsigned long)input, input_size,
-		       (unsigned long)output, output_run_size);
+		       (unsigned long)output);
 
 	/* Walk e820 and find a random address. */
 	random = find_random_addr(choice, output_run_size);
-- 
1.8.4.5


* [PATCH v5 07/19] x86, boot: Move z_extract_offset calculation to header.S
  2015-03-18  7:28 ` Yinghai Lu
@ 2015-03-18  7:28 ` Yinghai Lu
  -1 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi, Yinghai Lu

The old extract_offset calculation is done without knowledge of the
decompressor size, so it guesses one big value.

We can move the calculation into header.S, where the exact decompressor
size is known.

This saves 8 pages of init_size:

before patch:
kernel: [13e000000,13fa1dfff]
  input: [0x13f32d3b4-0x13fa01cc7], output: [0x13e000000-0x13f9ef81f], heap: [0x13fa0b680-0x13fa1367f]

after patch:
kernel: [13e000000,13fa15fff]
  input: [0x13f3253b4-0x13f9f9cc7], output: [0x13e000000-0x13f9ef81f], heap: [0x13fa03680-0x13fa0b67f]
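
In C, the new header.S arithmetic amounts to the following (a sketch; in
the real code these are link-time constants folded by the preprocessor,
and zo_head_size stands for ZO__ehead - ZO_startup_32):

static unsigned long min_extract_offset(unsigned long z_input_len,
					unsigned long z_output_len,
					unsigned long zo_head_size)
{
	/* see the extra_bytes formula at the top of misc.c */
	unsigned long extra = (z_output_len >> 12) + 32768 + 18;
	unsigned long offs = (z_output_len > z_input_len)
			? z_output_len + extra - z_input_len : extra;

	/* must also clear the ZO head code that performs the copy */
	if (zo_head_size > offs)
		offs = zo_head_size;
	return (offs + 4095) & ~4095UL;		/* round up to 4K */
}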

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/Makefile             |  2 +-
 arch/x86/boot/compressed/misc.c    |  5 +----
 arch/x86/boot/compressed/mkpiggy.c | 16 +---------------
 arch/x86/boot/header.S             | 29 +++++++++++++++++++++++++++++
 4 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index e7ee9cd..19dbd32 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -77,7 +77,7 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE
 
 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
 
-sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_rodata\|z_.*\)$$/\#define ZO_\2 0x\1/p'
+sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_rodata\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'
 
 quiet_cmd_zoffset = ZOFFSET $@
       cmd_zoffset = $(NM) $< | sed -n $(sed-zoffset) > $@
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 7ef0eed..8e81a88 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -84,13 +84,10 @@
  * To avoid problems with the compressed data's meta information an extra 18
  * bytes are needed.  Leading to the formula:
  *
- * extra_bytes = (uncompressed_size >> 12) + 32768 + 18 + decompressor_size.
+ * extra_bytes = (uncompressed_size >> 12) + 32768 + 18.
  *
  * Adding 8 bytes per 32K is a bit excessive but much easier to calculate.
  * Adding 32768 instead of 32767 just makes for round numbers.
- * Adding the decompressor_size is necessary as it musht live after all
- * of the data as well.  Last I measured the decompressor is about 14K.
- * 10K of actual data and 4K of bss.
  *
  */
 
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index c03b009..c5148642 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -21,8 +21,7 @@
  * ----------------------------------------------------------------------- */
 
 /*
- * Compute the desired load offset from a compressed program; outputs
- * a small assembly wrapper with the appropriate symbols defined.
+ * outputs a small assembly wrapper with the appropriate symbols defined.
  */
 
 #include <stdlib.h>
@@ -35,7 +34,6 @@ int main(int argc, char *argv[])
 {
 	uint32_t olen;
 	long ilen;
-	unsigned long offs;
 	FILE *f = NULL;
 	int retval = 1;
 
@@ -65,23 +63,11 @@ int main(int argc, char *argv[])
 	ilen = ftell(f);
 	olen = get_unaligned_le32(&olen);
 
-	/*
-	 * Now we have the input (compressed) and output (uncompressed)
-	 * sizes, compute the necessary decompression offset...
-	 */
-
-	offs = (olen > ilen) ? olen - ilen : 0;
-	offs += olen >> 12;	/* Add 8 bytes for each 32K block */
-	offs += 64*1024 + 128;	/* Add 64K + 128 bytes slack */
-	offs = (offs+4095) & ~4095; /* Round to a 4K boundary */
-
 	printf(".section \".rodata..compressed\",\"a\",@progbits\n");
 	printf(".globl z_input_len\n");
 	printf("z_input_len = %lu\n", ilen);
 	printf(".globl z_output_len\n");
 	printf("z_output_len = %lu\n", (unsigned long)olen);
-	printf(".globl z_min_extract_offset\n");
-	printf("z_min_extract_offset = 0x%lx\n", offs);
 
 	printf(".globl input_data, input_data_end\n");
 	printf("input_data:\n");
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 226d166..9a68962 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -443,7 +443,36 @@ pref_address:		.quad LOAD_PHYSICAL_ADDR	# preferred load addr
 # don't overlap data area of ZO with VO bss
 #define ADDON_ZO_SIZE (ZO__end - ZO__rodata)
 
+/* check arch/x86/boot/compressed/misc.c for the formula about extra_bytes.  */
+#define ZO_z_extra_bytes ((ZO_z_output_len >> 12) + 32768 + 18)
+#if ZO_z_output_len > ZO_z_input_len
+#define ZO_z_extract_offset (ZO_z_output_len + ZO_z_extra_bytes - ZO_z_input_len)
+#else
+#define ZO_z_extract_offset ZO_z_extra_bytes
+#endif
+
+/*
+ * extract_offset has to be bigger than ZO head section.
+ * otherwise during head code running to move ZO to end of buffer,
+ * will overwrite head code itself.
+ */
+#if (ZO__ehead - ZO_startup_32) > ZO_z_extract_offset
+#define ZO_z_min_extract_offset ((ZO__ehead - ZO_startup_32 + 4095) & ~4095)
+#else
+#define ZO_z_min_extract_offset ((ZO_z_extract_offset + 4095) & ~4095)
+#endif
+
 #define ZO_INIT_SIZE	(ZO__end - ZO_startup_32 + ZO_z_min_extract_offset)
+
+/*
+ * ZO__end - ZO_startup_32 is (ZO__ehead - ZO_startup_32) + ZO_z_input_len + (ZO__end - ZO__text)
+ * ZO_z_min_extract_offset >= (ZO_z_output_len + ZO_z_extra_bytes - ZO_z_input_len)
+ * then ZO_INIT_SIZE >= (ZO__ehead - ZO_startup_32) + ZO_z_input_len + (ZO__end - ZO__text) + (ZO_z_output_len + ZO_z_extra_bytes - ZO_z_input_len)
+ * so (ZO_INIT_SIZE - ZO_z_output_len) > (ZO__end - ZO__text)
+ * That means during decompressor running, output could not
+ * overwrite the decompressor itself.
+ */
+
 #define VO_INIT_SIZE	(VO__end - VO__text)
 #if ZO_INIT_SIZE > VO_INIT_SIZE
 
-- 
1.8.4.5


* [PATCH v5 08/19] x86, kaslr: Get correct max_addr for relocs pointer
  2015-03-18  7:28 ` Yinghai Lu
@ 2015-03-18  7:28 ` Yinghai Lu
  -1 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi, Yinghai Lu

There is boundary checking for the relocation pointer in the kaslr
relocation handling.

The current code uses output_len, which is the VO (vmlinux after
objcopy) file size plus the vmlinux.relocs file size.

That is not right, as we should check against the address range that
is actually in use at run time.

By that point parse_elf has already moved the sections into place
according to the ELF headers.

The valid range should be the loaded physical addresses of
VO [_text, __bss_start).

In this patch, export __bss_start to voffset.h as well and use it to
compute max_addr.
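
To make the overestimate concrete (illustrative sizes only): output_len
is the VO file size plus the relocs size, so with a 20M image and a 1M
relocs table the old code accepted pointers up to output + 21M, while
after parse_elf only [output, output + (VO___bss_start - VO__text)) can
contain fields that need relocating.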

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/Makefile | 2 +-
 arch/x86/boot/compressed/misc.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 50daea7..e12a93c 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -40,7 +40,7 @@ LDFLAGS_vmlinux := -T
 hostprogs-y	:= mkpiggy
 HOST_EXTRACFLAGS += -I$(srctree)/tools/include
 
-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
 
 quiet_cmd_voffset = VOFFSET $@
       cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 8e81a88..f99c9c9 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -234,7 +234,7 @@ static void handle_relocations(void *output, unsigned long output_len)
 	int *reloc;
 	unsigned long delta, map, ptr;
 	unsigned long min_addr = (unsigned long)output;
-	unsigned long max_addr = min_addr + output_len;
+	unsigned long max_addr = min_addr + (VO___bss_start - VO__text);
 
 	/*
 	 * Calculate the delta between where vmlinux was linked to load
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 09/19] x86, boot: Split kernel_ident_mapping_init to another file
@ 2015-03-18  7:28   ` Yinghai Lu
  0 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi, Yinghai Lu

We need to include kernel_ident_mapping_init() in the
boot::decompress_kernel stage to set up the new ident mapping there.

Also guard the __pa/__va macro definitions with #ifndef, as we need to
override them in the boot::decompress_kernel stage.
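
For example, the decompressor can then supply trivial identity
definitions before including the shared file (this is exactly what the
misc_pgt.c added later in this series does):

	/* boot/compressed stage: physical == virtual */
	#define __pa(x)  ((unsigned long)(x))
	#define __va(x)  ((void *)((unsigned long)(x)))

	#include "../../mm/ident_map.c"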

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/page.h |  5 +++
 arch/x86/mm/ident_map.c     | 74 +++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/init_64.c       | 74 +--------------------------------------------
 3 files changed, 80 insertions(+), 73 deletions(-)
 create mode 100644 arch/x86/mm/ident_map.c

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 802dde3..cf8f619 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -37,7 +37,10 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
+#ifndef __pa
 #define __pa(x)		__phys_addr((unsigned long)(x))
+#endif
+
 #define __pa_nodebug(x)	__phys_addr_nodebug((unsigned long)(x))
 /* __pa_symbol should be used for C visible symbols.
    This seems to be the official gcc blessed way to do such arithmetic. */
@@ -51,7 +54,9 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 #define __pa_symbol(x) \
 	__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
 
+#ifndef __va
 #define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
+#endif
 
 #define __boot_va(x)		__va(x)
 #define __boot_pa(x)		__pa(x)
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
new file mode 100644
index 0000000..751ca92
--- /dev/null
+++ b/arch/x86/mm/ident_map.c
@@ -0,0 +1,74 @@
+
+static void ident_pmd_init(unsigned long pmd_flag, pmd_t *pmd_page,
+			   unsigned long addr, unsigned long end)
+{
+	addr &= PMD_MASK;
+	for (; addr < end; addr += PMD_SIZE) {
+		pmd_t *pmd = pmd_page + pmd_index(addr);
+
+		if (!pmd_present(*pmd))
+			set_pmd(pmd, __pmd(addr | pmd_flag));
+	}
+}
+static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
+			  unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+
+	for (; addr < end; addr = next) {
+		pud_t *pud = pud_page + pud_index(addr);
+		pmd_t *pmd;
+
+		next = (addr & PUD_MASK) + PUD_SIZE;
+		if (next > end)
+			next = end;
+
+		if (pud_present(*pud)) {
+			pmd = pmd_offset(pud, 0);
+			ident_pmd_init(info->pmd_flag, pmd, addr, next);
+			continue;
+		}
+		pmd = (pmd_t *)info->alloc_pgt_page(info->context);
+		if (!pmd)
+			return -ENOMEM;
+		ident_pmd_init(info->pmd_flag, pmd, addr, next);
+		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+	}
+
+	return 0;
+}
+
+int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
+			      unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	int result;
+	int off = info->kernel_mapping ? pgd_index(__PAGE_OFFSET) : 0;
+
+	for (; addr < end; addr = next) {
+		pgd_t *pgd = pgd_page + pgd_index(addr) + off;
+		pud_t *pud;
+
+		next = (addr & PGDIR_MASK) + PGDIR_SIZE;
+		if (next > end)
+			next = end;
+
+		if (pgd_present(*pgd)) {
+			pud = pud_offset(pgd, 0);
+			result = ident_pud_init(info, pud, addr, next);
+			if (result)
+				return result;
+			continue;
+		}
+
+		pud = (pud_t *)info->alloc_pgt_page(info->context);
+		if (!pud)
+			return -ENOMEM;
+		result = ident_pud_init(info, pud, addr, next);
+		if (result)
+			return result;
+		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+	}
+
+	return 0;
+}
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 30eb05a..c30efb6 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -56,79 +56,7 @@
 
 #include "mm_internal.h"
 
-static void ident_pmd_init(unsigned long pmd_flag, pmd_t *pmd_page,
-			   unsigned long addr, unsigned long end)
-{
-	addr &= PMD_MASK;
-	for (; addr < end; addr += PMD_SIZE) {
-		pmd_t *pmd = pmd_page + pmd_index(addr);
-
-		if (!pmd_present(*pmd))
-			set_pmd(pmd, __pmd(addr | pmd_flag));
-	}
-}
-static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
-			  unsigned long addr, unsigned long end)
-{
-	unsigned long next;
-
-	for (; addr < end; addr = next) {
-		pud_t *pud = pud_page + pud_index(addr);
-		pmd_t *pmd;
-
-		next = (addr & PUD_MASK) + PUD_SIZE;
-		if (next > end)
-			next = end;
-
-		if (pud_present(*pud)) {
-			pmd = pmd_offset(pud, 0);
-			ident_pmd_init(info->pmd_flag, pmd, addr, next);
-			continue;
-		}
-		pmd = (pmd_t *)info->alloc_pgt_page(info->context);
-		if (!pmd)
-			return -ENOMEM;
-		ident_pmd_init(info->pmd_flag, pmd, addr, next);
-		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
-	}
-
-	return 0;
-}
-
-int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
-			      unsigned long addr, unsigned long end)
-{
-	unsigned long next;
-	int result;
-	int off = info->kernel_mapping ? pgd_index(__PAGE_OFFSET) : 0;
-
-	for (; addr < end; addr = next) {
-		pgd_t *pgd = pgd_page + pgd_index(addr) + off;
-		pud_t *pud;
-
-		next = (addr & PGDIR_MASK) + PGDIR_SIZE;
-		if (next > end)
-			next = end;
-
-		if (pgd_present(*pgd)) {
-			pud = pud_offset(pgd, 0);
-			result = ident_pud_init(info, pud, addr, next);
-			if (result)
-				return result;
-			continue;
-		}
-
-		pud = (pud_t *)info->alloc_pgt_page(info->context);
-		if (!pud)
-			return -ENOMEM;
-		result = ident_pud_init(info, pud, addr, next);
-		if (result)
-			return result;
-		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
-	}
-
-	return 0;
-}
+#include "ident_map.c"
 
 static int __init parse_direct_gbpages_off(char *arg)
 {
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 10/19] x86, 64bit: Set ident_mapping for kaslr
  2015-03-18  7:28 ` Yinghai Lu
                   ` (9 preceding siblings ...)
  (?)
@ 2015-03-18  7:28 ` Yinghai Lu
  -1 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi, Yinghai Lu

Currently aslr only supports randomizing within a near range, and the
new range still uses the old mapping. It also does not support a new
range above 4G.

We need to have an ident mapping set up for the new range before we can
decompress to the new output and later run it.

In this patch, we add ident mappings for all needed ranges.

First, for aslr to be able to put the random VO above 4G, we must set
up an ident mapping for the new range when we come in via the
startup_32 path.

Second, when booting from a 64bit bootloader, the bootloader sets up
the ident mapping and enters ZO (arch/x86/boot/compressed/vmlinux) via
startup_64. The pages used for that pagetable need to be avoided when
we select the new random VO (vmlinux) base; otherwise the decompressor
would overwrite them while decompressing.
One way would be to walk the pagetable and find every page it uses for
every mem_avoid check, but that needs extra code and may require
growing the mem_avoid array to hold those pages.
The other way is to create a new ident mapping instead: the pagetable
pages then come from the _pgtable section of ZO, which is already in
the mem_avoid array. This way we can also reuse the existing code for
setting up the ident mapping.

The _pgtable will be shared by the 32bit and 64bit paths to reduce
init_size, as ZO _rodata to _end now contributes to init_size.

We need to increase the pgt buffer size.
When booting via startup_64 we need to cover the old VO, params,
cmdline and new VO; in the extreme case all of them cross a 512G
boundary, which needs (2+2)*4 pages with 2M mappings, plus 2 pages for
the first 2M (VGA RAM) and one page for the level4 table, for a total
of 19 pages.
When booting via startup_32, aslr may move the new VO above 4G, so we
need an extra ident mapping for the new VO; the pgt buffer comes from
_pgtable at an offset of 6 pages, and at most (2+2) pages are needed
when it crosses a 512G boundary.
So 19 pages keep both paths happy.
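
Spelled out, the worst case adds up as follows (illustrative arithmetic
only; "(2+2)" is read here as two pages at each of the PUD and PMD
levels for a region straddling a 512G boundary):

	/*
	 * 4 regions (old VO, params, cmdline, new VO)
	 *   x (2 PUD + 2 PMD pages each)         = 16 pages
	 * first 2M for VGA RAM                   =  2 pages
	 * level4 (PGD) page                      =  1 page
	 *                                  total = 19 pages
	 */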

Cc: Kees Cook <keescook@chromium.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/Makefile   |  3 ++
 arch/x86/boot/compressed/aslr.c     | 14 ++++++
 arch/x86/boot/compressed/head_64.S  |  4 +-
 arch/x86/boot/compressed/misc.h     | 11 +++++
 arch/x86/boot/compressed/misc_pgt.c | 91 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/boot.h         | 19 ++++++++
 6 files changed, 140 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/boot/compressed/misc_pgt.c

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e12a93c..66461b4 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -58,6 +58,9 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
 
 vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/aslr.o
+ifdef CONFIG_X86_64
+	vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/misc_pgt.o
+endif
 
 $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
 
diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 9dab0d6..f8d095d 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -154,6 +154,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	 */
 	mem_avoid[0].start = input;
 	mem_avoid[0].size = (output + init_size) - input;
+	fill_pagetable(input, (output + init_size) - input);
 
 	/* Avoid initrd. */
 	initrd_start  = (u64)real_mode->ext_ramdisk_image << 32;
@@ -162,6 +163,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	initrd_size |= real_mode->hdr.ramdisk_size;
 	mem_avoid[1].start = initrd_start;
 	mem_avoid[1].size = initrd_size;
+	/* don't need to set mapping for initrd */
 
 	/* Avoid kernel command line. */
 	cmd_line  = (u64)real_mode->ext_cmd_line_ptr << 32;
@@ -172,10 +174,19 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 		;
 	mem_avoid[2].start = cmd_line;
 	mem_avoid[2].size = cmd_line_size;
+	fill_pagetable(cmd_line, cmd_line_size);
 
 	/* Avoid params */
 	mem_avoid[3].start = (unsigned long)real_mode;
 	mem_avoid[3].size = sizeof(*real_mode);
+	fill_pagetable((unsigned long)real_mode, sizeof(*real_mode));
+
+	/* don't need to set mapping for setup_data */
+
+#ifdef CONFIG_X86_VERBOSE_BOOTUP
+	/* for video ram */
+	fill_pagetable(0, PMD_SIZE);
+#endif
 }
 
 /* Does this memory vector overlap a known avoided area? */
@@ -354,6 +365,9 @@ unsigned char *choose_kernel_location(unsigned char *input,
 		goto out;
 
 	choice = random;
+
+	fill_pagetable(choice, output_run_size);
+	switch_pagetable();
 out:
 	return (unsigned char *)choice;
 }
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 2678f03..11b1fbe 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -125,7 +125,7 @@ ENTRY(startup_32)
 	/* Initialize Page tables to 0 */
 	leal	pgtable(%ebx), %edi
 	xorl	%eax, %eax
-	movl	$((4096*6)/4), %ecx
+	movl	$(BOOT_INIT_PGT_SIZE/4), %ecx
 	rep	stosl
 
 	/* Build Level 4 */
@@ -477,4 +477,4 @@ boot_stack_end:
 	.section ".pgtable","a",@nobits
 	.balign 4096
 pgtable:
-	.fill 6*4096, 1, 0
+	.fill BOOT_PGT_SIZE, 1, 0
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index dec1663..98914a0 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -74,6 +74,17 @@ unsigned char *choose_kernel_location(unsigned char *input,
 }
 #endif
 
+#ifdef CONFIG_X86_64
+void fill_pagetable(unsigned long start, unsigned long size);
+void switch_pagetable(void);
+extern unsigned char _pgtable[];
+#else
+static inline void fill_pagetable(unsigned long start, unsigned long size)
+{ }
+static inline void switch_pagetable(void)
+{ }
+#endif
+
 #ifdef CONFIG_EARLY_PRINTK
 /* early_serial_console.c */
 extern int early_serial_base;
diff --git a/arch/x86/boot/compressed/misc_pgt.c b/arch/x86/boot/compressed/misc_pgt.c
new file mode 100644
index 0000000..4b81fb7
--- /dev/null
+++ b/arch/x86/boot/compressed/misc_pgt.c
@@ -0,0 +1,91 @@
+#define __pa(x)  ((unsigned long)(x))
+#define __va(x)  ((void *)((unsigned long)(x)))
+
+#include "misc.h"
+
+#include <asm/init.h>
+#include <asm/pgtable.h>
+
+#include "../../mm/ident_map.c"
+
+struct alloc_pgt_data {
+	unsigned char *pgt_buf;
+	unsigned long pgt_buf_size;
+	unsigned long pgt_buf_offset;
+};
+
+static void *alloc_pgt_page(void *context)
+{
+	struct alloc_pgt_data *d = (struct alloc_pgt_data *)context;
+	unsigned char *p = (unsigned char *)d->pgt_buf;
+
+	if (d->pgt_buf_offset >= d->pgt_buf_size) {
+		debug_putstr("out of pgt_buf in misc_pgt.c\n");
+		return NULL;
+	}
+
+	p += d->pgt_buf_offset;
+	d->pgt_buf_offset += PAGE_SIZE;
+
+	return p;
+}
+
+/*
+ * Use a normal definition of memset() from string.c. There are already
+ * included header files which expect a definition of memset() and by
+ * the time we define memset macro, it is too late.
+ */
+#undef memset
+#define memzero(s, n)   memset((s), 0, (n))
+
+unsigned long __force_order;
+static struct alloc_pgt_data pgt_data;
+static struct x86_mapping_info mapping_info;
+static pgd_t *level4p;
+
+void fill_pagetable(unsigned long start, unsigned long size)
+{
+	unsigned long end = start + size;
+
+	if (!level4p) {
+		pgt_data.pgt_buf_offset = 0;
+		mapping_info.alloc_pgt_page = alloc_pgt_page;
+		mapping_info.context = &pgt_data;
+		mapping_info.pmd_flag = __PAGE_KERNEL_LARGE_EXEC;
+
+		/*
+		 * Did we come in via startup_32? Then cr3 already
+		 * points at _pgtable and we can reuse it.
+		 */
+		level4p = (pgd_t *)read_cr3();
+		if ((unsigned long)level4p == (unsigned long)_pgtable) {
+			pgt_data.pgt_buf = (unsigned char *)_pgtable +
+						 BOOT_INIT_PGT_SIZE;
+			pgt_data.pgt_buf_size = BOOT_PGT_SIZE -
+						 BOOT_INIT_PGT_SIZE;
+
+			debug_putstr("boot via startup_32\n");
+		} else {
+			pgt_data.pgt_buf = (unsigned char *)_pgtable;
+			pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
+
+			debug_putstr("boot via startup_64\n");
+			level4p = (pgd_t *)alloc_pgt_page(&pgt_data);
+		}
+		memset((unsigned char *)pgt_data.pgt_buf, 0,
+			 pgt_data.pgt_buf_size);
+	}
+
+	/* align boundary to 2M */
+	start = round_down(start, PMD_SIZE);
+	end = round_up(end, PMD_SIZE);
+	if (start >= end)
+		return;
+
+	kernel_ident_mapping_init(&mapping_info, level4p, start, end);
+}
+
+void switch_pagetable(void)
+{
+	write_cr3((unsigned long)level4p);
+}
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 4fa687a..7b23908 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -32,7 +32,26 @@
 #endif /* !CONFIG_KERNEL_BZIP2 */
 
 #ifdef CONFIG_X86_64
+
 #define BOOT_STACK_SIZE	0x4000
+
+#define BOOT_INIT_PGT_SIZE (6*4096)
+#ifdef CONFIG_RANDOMIZE_BASE
+/*
+ * 1 page for level4, 2 pages for first 2M.
+ * (2+2)*4 pages for kernel, param, cmd_line, random kernel
+ * if all cross 512G boundary.
+ * So total will be 19 pages.
+ */
+#ifdef CONFIG_X86_VERBOSE_BOOTUP
+#define BOOT_PGT_SIZE (19*4096)
+#else
+#define BOOT_PGT_SIZE (17*4096)
+#endif
+#else
+#define BOOT_PGT_SIZE BOOT_INIT_PGT_SIZE
+#endif
+
 #else
 #define BOOT_STACK_SIZE	0x1000
 #endif
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 11/19] x86, boot: Add checking for memcpy
@ 2015-03-18  7:28   ` Yinghai Lu
  0 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi, Yinghai Lu

parse_elf uses the local memcpy to move sections to their running
positions.

That memcpy only supports non-overlapping copies, or overlapping ones
with dest < src.

Add a check in memcpy to catch wrong uses in the future; if one shows
up, we will need a backward-copying memcpy for it.

Also add a comment in parse_elf about this fact.
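
As a concrete illustration of what the new check rejects (hypothetical
buffer, for illustration only):

	char buf[16] = "abcdefgh";

	memcpy(buf, buf + 4, 8);  /* dest < src: forward copy is fine  */
	memcpy(buf + 4, buf, 8);  /* dest > src and overlapping: a     */
	                          /* forward copy would re-read bytes  */
	                          /* it already overwrote, so this now */
	                          /* calls error()                     */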

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/misc.c   | 14 +++++++-------
 arch/x86/boot/compressed/misc.h   |  2 ++
 arch/x86/boot/compressed/string.c | 28 ++++++++++++++++++++++++++--
 3 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index f99c9c9..94e283c 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -106,9 +106,6 @@
 #undef memset
 #define memzero(s, n)	memset((s), 0, (n))
 
-
-static void error(char *m);
-
 /*
  * This is set up by the setup-routine at boot-time
  */
@@ -218,7 +215,7 @@ void __putstr(const char *s)
 	outb(0xff & (pos >> 1), vidport+1);
 }
 
-static void error(char *x)
+void error(char *x)
 {
 	error_putstr("\n\n");
 	error_putstr(x);
@@ -353,9 +350,12 @@ static void parse_elf(void *output)
 #else
 			dest = (void *)(phdr->p_paddr);
 #endif
-			memcpy(dest,
-			       output + phdr->p_offset,
-			       phdr->p_filesz);
+			/*
+			 * The simple memcpy above only works when dest is
+			 * below src or when the areas do not overlap.
+			 * Here dest is always below src.
+			 */
+			memcpy(dest, output + phdr->p_offset, phdr->p_filesz);
 			break;
 		default: /* Ignore other PT_* */ break;
 		}
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 98914a0..a7f3826 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -37,6 +37,8 @@ extern struct boot_params *real_mode;		/* Pointer to real-mode data */
 void __putstr(const char *s);
 #define error_putstr(__x)  __putstr(__x)
 
+void error(char *x);
+
 #ifdef CONFIG_X86_VERBOSE_BOOTUP
 
 #define debug_putstr(__x)  __putstr(__x)
diff --git a/arch/x86/boot/compressed/string.c b/arch/x86/boot/compressed/string.c
index 00e788b..03805a4 100644
--- a/arch/x86/boot/compressed/string.c
+++ b/arch/x86/boot/compressed/string.c
@@ -1,7 +1,7 @@
 #include "../string.c"
 
 #ifdef CONFIG_X86_32
-void *memcpy(void *dest, const void *src, size_t n)
+void *__memcpy(void *dest, const void *src, size_t n)
 {
 	int d0, d1, d2;
 	asm volatile(
@@ -15,7 +15,7 @@ void *memcpy(void *dest, const void *src, size_t n)
 	return dest;
 }
 #else
-void *memcpy(void *dest, const void *src, size_t n)
+void *__memcpy(void *dest, const void *src, size_t n)
 {
 	long d0, d1, d2;
 	asm volatile(
@@ -30,6 +30,30 @@ void *memcpy(void *dest, const void *src, size_t n)
 }
 #endif
 
+void *memcpy(void *dest, const void *src, size_t n)
+{
+	unsigned long start_dest, end_dest;
+	unsigned long start_src, end_src;
+	unsigned long max_start, min_end;
+
+	if (dest < src)
+		return __memcpy(dest, src, n);
+
+	start_dest = (unsigned long)dest;
+	end_dest = (unsigned long)dest + n;
+	start_src = (unsigned long)src;
+	end_src = (unsigned long)src + n;
+	max_start = (start_dest > start_src) ?  start_dest : start_src;
+	min_end = (end_dest < end_src) ? end_dest : end_src;
+
+	if (max_start >= min_end)
+		return __memcpy(dest, src, n);
+
+	error("memcpy does not support overlapping with dest > src!\n");
+
+	return dest;
+}
+
 void *memset(void *s, int c, size_t n)
 {
 	int i;
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 12/19] x86, kaslr: Fix a bug that relocation can not be handled when kernel is loaded above 2G
  2015-03-18  7:28 ` Yinghai Lu
                   ` (11 preceding siblings ...)
  (?)
@ 2015-03-18  7:28 ` Yinghai Lu
  -1 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi

From: Baoquan He <bhe@redhat.com>

When processing the 32 bit relocation tables, a local variable
'extended' is used to calculate the physical address of each relocs
entry. However, its type is int, which is big enough for i386 but not
for x86_64. That is why relocation can only be handled when the kernel
is loaded below 2G; otherwise an overflow happens and hangs the system.

Change it to long, as the 32 bit inverse relocation processing already
does; this change is safe for i386 relocation handling too.
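
A minimal illustration of the overflow (made-up addresses, assuming the
usual two's-complement wrap-around of signed overflow on x86):

	int extended = 0x40000000;	/* entry linked at 1G          */
	extended += 0x40000000;		/* kernel relocated up by 1G   */
	/* extended wrapped to INT_MIN; the cast then sign-extends:   */
	unsigned long ptr = (unsigned long)extended;
	/* ptr == 0xffffffff80000000 instead of 0x80000000            */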

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/misc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 94e283c..d6b4d91 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -273,7 +273,7 @@ static void handle_relocations(void *output, unsigned long output_len)
 	 * So we work backwards from the end of the decompressed image.
 	 */
 	for (reloc = output + output_len - sizeof(*reloc); *reloc; reloc--) {
-		int extended = *reloc;
+		long extended = *reloc;
 		extended += map;
 
 		ptr = (unsigned long)extended;
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 13/19] x86, kaslr: Introduce struct slot_area to manage randomization slot info
@ 2015-03-18  7:28   ` Yinghai Lu
  0 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi

From: Baoquan He <bhe@redhat.com>

The kernel is expected to be randomly relocated anywhere in the whole
physical memory area, which can reach nearly 64T. In that case there
could be about 4*1024*1024 randomization slots, so the old slot array
would cost too much memory, and storing the slot information one entry
at a time into the slot array is inefficient as well.

Introduce struct slot_area to manage the randomization slot info of one
contiguous memory area, excluding the avoid areas. slot_areas[] is used
to store all slot area info. Since setup_data is a linked list that can
chain many entries, excluding them can split RAM into many smaller
areas; if there are too many, only the first 100 slot areas are kept.
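
For scale (illustrative figures, assuming a 16M CONFIG_PHYSICAL_ALIGN):
64T / 16M gives the ~4*1024*1024 slots quoted above, i.e. a 32MB array
at 8 bytes per slot, while 100 struct slot_area entries describing
whole areas fit in about 1.6KB.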

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index f8d095d..33693c1 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -219,8 +219,20 @@ static bool mem_avoid_overlap(struct mem_vector *img)
 
 static unsigned long slots[CONFIG_RANDOMIZE_BASE_MAX_OFFSET /
 			   CONFIG_PHYSICAL_ALIGN];
+
+struct slot_area {
+	unsigned long addr;
+	int num;
+};
+
+#define MAX_SLOT_AREA 100
+
+static struct slot_area slot_areas[MAX_SLOT_AREA];
+
 static unsigned long slot_max;
 
+static unsigned long slot_area_index;
+
 static void slots_append(unsigned long addr)
 {
 	/* Overflowing the slots list should be impossible. */
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 14/19] x86, kaslr: Add two functions which will be used later
  2015-03-18  7:28 ` Yinghai Lu
                   ` (13 preceding siblings ...)
  (?)
@ 2015-03-18  7:28 ` Yinghai Lu
  -1 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi

From: Baoquan He <bhe@redhat.com>

Add two functions, mem_min_overlap() and store_slot_info(), which will
be used later.

Given a memory region, mem_min_overlap() iterates over all avoid
regions to find the one that overlaps with it at the lowest address.

store_slot_info() calculates the slot info of the passed-in region and
stores it into slot_areas[].
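
As a worked example of the slot arithmetic (illustrative numbers): for
a 100M region, a 30M image and a 2M CONFIG_PHYSICAL_ALIGN,
store_slot_info() records (100M - 30M) / 2M + 1 = 36 slots, one per
aligned start address at which the image still fits in the region.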

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 51 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 33693c1..37dce4f 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -217,6 +217,40 @@ static bool mem_avoid_overlap(struct mem_vector *img)
 	return false;
 }
 
+static unsigned long
+mem_min_overlap(struct mem_vector *img, struct mem_vector *out)
+{
+	int i;
+	struct setup_data *ptr;
+	unsigned long min = img->start + img->size;
+
+	for (i = 0; i < MEM_AVOID_MAX; i++) {
+		if (mem_overlaps(img, &mem_avoid[i]) &&
+			(mem_avoid[i].start < min)) {
+			*out = mem_avoid[i];
+			min = mem_avoid[i].start;
+		}
+	}
+
+	/* Check all entries in the setup_data linked list. */
+	ptr = (struct setup_data *)(unsigned long)real_mode->hdr.setup_data;
+	while (ptr) {
+		struct mem_vector avoid;
+
+		avoid.start = (unsigned long)ptr;
+		avoid.size = sizeof(*ptr) + ptr->len;
+
+		if (mem_overlaps(img, &avoid) && (avoid.start < min)) {
+			*out = avoid;
+			min = avoid.start;
+		}
+
+		ptr = (struct setup_data *)(unsigned long)ptr->next;
+	}
+
+	return min;
+}
+
 static unsigned long slots[CONFIG_RANDOMIZE_BASE_MAX_OFFSET /
 			   CONFIG_PHYSICAL_ALIGN];
 
@@ -233,6 +267,23 @@ static unsigned long slot_max;
 
 static unsigned long slot_area_index;
 
+static void store_slot_info(struct mem_vector *region, unsigned long image_size)
+{
+	struct slot_area slot_area;
+
+	slot_area.addr = region->start;
+	if (image_size <= CONFIG_PHYSICAL_ALIGN)
+		slot_area.num = region->size / CONFIG_PHYSICAL_ALIGN;
+	else
+		slot_area.num = (region->size - image_size) /
+				CONFIG_PHYSICAL_ALIGN + 1;
+
+	if (slot_area.num > 0) {
+		slot_areas[slot_area_index++] = slot_area;
+		slot_max += slot_area.num;
+	}
+}
+
 static void slots_append(unsigned long addr)
 {
 	/* Overflowing the slots list should be impossible. */
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 15/19] x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel text mapping address
@ 2015-03-18  7:28   ` Yinghai Lu
  0 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi

From: Baoquan He <bhe@redhat.com>

Kaslr extended the kernel text mapping region size from 512M to 1G,
namely CONFIG_RANDOMIZE_BASE_MAX_OFFSET. This means the kernel text can
be mapped anywhere in the following region:

[__START_KERNEL_map + LOAD_PHYSICAL_ADDR, __START_KERNEL_map + 1G]

Introduce a function find_random_virt_offset() to get a random value
between LOAD_PHYSICAL_ADDR and CONFIG_RANDOMIZE_BASE_MAX_OFFSET. This
random value is then added to __START_KERNEL_map to get the starting
address the kernel text is mapped at. Since a slot can be anywhere in
this region, i.e. the whole region is one independent slot_area, it is
simple to turn the random value into a slot.
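
Plugging in numbers (illustrative only, assuming a 2M
CONFIG_PHYSICAL_ALIGN and the 1G CONFIG_RANDOMIZE_BASE_MAX_OFFSET): for
minimum = 16M and image_size = 30M, slot_num = (1024M - 16M - 30M) / 2M
+ 1 = 490, and the function returns random * 2M + 16M for some random
in [0, 490).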

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 37dce4f..5114142 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -369,6 +369,27 @@ static unsigned long find_random_addr(unsigned long minimum,
 	return slots_fetch_random();
 }
 
+static unsigned long find_random_virt_offset(unsigned long minimum,
+				  unsigned long image_size)
+{
+	unsigned long slot_num, random;
+
+	/* Make sure minimum is aligned. */
+	minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);
+
+	if (image_size <= CONFIG_PHYSICAL_ALIGN)
+		slot_num = (CONFIG_RANDOMIZE_BASE_MAX_OFFSET - minimum) /
+				CONFIG_PHYSICAL_ALIGN;
+	else
+		slot_num = (CONFIG_RANDOMIZE_BASE_MAX_OFFSET -
+				minimum - image_size) /
+				CONFIG_PHYSICAL_ALIGN + 1;
+
+	random = get_random_long() % slot_num;
+
+	return random * CONFIG_PHYSICAL_ALIGN + minimum;
+}
+
 static void add_kaslr_setup_data(__u8 enabled)
 {
 	struct setup_data *data;
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 16/19] x86, kaslr: Randomize physical and virtual address of kernel separately
@ 2015-03-18  7:28   ` Yinghai Lu
  0 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi

From: Baoquan He <bhe@redhat.com>

On x86_64, the old kaslr implementation only randomizes the physical
address at which the kernel is loaded. It then calculates the delta
between the physical address vmlinux was linked to load at and the one
it is finally loaded at. If the delta is not 0, i.e. the kernel is
actually decompressed at a new physical address, relocation handling
has to be done: the delta is added to each kernel symbol relocation
offset, which moves the kernel text mapping by the same delta.

Here the behavior is changed. Randomize both the physical address where
the kernel is decompressed and the virtual address where the kernel
text is mapped, and make relocation handling depend only on the virtual
address randomization. That is, if and only if the virtual address is
randomized to a different value do we add the delta to the kernel
relocation offsets.

Note that for now neither the virtual offset nor the physical address
randomization can exceed CONFIG_RANDOMIZE_BASE_MAX_OFFSET, namely 1G.
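
Schematically, the delta that drives relocations changes as follows (a
sketch, not the literal code):

	/* before: the physical load address drove relocations */
	delta = (unsigned long)output - LOAD_PHYSICAL_ADDR;

	/* after, on 64-bit: only the virtual offset does */
	delta = (unsigned long)virt_offset - LOAD_PHYSICAL_ADDR;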

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 46 +++++++++++++++++++++--------------------
 arch/x86/boot/compressed/misc.c | 39 ++++++++++++++++++++--------------
 arch/x86/boot/compressed/misc.h | 19 +++++++++--------
 3 files changed, 58 insertions(+), 46 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 5114142..ae0aed9 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -352,7 +352,7 @@ static void process_e820_entry(struct e820entry *entry,
 	}
 }
 
-static unsigned long find_random_addr(unsigned long minimum,
+static unsigned long find_random_phy_addr(unsigned long minimum,
 				      unsigned long size)
 {
 	int i;
@@ -410,48 +410,50 @@ static void add_kaslr_setup_data(__u8 enabled)
 		real_mode->hdr.setup_data = (unsigned long)&kaslr_setup_data;
 }
 
-unsigned char *choose_kernel_location(unsigned char *input,
-				      unsigned long input_size,
-				      unsigned char *output,
-				      unsigned long output_run_size)
+void choose_kernel_location(unsigned char *input,
+				unsigned long input_size,
+				unsigned char **output,
+				unsigned long output_run_size,
+				unsigned char **virt_offset)
 {
-	unsigned long choice = (unsigned long)output;
 	unsigned long random;
+	*virt_offset = (unsigned char *)LOAD_PHYSICAL_ADDR;
 
 #ifdef CONFIG_HIBERNATION
 	if (!cmdline_find_option_bool("kaslr")) {
 		debug_putstr("KASLR disabled by default...\n");
 		add_kaslr_setup_data(0);
-		goto out;
+		return;
 	}
 #else
 	if (cmdline_find_option_bool("nokaslr")) {
 		debug_putstr("KASLR disabled by cmdline...\n");
 		add_kaslr_setup_data(0);
-		goto out;
+		return;
 	}
 #endif
 	add_kaslr_setup_data(1);
 
 	/* Record the various known unsafe memory ranges. */
 	mem_avoid_init((unsigned long)input, input_size,
-		       (unsigned long)output);
+		       (unsigned long)*output);
 
 	/* Walk e820 and find a random address. */
-	random = find_random_addr(choice, output_run_size);
-	if (!random) {
+	random = find_random_phy_addr((unsigned long)*output, output_run_size);
+	if (!random)
 		debug_putstr("KASLR could not find suitable E820 region...\n");
-		goto out;
+	else {
+		if ((unsigned long)*output != random) {
+			fill_pagetable(random, output_run_size);
+			switch_pagetable();
+			*output = (unsigned char *)random;
+		}
 	}
 
-	/* Always enforce the minimum. */
-	if (random < choice)
-		goto out;
-
-	choice = random;
-
-	fill_pagetable(choice, output_run_size);
-	switch_pagetable();
-out:
-	return (unsigned char *)choice;
+	/*
+	 * Get a random address between LOAD_PHYSICAL_ADDR and
+	 * CONFIG_RANDOMIZE_BASE_MAX_OFFSET
+	 */
+	random = find_random_virt_offset(LOAD_PHYSICAL_ADDR, output_run_size);
+	*virt_offset = (unsigned char *)random;
 }
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index d6b4d91..03fa414 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -226,7 +226,8 @@ void error(char *x)
 }
 
 #if CONFIG_X86_NEED_RELOCS
-static void handle_relocations(void *output, unsigned long output_len)
+static void handle_relocations(void *output, unsigned long output_len,
+			       void *virt_offset)
 {
 	int *reloc;
 	unsigned long delta, map, ptr;
@@ -238,11 +239,6 @@ static void handle_relocations(void *output, unsigned long output_len)
 	 * and where it was actually loaded.
 	 */
 	delta = min_addr - LOAD_PHYSICAL_ADDR;
-	if (!delta) {
-		debug_putstr("No relocation needed... ");
-		return;
-	}
-	debug_putstr("Performing relocations... ");
 
 	/*
 	 * The kernel contains a table of relocation addresses. Those
@@ -253,6 +249,22 @@ static void handle_relocations(void *output, unsigned long output_len)
 	 */
 	map = delta - __START_KERNEL_map;
 
+
+
+	/*
+	 * 32-bit always performs relocations. 64-bit relocations are only
+	 * needed if kASLR has chosen a different starting address offset
+	 * from __START_KERNEL_map.
+	 */
+	if (IS_ENABLED(CONFIG_X86_64))
+		delta = (unsigned long)virt_offset - LOAD_PHYSICAL_ADDR;
+
+	if (!delta) {
+		debug_putstr("No relocation needed... ");
+		return;
+	}
+	debug_putstr("Performing relocations... ");
+
 	/*
 	 * Process relocations: 32 bit relocations first then 64 bit after.
 	 * Three sets of binary relocations are added to the end of the kernel
@@ -306,7 +318,8 @@ static void handle_relocations(void *output, unsigned long output_len)
 #endif
 }
 #else
-static inline void handle_relocations(void *output, unsigned long output_len)
+static inline void handle_relocations(void *output, unsigned long output_len,
+				      void *virt_offset)
 { }
 #endif
 
@@ -373,6 +386,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	unsigned long run_size = VO__end - VO__text;
 	unsigned char *output_orig = output;
 	unsigned long output_run_size;
+	unsigned char *virt_offset;
 
 	real_mode = rmode;
 
@@ -402,8 +416,8 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	 * the entire decompressed kernel plus relocation table, or the
 	 * entire decompressed kernel plus .bss and .brk sections.
 	 */
-	output = choose_kernel_location(input_data, input_len, output,
-					output_run_size);
+	choose_kernel_location(input_data, input_len, &output,
+			       output_run_size, &virt_offset);
 
 	/* Validate memory location choices. */
 	if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
@@ -423,12 +437,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	debug_putstr("\nDecompressing Linux... ");
 	decompress(input_data, input_len, NULL, NULL, output, NULL, error);
 	parse_elf(output);
-	/*
-	 * 32-bit always performs relocations. 64-bit relocations are only
-	 * needed if kASLR has chosen a different load address.
-	 */
-	if (!IS_ENABLED(CONFIG_X86_64) || output != output_orig)
-		handle_relocations(output, output_len);
+	handle_relocations(output, output_len, virt_offset);
 	debug_putstr("done.\nBooting the kernel.\n");
 	return output;
 }
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index a7f3826..461f20b 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -59,20 +59,21 @@ int cmdline_find_option_bool(const char *option);
 
 #if CONFIG_RANDOMIZE_BASE
 /* aslr.c */
-unsigned char *choose_kernel_location(unsigned char *input,
-				      unsigned long input_size,
-				      unsigned char *output,
-				      unsigned long output_run_size);
+void choose_kernel_location(unsigned char *input,
+				unsigned long input_size,
+				unsigned char **output,
+				unsigned long output_run_size,
+				unsigned char **virt_offset);
 /* cpuflags.c */
 bool has_cpuflag(int flag);
 #else
 static inline
-unsigned char *choose_kernel_location(unsigned char *input,
-				      unsigned long input_size,
-				      unsigned char *output,
-				      unsigned long output_run_size)
+void choose_kernel_location(unsigned char *input,
+				unsigned long input_size,
+				unsigned char **output,
+				unsigned long output_run_size,
+				unsigned char **virt_offset)
 {
-	return output;
 }
 #endif
 
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

 	parse_elf(output);
-	/*
-	 * 32-bit always performs relocations. 64-bit relocations are only
-	 * needed if kASLR has chosen a different load address.
-	 */
-	if (!IS_ENABLED(CONFIG_X86_64) || output != output_orig)
-		handle_relocations(output, output_len);
+	handle_relocations(output, output_len, virt_offset);
 	debug_putstr("done.\nBooting the kernel.\n");
 	return output;
 }
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index a7f3826..461f20b 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -59,20 +59,21 @@ int cmdline_find_option_bool(const char *option);
 
 #if CONFIG_RANDOMIZE_BASE
 /* aslr.c */
-unsigned char *choose_kernel_location(unsigned char *input,
-				      unsigned long input_size,
-				      unsigned char *output,
-				      unsigned long output_run_size);
+void choose_kernel_location(unsigned char *input,
+				unsigned long input_size,
+				unsigned char **output,
+				unsigned long output_run_size,
+				unsigned char **virt_offset);
 /* cpuflags.c */
 bool has_cpuflag(int flag);
 #else
 static inline
-unsigned char *choose_kernel_location(unsigned char *input,
-				      unsigned long input_size,
-				      unsigned char *output,
-				      unsigned long output_run_size)
+void choose_kernel_location(unsigned char *input,
+				unsigned long input_size,
+				unsigned char **output,
+				unsigned long output_run_size,
+				unsigned char **virt_offset)
 {
-	return output;
 }
 #endif
 
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 17/19] x86, kaslr: Add support of kernel physical address randomization above 4G
  2015-03-18  7:28 ` Yinghai Lu
                   ` (16 preceding siblings ...)
  (?)
@ 2015-03-18  7:28 ` Yinghai Lu
  -1 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi

From: Baoquan He <bhe@redhat.com>

In the kaslr implementation, process_e820_entry and slots_fetch_random
do most of the work: process_e820_entry stores the slot information and
slots_fetch_random fetches a random slot from it. To support
randomizing the kernel physical address above 4G, this patch reworks
both functions around the new slot_area data structure.

The kernel can now be relocated and decompressed anywhere in physical
memory, up to nearly 64T.
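
(Editor's sketch, not part of the patch: a self-contained model of the
slot_area walk in slots_fetch_random(); the areas and the index are
made-up values and a 64-bit build is assumed.)

#include <stdio.h>

#define CONFIG_PHYSICAL_ALIGN 0x200000UL	/* 2M slot alignment */

struct slot_area {
	unsigned long addr;	/* first candidate base in this area */
	int num;		/* number of aligned slots it holds */
};

static struct slot_area slot_areas[] = {
	{ 0x01000000UL,   8 },	/* a region below 4G */
	{ 0x100000000UL, 32 },	/* a region above 4G, now usable too */
};

int main(void)
{
	unsigned long random = 25;	/* stand-in for get_random_long() % slot_max */
	unsigned int i;

	/* Skip whole areas until random indexes into one of them. */
	for (i = 0; i < sizeof(slot_areas) / sizeof(slot_areas[0]); i++) {
		if (random >= (unsigned long)slot_areas[i].num) {
			random -= slot_areas[i].num;
			continue;
		}
		/* Prints base = 0x102200000, i.e. above 4G. */
		printf("base = %#lx\n",
		       slot_areas[i].addr + random * CONFIG_PHYSICAL_ALIGN);
		return 0;
	}
	return 1;	/* slot bookkeeping was inconsistent */
}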

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 68 ++++++++++++++++++++++++++++++-----------
 1 file changed, 51 insertions(+), 17 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index ae0aed9..2f60f41 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -296,27 +296,40 @@ static void slots_append(unsigned long addr)
 
 static unsigned long slots_fetch_random(void)
 {
+	unsigned long random;
+	int i;
+
 	/* Handle case of no slots stored. */
 	if (slot_max == 0)
 		return 0;
 
-	return slots[get_random_long() % slot_max];
+	random = get_random_long() % slot_max;
+
+	for (i = 0; i < slot_area_index; i++) {
+		if (random >= slot_areas[i].num) {
+			random -= slot_areas[i].num;
+			continue;
+		}
+		return slot_areas[i].addr + random * CONFIG_PHYSICAL_ALIGN;
+	}
+
+	if (i == slot_area_index)
+		debug_putstr("Something wrong happened in slots_fetch_random()...\n");
+	return 0;
 }
 
 static void process_e820_entry(struct e820entry *entry,
 			       unsigned long minimum,
 			       unsigned long image_size)
 {
-	struct mem_vector region, img;
+	struct mem_vector region, out;
+	struct slot_area slot_area;
+	unsigned long min, start_orig;
 
 	/* Skip non-RAM entries. */
 	if (entry->type != E820_RAM)
 		return;
 
-	/* Ignore entries entirely above our maximum. */
-	if (entry->addr >= CONFIG_RANDOMIZE_BASE_MAX_OFFSET)
-		return;
-
 	/* Ignore entries entirely below our minimum. */
 	if (entry->addr + entry->size < minimum)
 		return;
@@ -324,10 +337,17 @@ static void process_e820_entry(struct e820entry *entry,
 	region.start = entry->addr;
 	region.size = entry->size;
 
+repeat:
+	start_orig = region.start;
+
 	/* Potentially raise address to minimum location. */
 	if (region.start < minimum)
 		region.start = minimum;
 
+	/* Return if slot area array is full */
+	if (slot_area_index == MAX_SLOT_AREA)
+		return;
+
 	/* Potentially raise address to meet alignment requirements. */
 	region.start = ALIGN(region.start, CONFIG_PHYSICAL_ALIGN);
 
@@ -336,20 +356,30 @@ static void process_e820_entry(struct e820entry *entry,
 		return;
 
 	/* Reduce size by any delta from the original address. */
-	region.size -= region.start - entry->addr;
+	region.size -= region.start - start_orig;
 
-	/* Reduce maximum size to fit end of image within maximum limit. */
-	if (region.start + region.size > CONFIG_RANDOMIZE_BASE_MAX_OFFSET)
-		region.size = CONFIG_RANDOMIZE_BASE_MAX_OFFSET - region.start;
+	/* Return if region can't contain decompressed kernel */
+	if (region.size < image_size)
+		return;
 
-	/* Walk each aligned slot and check for avoided areas. */
-	for (img.start = region.start, img.size = image_size ;
-	     mem_contains(&region, &img) ;
-	     img.start += CONFIG_PHYSICAL_ALIGN) {
-		if (mem_avoid_overlap(&img))
-			continue;
-		slots_append(img.start);
+	if (!mem_avoid_overlap(&region)) {
+		store_slot_info(&region, image_size);
+		return;
 	}
+
+	min = mem_min_overlap(&region, &out);
+
+	if (min > region.start + image_size) {
+		struct mem_vector tmp;
+
+		tmp.start = region.start;
+		tmp.size = min - region.start;
+		store_slot_info(&tmp, image_size);
+	}
+
+	region.size -= out.start - region.start + out.size;
+	region.start = out.start + out.size;
+	goto repeat;
 }
 
 static unsigned long find_random_phy_addr(unsigned long minimum,
@@ -364,6 +394,10 @@ static unsigned long find_random_phy_addr(unsigned long minimum,
 	/* Verify potential e820 positions, appending to slots list. */
 	for (i = 0; i < real_mode->e820_entries; i++) {
 		process_e820_entry(&real_mode->e820_map[i], minimum, size);
+		if (slot_area_index == MAX_SLOT_AREA) {
+			debug_putstr("Stop processing e820 since slot_areas is full...\n");
+			break;
+		}
 	}
 
 	return slots_fetch_random();
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 18/19] x86, kaslr: Remove useless codes
@ 2015-03-18  7:28   ` Yinghai Lu
  0 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi

From: Baoquan He <bhe@redhat.com>

Several auxiliary functions and the slots[] array are no longer needed
now that struct slot_area stores the kaslr slot info. Remove them.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 24 ------------------------
 1 file changed, 24 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 2f60f41..e3bd2e2 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -115,17 +115,6 @@ struct mem_vector {
 #define MEM_AVOID_MAX 4
 static struct mem_vector mem_avoid[MEM_AVOID_MAX];
 
-static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
-{
-	/* Item at least partially before region. */
-	if (item->start < region->start)
-		return false;
-	/* Item at least partially after region. */
-	if (item->start + item->size > region->start + region->size)
-		return false;
-	return true;
-}
-
 static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
 {
 	/* Item one is entirely before item two. */
@@ -251,9 +240,6 @@ mem_min_overlap(struct mem_vector *img, struct mem_vector *out)
 	return min;
 }
 
-static unsigned long slots[CONFIG_RANDOMIZE_BASE_MAX_OFFSET /
-			   CONFIG_PHYSICAL_ALIGN];
-
 struct slot_area {
 	unsigned long addr;
 	int num;
@@ -284,16 +270,6 @@ static void store_slot_info(struct mem_vector *region, unsigned long image_size)
 	}
 }
 
-static void slots_append(unsigned long addr)
-{
-	/* Overflowing the slots list should be impossible. */
-	if (slot_max >= CONFIG_RANDOMIZE_BASE_MAX_OFFSET /
-			CONFIG_PHYSICAL_ALIGN)
-		return;
-
-	slots[slot_max++] = addr;
-}
-
 static unsigned long slots_fetch_random(void)
 {
 	unsigned long random;
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 19/19] x86, kaslr: Allow random address could be below loaded address
  2015-03-18  7:28 ` Yinghai Lu
                   ` (18 preceding siblings ...)
  (?)
@ 2015-03-18  7:28 ` Yinghai Lu
  -1 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-18  7:28 UTC (permalink / raw)
  To: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Baoquan He
  Cc: Thomas Gleixner, linux-kernel, linux-efi, Yinghai Lu

Currently the new output buffer is always placed above the current one.

With correct tracking in mem_avoid, we can also place it below.

That makes sure that when a bootloader such as patched grub2 or kexec
has put the output rather near the end of RAM, we can still get a
random base below the output address.

For now, just pick 512M as min_addr.
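
(Editor's sketch of the clamp, not part of the patch; the output value
is taken from the log below.)

#include <stdio.h>

int main(void)
{
	unsigned long output = 0x13c000000UL;	/* where the bootloader put us */
	unsigned long min_addr = output;

	if (min_addr > (512UL << 20))
		min_addr = 512UL << 20;
	printf("min_addr = %#lx\n", min_addr);	/* 0x20000000, i.e. 512M */
	return 0;
}

This is why the log below ends up with a new output base (0x6f000000)
well below where the bootloader placed the kernel (0x13c000000).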

With this patch, we get:

early console in decompress_kernel
decompress_kernel:
  input: [0x13e9ee3b4-0x13f36b9df], output: [0x13c000000-0x13f394fff], heap: [0x13f376ac0-0x13f37eabf]
boot via startup_64
KASLR using RDTSC...
KASLR using RDTSC...
                     new output: [0x6f000000-0x72394fff]

Decompressing Linux... xz... Parsing ELF... Performing relocations... done.
Booting the kernel.
[    0.000000] bootconsole [uart0] enabled
[    0.000000] Kernel Layout:
[    0.000000]   .text: [0x6f000000-0x70096a9c]
[    0.000000] .rodata: [0x70200000-0x70a4efff]
[    0.000000]   .data: [0x70c00000-0x70e4e9bf]
[    0.000000]   .init: [0x70e50000-0x7120bfff]
[    0.000000]    .bss: [0x71219000-0x7234efff]
[    0.000000]    .brk: [0x7234f000-0x72374fff]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/aslr.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index e3bd2e2..35da03c 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -426,7 +426,8 @@ void choose_kernel_location(unsigned char *input,
 				unsigned long output_run_size,
 				unsigned char **virt_offset)
 {
-	unsigned long random;
+	unsigned long random, min_addr;
+
 	*virt_offset = (unsigned char *)LOAD_PHYSICAL_ADDR;
 
 #ifdef CONFIG_HIBERNATION
@@ -448,8 +449,13 @@ void choose_kernel_location(unsigned char *input,
 	mem_avoid_init((unsigned long)input, input_size,
 		       (unsigned long)*output);
 
+	/* start from 512M */
+	min_addr = (unsigned long)*output;
+	if (min_addr > (512UL<<20))
+		min_addr = 512UL<<20;
+
 	/* Walk e820 and find a random address. */
-	random = find_random_phy_addr((unsigned long)*output, output_run_size);
+	random = find_random_phy_addr(min_addr, output_run_size);
 	if (!random)
 		debug_putstr("KASLR could not find suitable E820 region...\n");
 	else {
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 03/19] x86, boot: Simplify run_size calculation
@ 2015-03-23  3:25     ` Baoquan He
  0 siblings, 0 replies; 33+ messages in thread
From: Baoquan He @ 2015-03-23  3:25 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Thomas Gleixner, linux-kernel,
	linux-efi, Junjie Mao, Josh Triplett, Andrew Morton

On 03/18/15 at 12:28am, Yinghai Lu wrote:
> While looking at the boot code to add memory mapping for kaslr
> with 64bit above-4G support, I found that e6023367d779 ("x86, kaslr: Prevent
> .bss from overlaping initrd") and its follow-ups introduced a way to get
> the kernel run_size and pass it around, first via perl and then via
> shell scripts.
> 
> That is not necessary. As run_size is a simple constant, we don't
> need to pass it around; we already have voffset.h for that.
> 
> We can share voffset.h between misc.c and header.S instead
> of adding another way to get run_size.
> 
> This patch:
> Move voffset.h creation code to boot/compressed/Makefile.
> 
> Dependence was:
> boot/header.S ==> boot/voffset.h ==> vmlinux
> boot/header.S ==> compressed/vmlinux ==> compressed/misc.c
> Now become:
> boot/header.S ==> compressed/vmlinux ==> compressed/misc.c ==> boot/voffset.h ==> vmlinux
> 
> Use macro in misc.c to replace passed run_size.
> 
> Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
> Cc: Junjie Mao <eternal.n08@gmail.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Matt Fleming <matt.fleming@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/boot/Makefile            | 11 +----------
>  arch/x86/boot/compressed/Makefile | 12 ++++++++++++
>  arch/x86/boot/compressed/misc.c   |  3 +++
>  3 files changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index 863ef25..e7ee9cd 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -77,15 +77,6 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE
>  
>  SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
>  
> -sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 0x\1/p'
> -
> -quiet_cmd_voffset = VOFFSET $@
> -      cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
> -
> -targets += voffset.h
> -$(obj)/voffset.h: vmlinux FORCE
> -	$(call if_changed,voffset)
> -
>  sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_rodata\|z_.*\)$$/\#define ZO_\2 0x\1/p'
>  
>  quiet_cmd_zoffset = ZOFFSET $@
> @@ -97,7 +88,7 @@ $(obj)/zoffset.h: $(obj)/compressed/vmlinux FORCE
>  
>  
>  AFLAGS_header.o += -I$(obj)
> -$(obj)/header.o: $(obj)/voffset.h $(obj)/zoffset.h
> +$(obj)/header.o: $(obj)/zoffset.h
>  
>  LDFLAGS_setup.elf	:= -T
>  $(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 0a291cd..d9fee82 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -40,6 +40,18 @@ LDFLAGS_vmlinux := -T
>  hostprogs-y	:= mkpiggy
>  HOST_EXTRACFLAGS += -I$(srctree)/tools/include
>  
> +sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
> +
> +quiet_cmd_voffset = VOFFSET $@
> +      cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
> +
> +targets += ../voffset.h
> +
> +$(obj)/../voffset.h: vmlinux FORCE
> +	$(call if_changed,voffset)
> +
> +$(obj)/misc.o: $(obj)/../voffset.h
> +
>  vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
>  	$(obj)/string.o $(obj)/cmdline.o \
>  	$(obj)/piggy.o $(obj)/cpuflags.o
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index a950864..4785c23 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -11,6 +11,7 @@
>  
>  #include "misc.h"
>  #include "../string.h"
> +#include "../voffset.h"
>  
>  /* WARNING!!
>   * This code is compiled with -fPIC and it is relocated dynamically
> @@ -390,6 +391,8 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
>  	lines = real_mode->screen_info.orig_video_lines;
>  	cols = real_mode->screen_info.orig_video_cols;
>  
> +	run_size = VO__end - VO__text;

Hi Yinghai,

This may not be correct. In commit e6023367d779,
run_size = offset(.bss) + size(.bss) + size(.brk); how this formula came
about can be seen in the discussion between Kees and Junjie:
https://lkml.org/lkml/2014/10/30/612

And in one of my kernel builds, the related values are:
-) objdump -h vmlinux
vmlinux:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
 27 .bss          00167000  ffffffff81e92000  0000000001e92000  01292000  2**12
                  ALLOC
 28 .brk          00027000  ffffffff81ff9000  0000000001ff9000  01292000  2**0
                  ALLOC

run_size under the old calculation is
0x01292000 + 0x00167000 + 0x00027000 = 0x1420000

-) nm vmlinux 
ffffffff81000000 T _text
ffffffff82020000 B _end

run_size with your method is 0x82020000 - 0x81000000 = 0x1020000

So if output_len, which is the length of vmlinux.bin plus vmlinux.relocs,
falls between your new run_size and the old run_size, the problem Junjie
tried to fix will happen again.


Thanks
Baoquan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 03/19] x86, boot: Simplify run_size calculation
@ 2015-03-23  7:12       ` Yinghai Lu
  0 siblings, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2015-03-23  7:12 UTC (permalink / raw)
  To: Baoquan He
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Thomas Gleixner,
	Linux Kernel Mailing List, linux-efi, Junjie Mao, Josh Triplett,
	Andrew Morton

On Sun, Mar 22, 2015 at 8:25 PM, Baoquan He <bhe@redhat.com> wrote:
>
> This may not be correct. In commit e602336
> runsize = offset(.bss) + size(.bss) + size(.brk), why this formula comes
> out can be checked from discussion between Kees and Junjie:
> https://lkml.org/lkml/2014/10/30/612
>
> And in my one kernel build the related values are:
> -) objdump -h vmlinux
> vmlinux:     file format elf64-x86-64
>
> Sections:
> Idx Name          Size      VMA               LMA               File off
> Algn
>  27 .bss          00167000  ffffffff81e92000  0000000001e92000  01292000
> 2**12
>                   ALLOC
>  28 .brk          00027000  ffffffff81ff9000  0000000001ff9000  01292000
> 2**0
>                   ALLOC
>
> run_size on old calculation is
> 0x01292000+0x00167000+0x00027000=0x1420000
>
> -) nm vmlinux
> ffffffff81000000 T _text
> ffffffff82020000 B _end
>
> run_size on your method is 0x82020000 - 0x81000000 = 0x1020000
>
> So if output_len which is the length of vmlinux.bin + vmlinux.relocs is
> between the old run_size and your new run_size, the problem Junjie tried
> to fix will happen again.

No. We should not use the file offset in the ELF for run_size: when a
section does not fill its 2M alignment, using the file offset picks up
unneeded extra size.

parse_elf will move the sections forward according to the program headers.
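
(A back-of-the-envelope check of the above, using the numbers Baoquan
quoted: with VMA offsets instead of file offsets the two methods agree.

  offset(.bss) by VMA = 0xffffffff81e92000 - 0xffffffff81000000 = 0xe92000
  0xe92000 + 0x167000 (.bss) + 0x27000 (.brk) = 0x1020000 = _end - _text

The extra 0x400000 in the old number is exactly the file padding that
the 2M section alignment adds to the .bss file offset.)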

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 00/19] x86, boot: kaslr cleanup and 64bit kaslr support
@ 2015-04-05  1:25   ` Baoquan He
  0 siblings, 0 replies; 33+ messages in thread
From: Baoquan He @ 2015-04-05  1:25 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Matt Fleming, H. Peter Anvin, Ingo Molnar, Jiri Kosina,
	Kees Cook, Borislav Petkov, Thomas Gleixner, linux-kernel,
	linux-efi

Hi Yinghai,

This patchset seems to contain too much content at once, which makes it
hard to understand and review. Could it be split into 2 or 3 steps,
like:

First, post a patchset that handles kaslr putting the kernel above 4G.
This involves many lines of code change, but the concept is simple and
the code change can be understood and reviewed very easily.

Second, a patchset to clean up the VO/ZO/run_size issue. This involves
less code but is quite complicated, so a good description is necessary.

Last, based on the second step, handle the mem_avoid issue and the
further cleanups; on top of those, kaslr can randomize to below the
loaded address.

Otherwise this patchset packs in too many fixes. Reviewers need to
spend a lot of time to understand and review it, and it is not easy to
explain each change and the connections between them.

What do you think?

Thanks
Baoquan

On 03/18/15 at 12:28am, Yinghai Lu wrote:
> First make ZO (arch/x86/boot/compressed/vmlinux) data region is not
> overwritten by VO (vmlinux) after decompress.  So could pass data from ZO to VO.
> 
> Second one is second try for kaslr_setup_data support.
> 
> Patch 3-11, are kaslr clean up and enable ident mapping for He's patches.
>   kill run_size calculation shell scripts.
>   create new ident mapping for kasl 64bit, so we can cover
>    above 4G random kernel base, also don't need to track pagetable
>    for 64bit bootloader (patched grub2 or kexec).
>    that will make mem_avoid handling simple.
> 
> Also put 7 patches from He that support random random, as I already used
> his patches to test the ident mapping code, and could save some rebase
> work for him.
> 
> also at:
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-4.0-rc5-aslr
> 
> Thanks
> 
> Yinghai
> 
> 
> Baoquan He (7):
>   x86, kaslr: Fix a bug that relocation can not be handled when kernel is loaded above 2G
>   x86, kaslr: Introduce struct slot_area to manage randomization slot info
>   x86, kaslr: Add two functions which will be used later
>   x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel text mapping address
>   x86, kaslr: Randomize physical and virtual address of kernel separately
>   x86, kaslr: Add support of kernel physical address randomization above 4G
>   x86, kaslr: Remove useless codes
> 
> Jiri Kosina (1):
>   x86, kaslr: Propagate base load address calculation v2
> 
> Yinghai Lu (11):
>   x86, boot: Make data from decompress_kernel stage live longer
>   x86, boot: Simplify run_size calculation
>   x86, kaslr: Kill not used run_size related code.
>   x86, kaslr: Use output_run_size
>   x86, kaslr: Consolidate mem_avoid array filling
>   x86, boot: Move z_extract_offset calculation to header.S
>   x86, kaslr: Get correct max_addr for relocs pointer
>   x86, boot: Split kernel_ident_mapping_init to another file
>   x86, 64bit: Set ident_mapping for kaslr
>   x86, boot: Add checking for memcpy
>   x86, kaslr: Allow random address could be below loaded address
> 
>  arch/x86/boot/Makefile                 |  13 +-
>  arch/x86/boot/compressed/Makefile      |  19 ++-
>  arch/x86/boot/compressed/aslr.c        | 281 ++++++++++++++++++++++++---------
>  arch/x86/boot/compressed/head_32.S     |  14 +-
>  arch/x86/boot/compressed/head_64.S     |  15 +-
>  arch/x86/boot/compressed/misc.c        |  71 +++++----
>  arch/x86/boot/compressed/misc.h        |  32 ++--
>  arch/x86/boot/compressed/misc_pgt.c    |  91 +++++++++++
>  arch/x86/boot/compressed/mkpiggy.c     |  28 +---
>  arch/x86/boot/compressed/string.c      |  28 +++-
>  arch/x86/boot/compressed/vmlinux.lds.S |   2 +
>  arch/x86/boot/header.S                 |  43 ++++-
>  arch/x86/include/asm/aslr.h            |  10 ++
>  arch/x86/include/asm/boot.h            |  19 +++
>  arch/x86/include/asm/page.h            |   5 +
>  arch/x86/include/asm/page_types.h      |   2 +
>  arch/x86/include/uapi/asm/bootparam.h  |   1 +
>  arch/x86/kernel/asm-offsets.c          |   1 +
>  arch/x86/kernel/module.c               |  10 +-
>  arch/x86/kernel/setup.c                |  27 +++-
>  arch/x86/kernel/vmlinux.lds.S          |   1 +
>  arch/x86/mm/ident_map.c                |  74 +++++++++
>  arch/x86/mm/init_64.c                  |  74 +--------
>  arch/x86/tools/calc_run_size.sh        |  42 -----
>  24 files changed, 610 insertions(+), 293 deletions(-)
>  create mode 100644 arch/x86/boot/compressed/misc_pgt.c
>  create mode 100644 arch/x86/include/asm/aslr.h
>  create mode 100644 arch/x86/mm/ident_map.c
>  delete mode 100644 arch/x86/tools/calc_run_size.sh
> 
> -- 
> 1.8.4.5
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2015-04-05  1:25 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-18  7:28 [PATCH v5 00/19] x86, boot: kaslr cleanup and 64bit kaslr support Yinghai Lu
2015-03-18  7:28 ` Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 01/19] x86, boot: Make data from decompress_kernel stage live longer Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 02/19] x86, kaslr: Propagate base load address calculation v2 Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 03/19] x86, boot: Simplify run_size calculation Yinghai Lu
2015-03-23  3:25   ` Baoquan He
2015-03-23  3:25     ` Baoquan He
2015-03-23  7:12     ` Yinghai Lu
2015-03-23  7:12       ` Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 04/19] x86, kaslr: Kill not used run_size related code Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 05/19] x86, kaslr: Use output_run_size Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 06/19] x86, kaslr: Consolidate mem_avoid array filling Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 07/19] x86, boot: Move z_extract_offset calculation to header.S Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 08/19] x86, kaslr: Get correct max_addr for relocs pointer Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 09/19] x86, boot: Split kernel_ident_mapping_init to another file Yinghai Lu
2015-03-18  7:28   ` Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 10/19] x86, 64bit: Set ident_mapping for kaslr Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 11/19] x86, boot: Add checking for memcpy Yinghai Lu
2015-03-18  7:28   ` Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 12/19] x86, kaslr: Fix a bug that relocation can not be handled when kernel is loaded above 2G Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 13/19] x86, kaslr: Introduce struct slot_area to manage randomization slot info Yinghai Lu
2015-03-18  7:28   ` Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 14/19] x86, kaslr: Add two functions which will be used later Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 15/19] x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel text mapping address Yinghai Lu
2015-03-18  7:28   ` Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 16/19] x86, kaslr: Randomize physical and virtual address of kernel separately Yinghai Lu
2015-03-18  7:28   ` Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 17/19] x86, kaslr: Add support of kernel physical address randomization above 4G Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 18/19] x86, kaslr: Remove useless codes Yinghai Lu
2015-03-18  7:28   ` Yinghai Lu
2015-03-18  7:28 ` [PATCH v5 19/19] x86, kaslr: Allow random address could be below loaded address Yinghai Lu
2015-04-05  1:25 ` [PATCH v5 00/19] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
2015-04-05  1:25   ` Baoquan He
