* [PATCH v3 00/19] x86, boot: kaslr cleanup and 64bit kaslr support
From: Baoquan He @ 2016-03-04 16:24 UTC
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung, Baoquan He

***Background:
Previously a bug was reported that kdump didn't work when kaslr was enabled.
While discussing the fix, we found that the current kaslr has a limitation:
it can only randomize within a 1GB region.

This is because the current kaslr implementation only randomizes the physical
address at which the kernel is loaded. It then calculates the delta between
the physical address vmlinux was linked to load at and the address where it is
finally loaded. If the delta is not 0, i.e. the kernel is actually decompressed
at a new physical address, relocation handling is needed: the delta is added
to the offset of each kernel symbol relocation, which moves the kernel text
mapping by the same delta. So although in principle the kernel can be
randomized to any physical address, the kernel text mapping address space is
limited to only 1G, namely the following region on x86_64:
	[0xffffffff80000000, 0xffffffffc0000000)

In short, the physical address and virtual address randomization of the kernel
text are coupled, and this coupling causes the limitation.
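
To make the coupling concrete, here is a minimal sketch with hypothetical
helper names (the real logic lives in arch/x86/boot/compressed/aslr.c and
misc.c):

	/* sketch only; helper names are made up for illustration */
	unsigned long link = LOAD_PHYSICAL_ADDR;  /* where vmlinux was linked */
	unsigned long load = pick_random_phys();  /* randomized load address  */
	long delta = load - link;

	if (delta) {
		/*
		 * The same delta is added to every relocation entry, so the
		 * virtual text mapping moves together with the physical
		 * address; both must fit in the single 1G mapping window.
		 */
		for_each_reloc(reloc)
			*reloc += delta;
	}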

Then hpa and Vivek suggested we change this: decouple the physical address and
virtual address randomization of the kernel text and let them work separately.
The kernel text physical address can then be randomized in the region
[16M, 64T), and the kernel text virtual address can be randomized in the
region [0xffffffff80000000, 0xffffffffc0000000).
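
A sketch of the decoupled scheme (helper names are hypothetical; the actual
helpers, e.g. fetch_random_virt_offset, arrive in later patches of this
series):

	/* physical: anywhere in [16M, 64T), backed by on-demand ident mapping */
	unsigned long phys = pick_random_phys(16UL << 20, 64ULL << 40);

	/* virtual: independently randomized inside the 1G text mapping */
	unsigned long virt = pick_random_virt(0xffffffff80000000UL, 1UL << 30);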

***Problems we need to solve:
  - When the kernel boots via startup_32, only the 0~4G identity mapping is
    built. If the kernel can be randomly put anywhere from 16M up to 64T, the
    price of building the identity mapping for the whole region is too high.
    We need to build the identity mapping on demand, not covering all of the
    physical address space.

  - Decouple the physical address and virtual address randomization of kernel
    text and let them work separately.

***Parts:
   - The 1st part is Yinghai's on-demand identity mapping patches.
     These are used to solve the first problem mentioned above.
     (Patch 09-10/19)
   - The 2nd part is the patches decoupling the physical and virtual address
     randomization of kernel text and letting them work separately, based on
     Yinghai's ident mapping patches.
     (Patch 12-19/19)
   - The 3rd part is some cleanup patches for things Yinghai found when
     reviewing my patches and the related code around them.
     (Patch 01-08/19)

***Patch status:
This patchset went through several rounds of review.

   - The first round can be found here:
	https://lwn.net/Articles/637115/

   - In the 2nd round Yinghai posted a big patchset including this kaslr fix
     and another setup_data related fix. The link is here:
      http://lists-archives.com/linux-kernel/28346903-x86-updated-patches-for-kaslr-and-setup_data-etc-for-v4-3.html
     You can get the code from Yinghai's git branch:
     git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-v4.3-next

   - This post is the 3rd round. It only takes care of the kaslr related
     patches; for reviewers it's better to discuss only one issue per thread.
       * I dropped the following patch from Yinghai's set because I think it's
         unnecessary:
          - Patch 05/19 x86, kaslr: rename output_size to output_run_size
         output_size is enough to represent the value:
		output_len > run_size ? output_len : run_size

       * I added Patch 04/19, a comment update patch. For the other patches, I
         just adjusted the patch logs and made changes in several places
         compared with the 2nd round. Please check the change log under the
         patch log of each patch for details.

       * I adjusted the sequence of several patches to make review easier. It
         doesn't affect the code.

   - You can also get this patchset from my github:
     https://github.com/baoquan-he/linux.git kaslr-above-4G

Any comments and suggestions are welcome. Code changes, code comments, patch
logs: if anything is unclear, please add your comment.

Baoquan He (8):
  x86, kaslr: Update the description for decompressor worst case
  x86, kaslr: Fix a bug that relocation can not be handled when kernel
    is loaded above 2G
  x86, kaslr: Introduce struct slot_area to manage randomization slot
    info
  x86, kaslr: Add two functions which will be used later
  x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel
    text mapping address
  x86, kaslr: Randomize physical and virtual address of kernel
    separately
  x86, kaslr: Add support of kernel physical address randomization above
    4G
  x86, kaslr: Remove useless codes

Yinghai Lu (11):
  x86, kaslr: Remove not needed parameter for choose_kernel_location
  x86, boot: Move compressed kernel to end of buffer before
    decompressing
  x86, boot: Move z_extract_offset calculation to header.S
  x86, boot: Fix run_size calculation
  x86, kaslr: Clean up useless code related to run_size.
  x86, kaslr: Get correct max_addr for relocs pointer
  x86, kaslr: Consolidate mem_avoid array filling
  x86, boot: Split kernel_ident_mapping_init to another file
  x86, 64bit: Set ident_mapping for kaslr
  x86, boot: Add checking for memcpy
  x86, kaslr: Allow random address to be below loaded address

 arch/x86/boot/Makefile                 |  13 +-
 arch/x86/boot/compressed/Makefile      |  19 ++-
 arch/x86/boot/compressed/aslr.c        | 258 +++++++++++++++++++++++----------
 arch/x86/boot/compressed/head_32.S     |  14 +-
 arch/x86/boot/compressed/head_64.S     |  15 +-
 arch/x86/boot/compressed/misc.c        |  94 +++++++-----
 arch/x86/boot/compressed/misc.h        |  34 +++--
 arch/x86/boot/compressed/misc_pgt.c    |  91 ++++++++++++
 arch/x86/boot/compressed/mkpiggy.c     |  28 +---
 arch/x86/boot/compressed/string.c      |  29 +++-
 arch/x86/boot/compressed/vmlinux.lds.S |   1 +
 arch/x86/boot/header.S                 |  22 ++-
 arch/x86/include/asm/boot.h            |  19 +++
 arch/x86/include/asm/page.h            |   5 +
 arch/x86/kernel/asm-offsets.c          |   1 +
 arch/x86/kernel/vmlinux.lds.S          |   1 +
 arch/x86/mm/ident_map.c                |  74 ++++++++++
 arch/x86/mm/init_64.c                  |  74 +---------
 arch/x86/tools/calc_run_size.sh        |  42 ------
 19 files changed, 543 insertions(+), 291 deletions(-)
 create mode 100644 arch/x86/boot/compressed/misc_pgt.c
 create mode 100644 arch/x86/mm/ident_map.c
 delete mode 100644 arch/x86/tools/calc_run_size.sh

-- 
2.5.0


* [PATCH v3 01/19] x86, kaslr: Remove not needed parameter for choose_kernel_location
From: Baoquan He @ 2016-03-04 16:24 UTC
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung

From: Yinghai Lu <yinghai@kernel.org>

real_mode is a global variable, so we do not need to pass it around.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/aslr.c | 5 ++---
 arch/x86/boot/compressed/misc.c | 2 +-
 arch/x86/boot/compressed/misc.h | 6 ++----
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 6a9b96b..622aa88 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -295,8 +295,7 @@ static unsigned long find_random_addr(unsigned long minimum,
 	return slots_fetch_random();
 }
 
-unsigned char *choose_kernel_location(struct boot_params *boot_params,
-				      unsigned char *input,
+unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
 				      unsigned long output_size)
@@ -316,7 +315,7 @@ unsigned char *choose_kernel_location(struct boot_params *boot_params,
 	}
 #endif
 
-	boot_params->hdr.loadflags |= KASLR_FLAG;
+	real_mode->hdr.loadflags |= KASLR_FLAG;
 
 	/* Record the various known unsafe memory ranges. */
 	mem_avoid_init((unsigned long)input, input_size,
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 79dac17..f35ad9e 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -428,7 +428,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	 * the entire decompressed kernel plus relocation table, or the
 	 * entire decompressed kernel plus .bss and .brk sections.
 	 */
-	output = choose_kernel_location(real_mode, input_data, input_len, output,
+	output = choose_kernel_location(input_data, input_len, output,
 					output_len > run_size ? output_len
 							      : run_size);
 
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 3783dc3..dcf01c2 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -67,8 +67,7 @@ int cmdline_find_option_bool(const char *option);
 
 #if CONFIG_RANDOMIZE_BASE
 /* aslr.c */
-unsigned char *choose_kernel_location(struct boot_params *boot_params,
-				      unsigned char *input,
+unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
 				      unsigned long output_size);
@@ -76,8 +75,7 @@ unsigned char *choose_kernel_location(struct boot_params *boot_params,
 bool has_cpuflag(int flag);
 #else
 static inline
-unsigned char *choose_kernel_location(struct boot_params *boot_params,
-				      unsigned char *input,
+unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
 				      unsigned long output_size)
-- 
2.5.0


* [PATCH v3 02/19] x86, boot: Move compressed kernel to end of buffer before decompressing
From: Baoquan He @ 2016-03-04 16:25 UTC
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung

From: Yinghai Lu <yinghai@kernel.org>

For better understanding, first clarify what VO and ZO are. They were
introduced by hpa in the commits below:
77d1a49 x86, boot: make symbols from the main vmlinux available
37ba7ab x86, boot: make kernel_alignment adjustable; new bzImage fields

VO:
- uncompressed kernel
- size: VO__end - VO__text

ZO:
- boot/compressed/vmlinux
- head text + compressed (VO+relocs) + decompressor code
- size: ZO__end - ZO_startup_32

Also copy the INIT_SIZE definition here for reference. From the macros below
we can see that INIT_SIZE is either bigger than or equal to VO_INIT_SIZE.

 #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
 #define VO_INIT_SIZE    (VO__end - VO__text)
 #if ZO_INIT_SIZE > VO_INIT_SIZE
 #define INIT_SIZE ZO_INIT_SIZE
 #else
 #define INIT_SIZE VO_INIT_SIZE
 #endif

The current code uses extract_offset to decide the position of the copied ZO:
it puts ZO starting at extract_offset. When INIT_SIZE is bigger than
VO_INIT_SIZE, the copied ZO occupies the space from extract_offset to the end
of the decompression buffer, like below.

0                       extract_offset                       init_size
|-----------|---------------|------------------------|---------|
            |               |                        |         |
          VO__text      startup_32 of ZO         VO__end     ZO__end

However, when INIT_SIZE is equal to VO_INIT_SIZE there is still space left
between the end of ZO and the end of the decompression buffer, like below.

0                       extract_offset                       init_size
|-----------|---------------|------------------------|---------|
            |               |                        |         |
          VO__text      startup_32 of ZO         ZO__end     VO__end

It's better to always put ZO right at the end of the decompression buffer.

So in this patch, add BP_init_size to asm-offsets.c to make it visible to asm
modules, and when moving ZO compute the starting position of the copied ZO
from init_size and the ZO run size, namely (ZO__end - startup_32).
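
In C terms, the destination computed by the head code is roughly (a sketch
only; the real work is done in assembly in the hunks below):

	/* sketch, not literal kernel code */
	unsigned long zo_size = ZO__end - ZO_startup_32;      /* ZO run size   */
	unsigned char *zo_dst = output + init_size - zo_size; /* end of buffer */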

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
v2->v3:
    Adjust the patch log.

 arch/x86/boot/compressed/head_32.S     | 11 +++++++++--
 arch/x86/boot/compressed/head_64.S     |  8 ++++++--
 arch/x86/boot/compressed/mkpiggy.c     |  3 ---
 arch/x86/boot/compressed/vmlinux.lds.S |  1 +
 arch/x86/kernel/asm-offsets.c          |  1 +
 arch/x86/kernel/vmlinux.lds.S          |  1 +
 6 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 8ef964d..0c140f9 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -148,7 +148,9 @@ preferred_addr:
 1:
 
 	/* Target address to relocate to for decompression */
-	addl	$z_extract_offset, %ebx
+	movl    BP_init_size(%esi), %eax
+	subl    $_end, %eax
+	addl    %eax, %ebx
 
 	/* Set up the stack */
 	leal	boot_stack_end(%ebx), %esp
@@ -210,8 +212,13 @@ relocated:
 				/* push arguments for decompress_kernel: */
 	pushl	$z_run_size	/* size of kernel with .bss and .brk */
 	pushl	$z_output_len	/* decompressed length, end of relocs */
-	leal	z_extract_offset_negative(%ebx), %ebp
+
+	movl    BP_init_size(%esi), %eax
+	subl    $_end, %eax
+	movl    %ebx, %ebp
+	subl    %eax, %ebp
 	pushl	%ebp		/* output address */
+
 	pushl	$z_input_len	/* input_len */
 	leal	input_data(%ebx), %eax
 	pushl	%eax		/* input_data */
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index b0c0d16..67dd8d3 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -102,7 +102,9 @@ ENTRY(startup_32)
 1:
 
 	/* Target address to relocate to for decompression */
-	addl	$z_extract_offset, %ebx
+	movl	BP_init_size(%esi), %eax
+	subl	$_end, %eax
+	addl	%eax, %ebx
 
 /*
  * Prepare for entering 64 bit mode
@@ -330,7 +332,9 @@ preferred_addr:
 1:
 
 	/* Target address to relocate to for decompression */
-	leaq	z_extract_offset(%rbp), %rbx
+	movl	BP_init_size(%rsi), %ebx
+	subl	$_end, %ebx
+	addq	%rbp, %rbx
 
 	/* Set up the stack */
 	leaq	boot_stack_end(%rbx), %rsp
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index d8222f2..b980046 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -85,9 +85,6 @@ int main(int argc, char *argv[])
 	printf("z_output_len = %lu\n", (unsigned long)olen);
 	printf(".globl z_extract_offset\n");
 	printf("z_extract_offset = 0x%lx\n", offs);
-	/* z_extract_offset_negative allows simplification of head_32.S */
-	printf(".globl z_extract_offset_negative\n");
-	printf("z_extract_offset_negative = -0x%lx\n", offs);
 	printf(".globl z_run_size\n");
 	printf("z_run_size = %lu\n", run_size);
 
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 34d047c..e24e0a0 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -70,5 +70,6 @@ SECTIONS
 		_epgtable = . ;
 	}
 #endif
+	. = ALIGN(PAGE_SIZE);	/* keep ZO size page aligned */
 	_end = .;
 }
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 84a7524..506de14 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -81,6 +81,7 @@ void common(void) {
 	OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch);
 	OFFSET(BP_version, boot_params, hdr.version);
 	OFFSET(BP_kernel_alignment, boot_params, hdr.kernel_alignment);
+	OFFSET(BP_init_size, boot_params, hdr.init_size);
 	OFFSET(BP_pref_address, boot_params, hdr.pref_address);
 	OFFSET(BP_code32_start, boot_params, hdr.code32_start);
 
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 74e4bf1..217d5b1 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -325,6 +325,7 @@ SECTIONS
 		__brk_limit = .;
 	}
 
+	. = ALIGN(PAGE_SIZE);		/* keep VO_INIT_SIZE page aligned */
 	_end = .;
 
         STABS_DEBUG
-- 
2.5.0


* [PATCH v3 03/19] x86, boot: Move z_extract_offset calculation to header.S
From: Baoquan He @ 2016-03-04 16:25 UTC
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung, Baoquan He

From: Yinghai Lu <yinghai@kernel.org>

Currently z_extract_offset is calculated in boot/compressed/mkpiggy.c. The
problem is that in mkpiggy.c we don't know the details of the decompressor, so
we just get a rough z_extract_offset according to extra_bytes. As we know,
extra_bytes can only promise a safety margin during decompression. In fact
this brings some risks:

 - output+output_len could be much bigger than input+input_len. In this
   case the decompressed kernel plus relocs could overwrite the decompression
   code even while it is running.

 - The head section of ZO ([startup_32, _ehead)) could be bigger than
   z_extract_offset. In this case an overwrite could happen while the head
   code is moving ZO to the end of the buffer. Though currently the head code
   is very small, it's still a potential risk: there is no rule limiting the
   size of the ZO head code, so it could become very big.

Now move the z_extract_offset calculation to header.S, and adjust
z_extract_offset to make sure that the above two cases never happen.

Besides, since we have made ZO always sit at the end of the decompression
buffer, z_extract_offset is only used here to calculate an appropriate
INIT_SIZE; no other place needs it now, so there is no need to put it in
voffset.h.
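
As a worked example with hypothetical sizes, say ZO_z_output_len = 24M and
ZO_z_input_len = 8M; the new macros in the header.S hunk below then give
roughly:

	ZO_z_extra_bytes        = (24M >> 12) + 65536 + 128   ~= 70K
	ZO_z_extract_offset     = 24M + 70K - 8M              ~= 16M + 70K
	ZO_z_min_extract_offset = the larger of the above and the ZO head
	                          size, rounded up to a 4K boundary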

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
---
v2->v3:
    Tune the patch log.
    Remove code comment above init_size.
    I still think extra_bytes should be computed with the formula below to
    cover the worst case of xz:
#define ZO_z_extra_bytes ((ZO_z_output_len >> 12) + 65536 + 128)

 arch/x86/boot/Makefile             |  2 +-
 arch/x86/boot/compressed/misc.c    |  5 +----
 arch/x86/boot/compressed/mkpiggy.c | 15 +--------------
 arch/x86/boot/header.S             | 22 +++++++++++++++++++++-
 4 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index bbe1a62..bd021e5 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -87,7 +87,7 @@ targets += voffset.h
 $(obj)/voffset.h: vmlinux FORCE
 	$(call if_changed,voffset)
 
-sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|z_.*\)$$/\#define ZO_\2 0x\1/p'
+sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'
 
 quiet_cmd_zoffset = ZOFFSET $@
       cmd_zoffset = $(NM) $< | sed -n $(sed-zoffset) > $@
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index f35ad9e..a56bb5d 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -83,13 +83,10 @@
  * To avoid problems with the compressed data's meta information an extra 18
  * bytes are needed.  Leading to the formula:
  *
- * extra_bytes = (uncompressed_size >> 12) + 32768 + 18 + decompressor_size.
+ * extra_bytes = (uncompressed_size >> 12) + 32768 + 18.
  *
  * Adding 8 bytes per 32K is a bit excessive but much easier to calculate.
  * Adding 32768 instead of 32767 just makes for round numbers.
- * Adding the decompressor_size is necessary as it musht live after all
- * of the data as well.  Last I measured the decompressor is about 14K.
- * 10K of actual data and 4K of bss.
  *
  */
 
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index b980046..a613c84 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -21,8 +21,7 @@
  * ----------------------------------------------------------------------- */
 
 /*
- * Compute the desired load offset from a compressed program; outputs
- * a small assembly wrapper with the appropriate symbols defined.
+ * outputs a small assembly wrapper with the appropriate symbols defined.
  */
 
 #include <stdlib.h>
@@ -35,7 +34,6 @@ int main(int argc, char *argv[])
 {
 	uint32_t olen;
 	long ilen;
-	unsigned long offs;
 	unsigned long run_size;
 	FILE *f = NULL;
 	int retval = 1;
@@ -67,15 +65,6 @@ int main(int argc, char *argv[])
 	ilen = ftell(f);
 	olen = get_unaligned_le32(&olen);
 
-	/*
-	 * Now we have the input (compressed) and output (uncompressed)
-	 * sizes, compute the necessary decompression offset...
-	 */
-
-	offs = (olen > ilen) ? olen - ilen : 0;
-	offs += olen >> 12;	/* Add 8 bytes for each 32K block */
-	offs += 64*1024 + 128;	/* Add 64K + 128 bytes slack */
-	offs = (offs+4095) & ~4095; /* Round to a 4K boundary */
 	run_size = atoi(argv[2]);
 
 	printf(".section \".rodata..compressed\",\"a\",@progbits\n");
@@ -83,8 +72,6 @@ int main(int argc, char *argv[])
 	printf("z_input_len = %lu\n", ilen);
 	printf(".globl z_output_len\n");
 	printf("z_output_len = %lu\n", (unsigned long)olen);
-	printf(".globl z_extract_offset\n");
-	printf("z_extract_offset = 0x%lx\n", offs);
 	printf(".globl z_run_size\n");
 	printf("z_run_size = %lu\n", run_size);
 
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 6236b9e..1c057e3 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -440,7 +440,27 @@ setup_data:		.quad 0			# 64-bit physical pointer to
 
 pref_address:		.quad LOAD_PHYSICAL_ADDR	# preferred load addr
 
-#define ZO_INIT_SIZE	(ZO__end - ZO_startup_32 + ZO_z_extract_offset)
+/* Check arch/x86/boot/compressed/misc.c for the formula of extra_bytes*/
+#define ZO_z_extra_bytes ((ZO_z_output_len >> 12) + 65536 + 128)
+#if ZO_z_output_len > ZO_z_input_len
+#define ZO_z_extract_offset (ZO_z_output_len + ZO_z_extra_bytes - ZO_z_input_len)
+#else
+#define ZO_z_extract_offset ZO_z_extra_bytes
+#endif
+
+/*
+ * extract_offset has to be bigger than ZO head section. Otherwise
+ * during head code running to move ZO to end of buffer, it will
+ * overwrite head code itself.
+ */
+#if (ZO__ehead - ZO_startup_32) > ZO_z_extract_offset
+#define ZO_z_min_extract_offset ((ZO__ehead - ZO_startup_32 + 4095) & ~4095)
+#else
+#define ZO_z_min_extract_offset ((ZO_z_extract_offset + 4095) & ~4095)
+#endif
+
+#define ZO_INIT_SIZE	(ZO__end - ZO_startup_32 + ZO_z_min_extract_offset)
+
 #define VO_INIT_SIZE	(VO__end - VO__text)
 #if ZO_INIT_SIZE > VO_INIT_SIZE
 #define INIT_SIZE ZO_INIT_SIZE
-- 
2.5.0


* [PATCH v3 04/19] x86, kaslr: Update the description for decompressor worst case
From: Baoquan He @ 2016-03-04 16:25 UTC
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung, Baoquan He

The current analysis is for the gzip decompressor only. In fact we support 6
different decompressors. Update the description to cover all of them, and fix
several typos along the way.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
v2->v3:
    This is newly added.

 arch/x86/boot/compressed/misc.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index a56bb5d..7a65a8a 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -19,11 +19,13 @@
  */
 
 /*
- * Getting to provable safe in place decompression is hard.
- * Worst case behaviours need to be analyzed.
- * Background information:
+ * Getting to provable safe in place decompression is hard. Worst case
+ * behaviours need be analyzed. Here let's take decompressing gzip-compressed
+ * kernel as example to illustrate it.
+ *
+ * The file layout of gzip compressed kernel is as follows. For more information,
+ * please refer to rfc 1951 and rfc 1952.
  *
- * The file layout is:
  *    magic[2]
  *    method[1]
  *    flags[1]
@@ -70,13 +72,13 @@
  * of 5 bytes per 32767 bytes.
  *
  * The worst case internal to a compressed block is very hard to figure.
- * The worst case can at least be boundined by having one bit that represents
+ * The worst case can at least be bounded by having one bit that represents
  * 32764 bytes and then all of the rest of the bytes representing the very
  * very last byte.
  *
  * All of which is enough to compute an amount of extra data that is required
  * to be safe.  To avoid problems at the block level allocating 5 extra bytes
- * per 32767 bytes of data is sufficient.  To avoind problems internal to a
+ * per 32767 bytes of data is sufficient.  To avoid problems internal to a
  * block adding an extra 32767 bytes (the worst case uncompressed block size)
  * is sufficient, to ensure that in the worst case the decompressed data for
  * block will stop the byte before the compressed data for a block begins.
@@ -88,11 +90,17 @@
  * Adding 8 bytes per 32K is a bit excessive but much easier to calculate.
  * Adding 32768 instead of 32767 just makes for round numbers.
  *
+ * Above analysis is for decompressing gzip compressed kernel only. Up to
+ * now 6 different decompressor are supported all together. And among them
+ * xz stores data in chunks and has maximum chunk of 64K. Hence safety
+ * margin should be updated to cover all decompressors so that we don't
+ * need to deal with each of them separately. Please check
+ * the description in lib/decompressor_xxx.c for specific information.
+ *
+ * extra_bytes = (uncompressed_size >> 12) + 65536 + 128.
+ *
  */
 
-/*
- * gzip declarations
- */
 #define STATIC		static
 
 #undef memcpy
-- 
2.5.0


* [PATCH v3 05/19] x86, boot: Fix run_size calculation
From: Baoquan He @ 2016-03-04 16:25 UTC
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung, Junjie Mao, Josh Triplett, Matt Fleming

From: Yinghai Lu <yinghai@kernel.org>

Firstly, run_size is currently calculated via the shell script
arch/x86/tools/calc_run_size.sh. It gets the file offset and mem size of the
.bss and .brk sections in vmlinux, then adds them as follows:

run_size=$(( $offsetA + $sizeA + $sizeB ))

However, this is completely wrong: the offset is just the starting position of
the section or segment within the ELF file. Below is a vmlinux I compiled:

[bhe@x1 linux]$ objdump -h vmlinux

vmlinux:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
 27 .bss          00170000  ffffffff81ec8000  0000000001ec8000  012c8000  2**12
                  ALLOC
 28 .brk          00027000  ffffffff82038000  0000000002038000  012c8000  2**0
                  ALLOC

Here we can get run_size = 0x145f000:
0x012c8000+0x00170000+0x00027000=0x145f000

[bhe@x1 linux]$ readelf -l vmlinux

Elf file type is EXEC (Executable file)
Entry point 0x1000000
There are 5 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000200000 0xffffffff81000000 0x0000000001000000
                 0x0000000000b5e000 0x0000000000b5e000  R E    200000
  LOAD           0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
                 0x0000000000145000 0x0000000000145000  RW     200000
  LOAD           0x0000000001000000 0x0000000000000000 0x0000000001d45000
                 0x0000000000018158 0x0000000000018158  RW     200000
  LOAD           0x000000000115e000 0xffffffff81d5e000 0x0000000001d5e000
                 0x000000000016a000 0x0000000000301000  RWE    200000
  NOTE           0x000000000099bcac 0xffffffff8179bcac 0x000000000179bcac
                 0x00000000000001bc 0x00000000000001bc         4

 Section to Segment mapping:
  Segment Sections...
   00     .text .notes __ex_table .rodata __bug_table .pci_fixup .tracedata __ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param __modver
   01     .data .vvar
   02     .data..percpu
   03     .init.text .init.data .x86_cpu_dev.init .parainstructions .altinstructions .altinstr_replacement .iommu_table .apicdrivers .exit.text .smp_locks .bss .brk
   04     .notes

Here we get the same value as the current run_size if we add the last LOAD
segment's p_offset and p_memsz:
0x000000000115e000+0x0000000000301000=0x145f000

But is it right? Obviously not. We should calculate it using the last LOAD
program segment like this:
run_size = phdr->p_paddr + phdr->p_memsz - physical load addr of kernel
run_size=0x0000000001d5e000+0x0000000000301000-0x0000000001000000=0x105f000

This is equal to VO__end - VO__text, which is certainly simpler to compute:
_end: 0xffffffff8205f000
_text:0xffffffff81000000
run_size = 0xffffffff8205f000-0xffffffff81000000=0x105f000

Secondly, run_size is a simple constant; we don't need to pass it around, and
we already have voffset.h for that. We can share voffset.h between misc.c and
header.S instead of obtaining run_size some other way. Hence in this patch we
move the voffset.h creation code to boot/compressed/Makefile.

The dependency was:
boot/header.S ==> boot/voffset.h ==> vmlinux
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c
Now it becomes:
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c ==> boot/voffset.h ==> vmlinux

Use the macro in misc.c to replace the passed-in run_size.

Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
v2->v3:
    Adjust the patch log.

 arch/x86/boot/Makefile            | 11 +----------
 arch/x86/boot/compressed/Makefile | 12 ++++++++++++
 arch/x86/boot/compressed/misc.c   |  3 +++
 3 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index bd021e5..40f0ae2 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -78,15 +78,6 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE
 
 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
 
-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 0x\1/p'
-
-quiet_cmd_voffset = VOFFSET $@
-      cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
-
-targets += voffset.h
-$(obj)/voffset.h: vmlinux FORCE
-	$(call if_changed,voffset)
-
 sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'
 
 quiet_cmd_zoffset = ZOFFSET $@
@@ -98,7 +89,7 @@ $(obj)/zoffset.h: $(obj)/compressed/vmlinux FORCE
 
 
 AFLAGS_header.o += -I$(obj)
-$(obj)/header.o: $(obj)/voffset.h $(obj)/zoffset.h
+$(obj)/header.o: $(obj)/zoffset.h
 
 LDFLAGS_setup.elf	:= -T
 $(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index f9ce75d..d9dedd9 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -41,6 +41,18 @@ LDFLAGS_vmlinux := -T
 hostprogs-y	:= mkpiggy
 HOST_EXTRACFLAGS += -I$(srctree)/tools/include
 
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+
+quiet_cmd_voffset = VOFFSET $@
+      cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
+
+targets += ../voffset.h
+
+$(obj)/../voffset.h: vmlinux FORCE
+	$(call if_changed,voffset)
+
+$(obj)/misc.o: $(obj)/../voffset.h
+
 vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
 	$(obj)/string.o $(obj)/cmdline.o \
 	$(obj)/piggy.o $(obj)/cpuflags.o
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 7a65a8a..42e76c2 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -11,6 +11,7 @@
 
 #include "misc.h"
 #include "../string.h"
+#include "../voffset.h"
 
 /* WARNING!!
  * This code is compiled with -fPIC and it is relocated dynamically
@@ -415,6 +416,8 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	lines = real_mode->screen_info.orig_video_lines;
 	cols = real_mode->screen_info.orig_video_cols;
 
+	run_size = VO__end - VO__text;
+
 	console_init();
 	debug_putstr("early console in decompress_kernel\n");
 
-- 
2.5.0


* [PATCH v3 06/19] x86, kaslr: Clean up useless code related to run_size.
From: Baoquan He @ 2016-03-04 16:25 UTC
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung, Josh Triplett, Matt Fleming, Ard Biesheuvel,
	Junjie Mao

From: Yinghai Lu <yinghai@kernel.org>

Now we use the formula below to get run_size:

	run_size = VO__end - VO__text

Let's remove the code for the old run_size calculation and clean up the places
where run_size needed to be passed in and out.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
v2->v3:
    Adjust the patch log.

 arch/x86/boot/compressed/Makefile  |  4 +---
 arch/x86/boot/compressed/head_32.S |  3 +--
 arch/x86/boot/compressed/head_64.S |  3 ---
 arch/x86/boot/compressed/misc.c    |  6 ++----
 arch/x86/boot/compressed/mkpiggy.c | 10 ++-------
 arch/x86/tools/calc_run_size.sh    | 42 --------------------------------------
 6 files changed, 6 insertions(+), 62 deletions(-)
 delete mode 100644 arch/x86/tools/calc_run_size.sh

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index d9dedd9..fef80fa 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -105,10 +105,8 @@ suffix-$(CONFIG_KERNEL_XZ)	:= xz
 suffix-$(CONFIG_KERNEL_LZO) 	:= lzo
 suffix-$(CONFIG_KERNEL_LZ4) 	:= lz4
 
-RUN_SIZE = $(shell $(OBJDUMP) -h vmlinux | \
-	     $(CONFIG_SHELL) $(srctree)/arch/x86/tools/calc_run_size.sh)
 quiet_cmd_mkpiggy = MKPIGGY $@
-      cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )
+      cmd_mkpiggy = $(obj)/mkpiggy $< > $@ || ( rm -f $@ ; false )
 
 targets += piggy.S
 $(obj)/piggy.S: $(obj)/vmlinux.bin.$(suffix-y) $(obj)/mkpiggy FORCE
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 0c140f9..122b32f 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -210,7 +210,6 @@ relocated:
  * Do the decompression, and jump to the new kernel..
  */
 				/* push arguments for decompress_kernel: */
-	pushl	$z_run_size	/* size of kernel with .bss and .brk */
 	pushl	$z_output_len	/* decompressed length, end of relocs */
 
 	movl    BP_init_size(%esi), %eax
@@ -226,7 +225,7 @@ relocated:
 	pushl	%eax		/* heap area */
 	pushl	%esi		/* real mode pointer */
 	call	decompress_kernel /* returns kernel location in %eax */
-	addl	$28, %esp
+	addl	$24, %esp
 
 /*
  * Jump to the decompressed kernel.
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 67dd8d3..3691451 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -407,8 +407,6 @@ relocated:
  * Do the decompression, and jump to the new kernel..
  */
 	pushq	%rsi			/* Save the real mode argument */
-	movq	$z_run_size, %r9	/* size of kernel with .bss and .brk */
-	pushq	%r9
 	movq	%rsi, %rdi		/* real mode address */
 	leaq	boot_heap(%rip), %rsi	/* malloc area for uncompression */
 	leaq	input_data(%rip), %rdx  /* input_data */
@@ -416,7 +414,6 @@ relocated:
 	movq	%rbp, %r8		/* output target address */
 	movq	$z_output_len, %r9	/* decompressed length, end of relocs */
 	call	decompress_kernel	/* returns kernel location in %rax */
-	popq	%r9
 	popq	%rsi
 
 /*
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 42e76c2..069120e 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -393,9 +393,9 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 				  unsigned char *input_data,
 				  unsigned long input_len,
 				  unsigned char *output,
-				  unsigned long output_len,
-				  unsigned long run_size)
+				  unsigned long output_len)
 {
+	unsigned long run_size = VO__end - VO__text;
 	unsigned char *output_orig = output;
 
 	real_mode = rmode;
@@ -416,8 +416,6 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	lines = real_mode->screen_info.orig_video_lines;
 	cols = real_mode->screen_info.orig_video_cols;
 
-	run_size = VO__end - VO__text;
-
 	console_init();
 	debug_putstr("early console in decompress_kernel\n");
 
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index a613c84..c5148642 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -34,13 +34,11 @@ int main(int argc, char *argv[])
 {
 	uint32_t olen;
 	long ilen;
-	unsigned long run_size;
 	FILE *f = NULL;
 	int retval = 1;
 
-	if (argc < 3) {
-		fprintf(stderr, "Usage: %s compressed_file run_size\n",
-				argv[0]);
+	if (argc < 2) {
+		fprintf(stderr, "Usage: %s compressed_file\n", argv[0]);
 		goto bail;
 	}
 
@@ -65,15 +63,11 @@ int main(int argc, char *argv[])
 	ilen = ftell(f);
 	olen = get_unaligned_le32(&olen);
 
-	run_size = atoi(argv[2]);
-
 	printf(".section \".rodata..compressed\",\"a\",@progbits\n");
 	printf(".globl z_input_len\n");
 	printf("z_input_len = %lu\n", ilen);
 	printf(".globl z_output_len\n");
 	printf("z_output_len = %lu\n", (unsigned long)olen);
-	printf(".globl z_run_size\n");
-	printf("z_run_size = %lu\n", run_size);
 
 	printf(".globl input_data, input_data_end\n");
 	printf("input_data:\n");
diff --git a/arch/x86/tools/calc_run_size.sh b/arch/x86/tools/calc_run_size.sh
deleted file mode 100644
index 1a4c17b..0000000
--- a/arch/x86/tools/calc_run_size.sh
+++ /dev/null
@@ -1,42 +0,0 @@
-#!/bin/sh
-#
-# Calculate the amount of space needed to run the kernel, including room for
-# the .bss and .brk sections.
-#
-# Usage:
-# objdump -h a.out | sh calc_run_size.sh
-
-NUM='\([0-9a-fA-F]*[ \t]*\)'
-OUT=$(sed -n 's/^[ \t0-9]*.b[sr][sk][ \t]*'"$NUM$NUM$NUM$NUM"'.*/\1\4/p')
-if [ -z "$OUT" ] ; then
-	echo "Never found .bss or .brk file offset" >&2
-	exit 1
-fi
-
-OUT=$(echo ${OUT# })
-sizeA=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-offsetA=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-sizeB=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-offsetB=$(printf "%d" 0x${OUT%% *})
-
-run_size=$(( $offsetA + $sizeA + $sizeB ))
-
-# BFD linker shows the same file offset in ELF.
-if [ "$offsetA" -ne "$offsetB" ] ; then
-	# Gold linker shows them as consecutive.
-	endB=$(( $offsetB + $sizeB ))
-	if [ "$endB" != "$run_size" ] ; then
-		printf "sizeA: 0x%x\n" $sizeA >&2
-		printf "offsetA: 0x%x\n" $offsetA >&2
-		printf "sizeB: 0x%x\n" $sizeB >&2
-		printf "offsetB: 0x%x\n" $offsetB >&2
-		echo ".bss and .brk are non-contiguous" >&2
-		exit 1
-	fi
-fi
-
-printf "%d\n" $run_size
-exit 0
-- 
2.5.0


* [PATCH v3 07/19] x86, kaslr: Get correct max_addr for relocs pointer
From: Baoquan He @ 2016-03-04 16:25 UTC
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung

From: Yinghai Lu <yinghai@kernel.org>

There is boundary checking for the relocation pointer in the kaslr relocation
handling.

The current code uses output_len, which is the VO (vmlinux after objcopy) file
size plus the vmlinux.relocs file size.

That is not right, as we should use the addresses where things are loaded for
running: by that time parse_elf has already moved the sections according to
the ELF headers, so the valid range is the loaded physical addresses of
VO [_text, __bss_start).

In this patch, also export __bss_start to voffset.h and use it to compute
max_addr.
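
In sketch form, the valid window becomes (mirroring the hunk below):

	min_addr = (unsigned long)output;                  /* VO _text load address   */
	max_addr = min_addr + (VO___bss_start - VO__text); /* end of initialized data */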

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
v2->v3:
    Tune the patch log.

 arch/x86/boot/compressed/Makefile | 2 +-
 arch/x86/boot/compressed/misc.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index fef80fa..2e7c0ce 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -41,7 +41,7 @@ LDFLAGS_vmlinux := -T
 hostprogs-y	:= mkpiggy
 HOST_EXTRACFLAGS += -I$(srctree)/tools/include
 
-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
 
 quiet_cmd_voffset = VOFFSET $@
       cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 069120e..dd7ed8a 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -259,7 +259,7 @@ static void handle_relocations(void *output, unsigned long output_len)
 	int *reloc;
 	unsigned long delta, map, ptr;
 	unsigned long min_addr = (unsigned long)output;
-	unsigned long max_addr = min_addr + output_len;
+	unsigned long max_addr = min_addr + (VO___bss_start - VO__text);
 
 	/*
 	 * Calculate the delta between where vmlinux was linked to load
-- 
2.5.0


* [PATCH v3 08/19] x86, kaslr: Consolidate mem_avoid array filling
From: Baoquan He @ 2016-03-04 16:25 UTC
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung

From: Yinghai Lu <yinghai@kernel.org>

We are going to support kaslr on 64bit with output above 4G, so the new random
output could be anywhere. The mem_avoid array is used by kaslr when searching
for a new output address. The current code only tracks the range after
output+output_size, so we need to track all ranges instead.

In the current code, the first entry is the extra bytes before
input+input_size, sized according to output_size. The other entries are for
the initrd, the cmdline, and the heap/stack for running ZO.

First, let's work out what the first entry in the mem_avoid array should be.
Since ZO now always sits at the end of the buffer, we can find out where the
text and data/bss etc. of ZO are.

Since init_size >= run_size, and input+input_len >= output+output_len, let's
make several assumptions so the diagram below is clearer:
 - init_size > run_size
 - input+input_len > output+output_len
 - run_size > output_len

0   output                       input             input+input_len          output+init_size
|     |                            |                       |                       |
|-----|-------------------|--------|------------------|----|------------|----------|
                          |                           |                 |
             output+init_size-ZO_INIT_SIZE    output+output_len    output+run_size

[output, output+init_size) is the decompression buffer for the compressed
kernel.

[output, output+run_size) is for the VO run size.
[output, output+output_len) is VO (vmlinux after objcopy) plus relocs.

[output+init_size-ZO_INIT_SIZE, output+init_size) is the copied ZO.
[input, input+input_len) is the copied compressed data (VO (vmlinux after
objcopy) plus relocs), not the ZO.

[input+input_len, output+init_size) is [_text, _end) of ZO; that could be the
first range in mem_avoid. This new first entry already includes the heap and
stack for running ZO, so there is no need to put them into the mem_avoid array
separately.

[input, input+input_size) also needs to be put in the mem_avoid array. It is
adjacent to the new first entry, so merge them.

Lastly, we need to put boot_params into mem_avoid too, since with a 64bit
bootloader it could be placed anywhere.

After these changes, all ranges that need to be avoided are in the mem_avoid
array.
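
In sketch form, the four resulting entries are (see the aslr.c hunk below for
the real code):

	mem_avoid[0] = { input, (output + init_size) - input };  /* ZO + compressed data */
	mem_avoid[1] = { initrd_start, initrd_size };
	mem_avoid[2] = { cmd_line, cmd_line_size };
	mem_avoid[3] = { (unsigned long)real_mode, sizeof(*real_mode) }; /* boot_params */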

Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
v2->v3:
    Adjust the patch log.

 arch/x86/boot/compressed/aslr.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 622aa88..b93be03 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -109,7 +109,7 @@ struct mem_vector {
 	unsigned long size;
 };
 
-#define MEM_AVOID_MAX 5
+#define MEM_AVOID_MAX 4
 static struct mem_vector mem_avoid[MEM_AVOID_MAX];
 
 static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
@@ -135,21 +135,22 @@ static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
 }
 
 static void mem_avoid_init(unsigned long input, unsigned long input_size,
-			   unsigned long output, unsigned long output_size)
+			   unsigned long output)
 {
+	unsigned long init_size = real_mode->hdr.init_size;
 	u64 initrd_start, initrd_size;
 	u64 cmd_line, cmd_line_size;
-	unsigned long unsafe, unsafe_len;
 	char *ptr;
 
 	/*
 	 * Avoid the region that is unsafe to overlap during
-	 * decompression (see calculations at top of misc.c).
+	 * decompression.
+	 * As we already move ZO (arch/x86/boot/compressed/vmlinux)
+	 * to the end of buffer, [input+input_size, output+init_size)
+	 * has [_text, _end) for ZO.
 	 */
-	unsafe_len = (output_size >> 12) + 32768 + 18;
-	unsafe = (unsigned long)input + input_size - unsafe_len;
-	mem_avoid[0].start = unsafe;
-	mem_avoid[0].size = unsafe_len;
+	mem_avoid[0].start = input;
+	mem_avoid[0].size = (output + init_size) - input;
 
 	/* Avoid initrd. */
 	initrd_start  = (u64)real_mode->ext_ramdisk_image << 32;
@@ -169,13 +170,9 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	mem_avoid[2].start = cmd_line;
 	mem_avoid[2].size = cmd_line_size;
 
-	/* Avoid heap memory. */
-	mem_avoid[3].start = (unsigned long)free_mem_ptr;
-	mem_avoid[3].size = BOOT_HEAP_SIZE;
-
-	/* Avoid stack memory. */
-	mem_avoid[4].start = (unsigned long)free_mem_end_ptr;
-	mem_avoid[4].size = BOOT_STACK_SIZE;
+	/* Avoid params */
+	mem_avoid[3].start = (unsigned long)real_mode;
+	mem_avoid[3].size = sizeof(*real_mode);
 }
 
 /* Does this memory vector overlap a known avoided area? */
@@ -319,7 +316,7 @@ unsigned char *choose_kernel_location(unsigned char *input,
 
 	/* Record the various known unsafe memory ranges. */
 	mem_avoid_init((unsigned long)input, input_size,
-		       (unsigned long)output, output_size);
+		       (unsigned long)output);
 
 	/* Walk e820 and find a random address. */
 	random = find_random_addr(choice, output_size);
-- 
2.5.0


* [PATCH v3 09/19] x86, boot: Split kernel_ident_mapping_init to another file
From: Baoquan He @ 2016-03-04 16:25 UTC
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung

From: Yinghai Lu <yinghai@kernel.org>

We need to include it in the boot::decompress_kernel stage to set up the new
ident mapping.

Also add #ifndef guards around the __pa/__va macro definitions, as we need to
override them in the boot::decompress_kernel stage.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/page.h |  5 +++
 arch/x86/mm/ident_map.c     | 74 +++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/init_64.c       | 74 +--------------------------------------------
 3 files changed, 80 insertions(+), 73 deletions(-)
 create mode 100644 arch/x86/mm/ident_map.c

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 802dde3..cf8f619 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -37,7 +37,10 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
+#ifndef __pa
 #define __pa(x)		__phys_addr((unsigned long)(x))
+#endif
+
 #define __pa_nodebug(x)	__phys_addr_nodebug((unsigned long)(x))
 /* __pa_symbol should be used for C visible symbols.
    This seems to be the official gcc blessed way to do such arithmetic. */
@@ -51,7 +54,9 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 #define __pa_symbol(x) \
 	__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
 
+#ifndef __va
 #define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
+#endif
 
 #define __boot_va(x)		__va(x)
 #define __boot_pa(x)		__pa(x)
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
new file mode 100644
index 0000000..751ca92
--- /dev/null
+++ b/arch/x86/mm/ident_map.c
@@ -0,0 +1,74 @@
+
+static void ident_pmd_init(unsigned long pmd_flag, pmd_t *pmd_page,
+			   unsigned long addr, unsigned long end)
+{
+	addr &= PMD_MASK;
+	for (; addr < end; addr += PMD_SIZE) {
+		pmd_t *pmd = pmd_page + pmd_index(addr);
+
+		if (!pmd_present(*pmd))
+			set_pmd(pmd, __pmd(addr | pmd_flag));
+	}
+}
+static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
+			  unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+
+	for (; addr < end; addr = next) {
+		pud_t *pud = pud_page + pud_index(addr);
+		pmd_t *pmd;
+
+		next = (addr & PUD_MASK) + PUD_SIZE;
+		if (next > end)
+			next = end;
+
+		if (pud_present(*pud)) {
+			pmd = pmd_offset(pud, 0);
+			ident_pmd_init(info->pmd_flag, pmd, addr, next);
+			continue;
+		}
+		pmd = (pmd_t *)info->alloc_pgt_page(info->context);
+		if (!pmd)
+			return -ENOMEM;
+		ident_pmd_init(info->pmd_flag, pmd, addr, next);
+		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+	}
+
+	return 0;
+}
+
+int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
+			      unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	int result;
+	int off = info->kernel_mapping ? pgd_index(__PAGE_OFFSET) : 0;
+
+	for (; addr < end; addr = next) {
+		pgd_t *pgd = pgd_page + pgd_index(addr) + off;
+		pud_t *pud;
+
+		next = (addr & PGDIR_MASK) + PGDIR_SIZE;
+		if (next > end)
+			next = end;
+
+		if (pgd_present(*pgd)) {
+			pud = pud_offset(pgd, 0);
+			result = ident_pud_init(info, pud, addr, next);
+			if (result)
+				return result;
+			continue;
+		}
+
+		pud = (pud_t *)info->alloc_pgt_page(info->context);
+		if (!pud)
+			return -ENOMEM;
+		result = ident_pud_init(info, pud, addr, next);
+		if (result)
+			return result;
+		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+	}
+
+	return 0;
+}
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 5488d21..b4fc56e 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -57,79 +57,7 @@
 
 #include "mm_internal.h"
 
-static void ident_pmd_init(unsigned long pmd_flag, pmd_t *pmd_page,
-			   unsigned long addr, unsigned long end)
-{
-	addr &= PMD_MASK;
-	for (; addr < end; addr += PMD_SIZE) {
-		pmd_t *pmd = pmd_page + pmd_index(addr);
-
-		if (!pmd_present(*pmd))
-			set_pmd(pmd, __pmd(addr | pmd_flag));
-	}
-}
-static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
-			  unsigned long addr, unsigned long end)
-{
-	unsigned long next;
-
-	for (; addr < end; addr = next) {
-		pud_t *pud = pud_page + pud_index(addr);
-		pmd_t *pmd;
-
-		next = (addr & PUD_MASK) + PUD_SIZE;
-		if (next > end)
-			next = end;
-
-		if (pud_present(*pud)) {
-			pmd = pmd_offset(pud, 0);
-			ident_pmd_init(info->pmd_flag, pmd, addr, next);
-			continue;
-		}
-		pmd = (pmd_t *)info->alloc_pgt_page(info->context);
-		if (!pmd)
-			return -ENOMEM;
-		ident_pmd_init(info->pmd_flag, pmd, addr, next);
-		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
-	}
-
-	return 0;
-}
-
-int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
-			      unsigned long addr, unsigned long end)
-{
-	unsigned long next;
-	int result;
-	int off = info->kernel_mapping ? pgd_index(__PAGE_OFFSET) : 0;
-
-	for (; addr < end; addr = next) {
-		pgd_t *pgd = pgd_page + pgd_index(addr) + off;
-		pud_t *pud;
-
-		next = (addr & PGDIR_MASK) + PGDIR_SIZE;
-		if (next > end)
-			next = end;
-
-		if (pgd_present(*pgd)) {
-			pud = pud_offset(pgd, 0);
-			result = ident_pud_init(info, pud, addr, next);
-			if (result)
-				return result;
-			continue;
-		}
-
-		pud = (pud_t *)info->alloc_pgt_page(info->context);
-		if (!pud)
-			return -ENOMEM;
-		result = ident_pud_init(info, pud, addr, next);
-		if (result)
-			return result;
-		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
-	}
-
-	return 0;
-}
+#include "ident_map.c"
 
 /*
  * NOTE: pagetable_init alloc all the fixmap pagetables contiguous on the
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 10/19] x86, 64bit: Set ident_mapping for kaslr
  2016-03-04 16:24 [PATCH v3 00/19] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (8 preceding siblings ...)
  2016-03-04 16:25 ` [PATCH v3 09/19] x86, boot: Split kernel_ident_mapping_init to another file Baoquan He
@ 2016-03-04 16:25 ` Baoquan He
  2016-03-07 23:34   ` Kees Cook
  2016-03-04 16:25 ` [PATCH v3 11/19] x86, boot: Add checking for memcpy Baoquan He
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Baoquan He @ 2016-03-04 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung, Jiri Kosina, Borislav Petkov, Matt Fleming

From: Yinghai Lu <yinghai@kernel.org>

Currently aslr only supports randomization in a small range, from 16M
to 1G. The new range still uses the old mapping, and ranges above 4G
are not supported at all.

We need to have an identity mapping for the new range before we can
decompress the kernel to the new output position and later run it.

In this patch, we add identity mappings for all needed ranges.

First, to let aslr put the randomized VO above 4G, we must set up an
identity mapping for the new range when we come in via the startup_32
path.

Second, when booting from a 64bit bootloader, the bootloader sets up
the identity mapping and we enter via startup_64 of ZO
(arch/x86/boot/compressed/vmlinux). The pages holding that pagetable
must be avoided when we select the new random VO (vmlinux) base,
otherwise the decompressor would overwrite them while decompressing.
One way would be to walk through the pagetable, find every page it
uses, and check those pages in every mem_avoid check; but that needs
extra code and may require growing the mem_avoid array to hold them.
The other way is to create a new identity mapping instead: the pages
for the pagetable then come from the _pgtable section of ZO, which is
in the mem_avoid array already, and we can reuse the identity-mapping
code.

The _pgtable area is shared by the 32bit and 64bit paths to reduce
init_size, since the ZO range from _rodata to _end now contributes to
init_size.

We also need to increase the pgt buffer size.
When booting via startup_64 we need to cover the old VO, params,
cmdline and the new VO; in the extreme case all of them cross a 512G
boundary, which needs (2+2)*4 pages with 2M mappings, plus 2 pages for
the first 2M (VGA RAM) and one for the level4 table: 19 pages in total.
When booting via startup_32, aslr may move the new VO above 4G and we
only need an extra identity mapping for the new VO; the pgt buffer
comes from _pgtable at an offset of 6 pages, and at most (2+2) pages
are needed even when it crosses a 512G boundary.
So 19 pages makes both paths happy.
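
As a rough illustration of that sizing, here is a standalone sketch of
the worst-case arithmetic; the macro names are made up for the example
and are not part of the patch:

#include <stdio.h>

/*
 * Each region that crosses a 512G (PGDIR) boundary needs two PUD
 * pages and two PMD pages when mapped with 2M pages.
 */
#define PGT_PAGES_PER_REGION	(2 + 2)	/* 2 PUD + 2 PMD pages */
#define PGT_NR_REGIONS		4	/* old VO, params, cmdline, new VO */
#define PGT_PAGES_FIRST_2M	2	/* VGA RAM mapping */
#define PGT_PAGES_LEVEL4	1

int main(void)
{
	/* (2+2)*4 + 2 + 1 = 19 */
	printf("%d pages\n", PGT_PAGES_PER_REGION * PGT_NR_REGIONS +
			     PGT_PAGES_FIRST_2M + PGT_PAGES_LEVEL4);
	return 0;
}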

Cc: Kees Cook <keescook@chromium.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/Makefile   |  3 ++
 arch/x86/boot/compressed/aslr.c     | 14 ++++++
 arch/x86/boot/compressed/head_64.S  |  4 +-
 arch/x86/boot/compressed/misc.h     | 11 +++++
 arch/x86/boot/compressed/misc_pgt.c | 91 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/boot.h         | 19 ++++++++
 6 files changed, 140 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/boot/compressed/misc_pgt.c

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 2e7c0ce..229604d 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -59,6 +59,9 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
 
 vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/aslr.o
+ifdef CONFIG_X86_64
+	vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/misc_pgt.o
+endif
 
 $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
 
diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index b93be03..35f2eac 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -151,6 +151,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	 */
 	mem_avoid[0].start = input;
 	mem_avoid[0].size = (output + init_size) - input;
+	fill_pagetable(input, (output + init_size) - input);
 
 	/* Avoid initrd. */
 	initrd_start  = (u64)real_mode->ext_ramdisk_image << 32;
@@ -159,6 +160,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	initrd_size |= real_mode->hdr.ramdisk_size;
 	mem_avoid[1].start = initrd_start;
 	mem_avoid[1].size = initrd_size;
+	/* don't need to set mapping for initrd */
 
 	/* Avoid kernel command line. */
 	cmd_line  = (u64)real_mode->ext_cmd_line_ptr << 32;
@@ -169,10 +171,19 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 		;
 	mem_avoid[2].start = cmd_line;
 	mem_avoid[2].size = cmd_line_size;
+	fill_pagetable(cmd_line, cmd_line_size);
 
 	/* Avoid params */
 	mem_avoid[3].start = (unsigned long)real_mode;
 	mem_avoid[3].size = sizeof(*real_mode);
+	fill_pagetable((unsigned long)real_mode, sizeof(*real_mode));
+
+	/* don't need to set mapping for setup_data */
+
+#ifdef CONFIG_X86_VERBOSE_BOOTUP
+	/* for video ram */
+	fill_pagetable(0, PMD_SIZE);
+#endif
 }
 
 /* Does this memory vector overlap a known avoided area? */
@@ -330,6 +341,9 @@ unsigned char *choose_kernel_location(unsigned char *input,
 		goto out;
 
 	choice = random;
+
+	fill_pagetable(choice, output_size);
+	switch_pagetable();
 out:
 	return (unsigned char *)choice;
 }
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 3691451..075bb15 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -126,7 +126,7 @@ ENTRY(startup_32)
 	/* Initialize Page tables to 0 */
 	leal	pgtable(%ebx), %edi
 	xorl	%eax, %eax
-	movl	$((4096*6)/4), %ecx
+	movl	$(BOOT_INIT_PGT_SIZE/4), %ecx
 	rep	stosl
 
 	/* Build Level 4 */
@@ -478,4 +478,4 @@ boot_stack_end:
 	.section ".pgtable","a",@nobits
 	.balign 4096
 pgtable:
-	.fill 6*4096, 1, 0
+	.fill BOOT_PGT_SIZE, 1, 0
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index dcf01c2..11736a6 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -84,6 +84,17 @@ unsigned char *choose_kernel_location(unsigned char *input,
 }
 #endif
 
+#ifdef CONFIG_X86_64
+void fill_pagetable(unsigned long start, unsigned long size);
+void switch_pagetable(void);
+extern unsigned char _pgtable[];
+#else
+static inline void fill_pagetable(unsigned long start, unsigned long size)
+{ }
+static inline void switch_pagetable(void)
+{ }
+#endif
+
 #ifdef CONFIG_EARLY_PRINTK
 /* early_serial_console.c */
 extern int early_serial_base;
diff --git a/arch/x86/boot/compressed/misc_pgt.c b/arch/x86/boot/compressed/misc_pgt.c
new file mode 100644
index 0000000..954811e
--- /dev/null
+++ b/arch/x86/boot/compressed/misc_pgt.c
@@ -0,0 +1,91 @@
+#define __pa(x)  ((unsigned long)(x))
+#define __va(x)  ((void *)((unsigned long)(x)))
+
+#include "misc.h"
+
+#include <asm/init.h>
+#include <asm/pgtable.h>
+
+#include "../../mm/ident_map.c"
+#include "../string.h"
+
+struct alloc_pgt_data {
+	unsigned char *pgt_buf;
+	unsigned long pgt_buf_size;
+	unsigned long pgt_buf_offset;
+};
+
+static void *alloc_pgt_page(void *context)
+{
+	struct alloc_pgt_data *d = (struct alloc_pgt_data *)context;
+	unsigned char *p = (unsigned char *)d->pgt_buf;
+
+	if (d->pgt_buf_offset >= d->pgt_buf_size) {
+		debug_putstr("out of pgt_buf in misc_pgt.c\n");
+		return NULL;
+	}
+
+	p += d->pgt_buf_offset;
+	d->pgt_buf_offset += PAGE_SIZE;
+
+	return p;
+}
+
+/*
+ * Use the normal definition of memset() from string.c. Some already
+ * included header files expect a definition of memset(), and by the
+ * time we could define a memset macro it would be too late.
+ */
+#undef memset
+
+unsigned long __force_order;
+static struct alloc_pgt_data pgt_data;
+static struct x86_mapping_info mapping_info;
+static pgd_t *level4p;
+
+void fill_pagetable(unsigned long start, unsigned long size)
+{
+	unsigned long end = start + size;
+
+	if (!level4p) {
+		pgt_data.pgt_buf_offset = 0;
+		mapping_info.alloc_pgt_page = alloc_pgt_page;
+		mapping_info.context = &pgt_data;
+		mapping_info.pmd_flag = __PAGE_KERNEL_LARGE_EXEC;
+
+		/*
+		 * Did we come from startup_32? Then cr3 still points
+		 * at _pgtable and we can reuse it.
+		 */
+		level4p = (pgd_t *)read_cr3();
+		if ((unsigned long)level4p == (unsigned long)_pgtable) {
+			pgt_data.pgt_buf = (unsigned char *)_pgtable +
+						 BOOT_INIT_PGT_SIZE;
+			pgt_data.pgt_buf_size = BOOT_PGT_SIZE -
+						 BOOT_INIT_PGT_SIZE;
+			memset((unsigned char *)pgt_data.pgt_buf, 0,
+				pgt_data.pgt_buf_size);
+			debug_putstr("boot via startup_32\n");
+		} else {
+			pgt_data.pgt_buf = (unsigned char *)_pgtable;
+			pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
+			memset((unsigned char *)pgt_data.pgt_buf, 0,
+				pgt_data.pgt_buf_size);
+			debug_putstr("boot via startup_64\n");
+			level4p = (pgd_t *)alloc_pgt_page(&pgt_data);
+		}
+	}
+
+	/* align boundary to 2M */
+	start = round_down(start, PMD_SIZE);
+	end = round_up(end, PMD_SIZE);
+	if (start >= end)
+		return;
+
+	kernel_ident_mapping_init(&mapping_info, level4p, start, end);
+}
+
+void switch_pagetable(void)
+{
+	write_cr3((unsigned long)level4p);
+}
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 6b8d6e8..52a9cbc 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -32,7 +32,26 @@
 #endif /* !CONFIG_KERNEL_BZIP2 */
 
 #ifdef CONFIG_X86_64
+
 #define BOOT_STACK_SIZE	0x4000
+
+#define BOOT_INIT_PGT_SIZE (6*4096)
+#ifdef CONFIG_RANDOMIZE_BASE
+/*
+ * 1 page for level4, 2 pages for first 2M.
+ * (2+2)*4 pages for kernel, param, cmd_line, random kernel
+ * if all cross 512G boundary.
+ * So total will be 19 pages.
+ */
+#ifdef CONFIG_X86_VERBOSE_BOOTUP
+#define BOOT_PGT_SIZE (19*4096)
+#else
+#define BOOT_PGT_SIZE (17*4096)
+#endif
+#else
+#define BOOT_PGT_SIZE BOOT_INIT_PGT_SIZE
+#endif
+
 #else
 #define BOOT_STACK_SIZE	0x1000
 #endif
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 11/19] x86, boot: Add checking for memcpy
  2016-03-04 16:24 [PATCH v3 00/19] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (9 preceding siblings ...)
  2016-03-04 16:25 ` [PATCH v3 10/19] x86, 64bit: Set ident_mapping for kaslr Baoquan He
@ 2016-03-04 16:25 ` Baoquan He
  2016-03-07 23:36   ` Kees Cook
  2016-03-04 16:25 ` [PATCH v3 12/19] x86, kaslr: Fix a bug that relocation can not be handled when kernel is loaded above 2G Baoquan He
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Baoquan He @ 2016-03-04 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung

From: Yinghai Lu <yinghai@kernel.org>

parse_elf() uses the local memcpy to move sections to their running
positions. That memcpy is actually only correct when the regions do
not overlap, or when dest < src.

Add a check to memcpy that catches the unsupported case, for future
use; when we hit it we will need a backward-copying memcpy.

Also add a comment in parse_elf() about this fact.
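
To see why dest < src is safe for a forward copy, consider a minimal
byte-by-byte sketch (illustrative only, not the asm version in this
patch):

#include <stddef.h>

void *forward_copy(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;
	size_t i;

	/*
	 * With dest < src, every write lands below the bytes still to
	 * be read, so the source is never clobbered before it is used.
	 */
	for (i = 0; i < n; i++)
		d[i] = s[i];
	return dest;
}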

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
v2->v3:
    Add a declaration for error() since its declaration is in misc.h.
    But it's not included in compressed/string.c.

 arch/x86/boot/compressed/misc.c   | 14 +++++++-------
 arch/x86/boot/compressed/misc.h   |  2 ++
 arch/x86/boot/compressed/string.c | 29 +++++++++++++++++++++++++++--
 3 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index dd7ed8a..4b2cd0c 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -114,9 +114,6 @@
 #undef memset
 #define memzero(s, n)	memset((s), 0, (n))
 
-
-static void error(char *m);
-
 /*
  * This is set up by the setup-routine at boot-time
  */
@@ -243,7 +240,7 @@ void __puthex(unsigned long value)
 	}
 }
 
-static void error(char *x)
+void error(char *x)
 {
 	error_putstr("\n\n");
 	error_putstr(x);
@@ -378,9 +375,12 @@ static void parse_elf(void *output)
 #else
 			dest = (void *)(phdr->p_paddr);
 #endif
-			memcpy(dest,
-			       output + phdr->p_offset,
-			       phdr->p_filesz);
+			/*
+			 * The simple memcpy only works when dest is
+			 * smaller than src, or when they don't overlap.
+			 * Here dest is always smaller than src.
+			 */
+			memcpy(dest, output + phdr->p_offset, phdr->p_filesz);
 			break;
 		default: /* Ignore other PT_* */ break;
 		}
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 11736a6..39d0e9a 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -38,6 +38,8 @@ void __puthex(unsigned long value);
 #define error_putstr(__x)  __putstr(__x)
 #define error_puthex(__x)  __puthex(__x)
 
+void error(char *x);
+
 #ifdef CONFIG_X86_VERBOSE_BOOTUP
 
 #define debug_putstr(__x)  __putstr(__x)
diff --git a/arch/x86/boot/compressed/string.c b/arch/x86/boot/compressed/string.c
index 00e788b..3a935d0 100644
--- a/arch/x86/boot/compressed/string.c
+++ b/arch/x86/boot/compressed/string.c
@@ -1,7 +1,7 @@
 #include "../string.c"
 
 #ifdef CONFIG_X86_32
-void *memcpy(void *dest, const void *src, size_t n)
+void *__memcpy(void *dest, const void *src, size_t n)
 {
 	int d0, d1, d2;
 	asm volatile(
@@ -15,7 +15,7 @@ void *memcpy(void *dest, const void *src, size_t n)
 	return dest;
 }
 #else
-void *memcpy(void *dest, const void *src, size_t n)
+void *__memcpy(void *dest, const void *src, size_t n)
 {
 	long d0, d1, d2;
 	asm volatile(
@@ -30,6 +30,31 @@ void *memcpy(void *dest, const void *src, size_t n)
 }
 #endif
 
+extern void error(char *x);
+void *memcpy(void *dest, const void *src, size_t n)
+{
+	unsigned long start_dest, end_dest;
+	unsigned long start_src, end_src;
+	unsigned long max_start, min_end;
+
+	if (dest < src)
+		return __memcpy(dest, src, n);
+
+	start_dest = (unsigned long)dest;
+	end_dest = (unsigned long)dest + n;
+	start_src = (unsigned long)src;
+	end_src = (unsigned long)src + n;
+	max_start = (start_dest > start_src) ?  start_dest : start_src;
+	min_end = (end_dest < end_src) ? end_dest : end_src;
+
+	if (max_start >= min_end)
+		return __memcpy(dest, src, n);
+
+	error("memcpy does not support overlapping with dest > src!\n");
+
+	return dest;
+}
+
 void *memset(void *s, int c, size_t n)
 {
 	int i;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 12/19] x86, kaslr: Fix a bug that relocation can not be handled when kernel is loaded above 2G
  2016-03-04 16:24 [PATCH v3 00/19] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (10 preceding siblings ...)
  2016-03-04 16:25 ` [PATCH v3 11/19] x86, boot: Add checking for memcpy Baoquan He
@ 2016-03-04 16:25 ` Baoquan He
  2016-03-07 23:30   ` Kees Cook
  2016-03-04 16:25 ` [PATCH v3 13/19] x86, kaslr: Introduce struct slot_area to manage randomization slot info Baoquan He
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Baoquan He @ 2016-03-04 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung, Baoquan He

When processing the 32 bit relocation table, a local variable
'extended' is used to calculate the physical address of each relocs
entry. However its type is 'int', which is enough for i386 but not for
x86_64. That's why relocations can only be handled when the kernel is
loaded below 2G; otherwise an overflow happens and hangs the system.

Change it to 'long', as the 32 bit inverse relocation processing
already does. This change is also safe for i386 relocation handling.
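
A minimal standalone sketch of the truncation; the 4G load address is
a made-up example:

#include <stdio.h>

int main(void)
{
	unsigned long map = 0x100000000UL;	/* kernel loaded at 4G */
	int reloc = 0x1000;			/* a relocation entry  */

	int  ext32 = reloc;	/* as the old code declared it */
	long ext64 = reloc;	/* as the fix declares it      */

	ext32 += map;	/* truncated back to 32 bits (wraps on x86) */
	ext64 += map;	/* full 64-bit address preserved            */

	printf("int:  0x%lx\n", (unsigned long)ext32);	/* 0x1000      */
	printf("long: 0x%lx\n", (unsigned long)ext64);	/* 0x100001000 */
	return 0;
}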

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/misc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 4b2cd0c..70445c3 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -298,7 +298,7 @@ static void handle_relocations(void *output, unsigned long output_len)
 	 * So we work backwards from the end of the decompressed image.
 	 */
 	for (reloc = output + output_len - sizeof(*reloc); *reloc; reloc--) {
-		int extended = *reloc;
+		long extended = *reloc;
 		extended += map;
 
 		ptr = (unsigned long)extended;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 13/19] x86, kaslr: Introduce struct slot_area to manage randomization slot info
  2016-03-04 16:24 [PATCH v3 00/19] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (11 preceding siblings ...)
  2016-03-04 16:25 ` [PATCH v3 12/19] x86, kaslr: Fix a bug that relocation can not be handled when kernel is loaded above 2G Baoquan He
@ 2016-03-04 16:25 ` Baoquan He
  2016-03-04 16:25 ` [PATCH v3 14/19] x86, kaslr: Add two functions which will be used later Baoquan He
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Baoquan He @ 2016-03-04 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung, Baoquan He

The kernel is expected to be randomly reloaded anywhere in the whole
physical memory area, which can reach nearly 64T. In that case there
could be about 4*1024*1024 randomization slots if
CONFIG_PHYSICAL_ALIGN is 0x1000000. Currently the starting address of
each candidate position is stored in the array slots[]: every
available slot is found one at a time and saved into slots[]. With
this old approach the slot array costs too much memory, and finding
and saving the slot information one by one is also very inefficient.

Here introduce struct slot_area to manage the randomization slot info
of one contiguous memory area that excludes the avoided regions. Each
slot_area contains the starting address and the number of available
slots in that area. The array slot_areas[] stores the info for all
slot areas.

Since setup_data is a linked list and may chain many entries together,
excluding them can split RAM into many smaller areas; only the first
100 slot areas are taken if there are too many.
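
To see the scale, a quick sketch of the slot count under the old
slots[] scheme, using the numbers above:

#include <stdio.h>

int main(void)
{
	unsigned long mem_max = 64UL << 40;	/* ~64T physical memory  */
	unsigned long align = 0x1000000UL;	/* CONFIG_PHYSICAL_ALIGN */

	/* about 4*1024*1024 entries: too many to store individually */
	printf("slots: %lu\n", mem_max / align);
	return 0;
}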

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 35f2eac..7fc22a6 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -216,8 +216,20 @@ static bool mem_avoid_overlap(struct mem_vector *img)
 
 static unsigned long slots[CONFIG_RANDOMIZE_BASE_MAX_OFFSET /
 			   CONFIG_PHYSICAL_ALIGN];
+
+struct slot_area {
+	unsigned long addr;
+	int num;
+};
+
+#define MAX_SLOT_AREA 100
+
+static struct slot_area slot_areas[MAX_SLOT_AREA];
+
 static unsigned long slot_max;
 
+static unsigned long slot_area_index;
+
 static void slots_append(unsigned long addr)
 {
 	/* Overflowing the slots list should be impossible. */
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 14/19] x86, kaslr: Add two functions which will be used later
  2016-03-04 16:24 [PATCH v3 00/19] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (12 preceding siblings ...)
  2016-03-04 16:25 ` [PATCH v3 13/19] x86, kaslr: Introduce struct slot_area to manage randomization slot info Baoquan He
@ 2016-03-04 16:25 ` Baoquan He
  2016-03-04 16:25 ` [PATCH v3 15/19] x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel text mapping address Baoquan He
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Baoquan He @ 2016-03-04 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung, Baoquan He

Function store_slot_info() calculates the slot info of a passed-in
memory region and stores it into slot_areas[].

Function mem_min_overlap() iterates over all avoid regions to find the
one that overlaps the given region at the lowest address. E.g. for a
memory region [1024M, 2048M), if after iterating all avoid regions the
lowest-starting overlapping one is [1536M, 1664M), then with this
information we can split [1024M, 1536M) out as an available slot area
and save it into the array slot_areas[].
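
A standalone sketch of the arithmetic store_slot_info() performs; the
formula matches the hunk below, while the region and image sizes are
made-up examples:

#include <stdio.h>

#define ALIGN_16M	0x1000000UL	/* CONFIG_PHYSICAL_ALIGN */

int main(void)
{
	unsigned long region_start = 1024UL << 20;	/* [1024M, 1536M) */
	unsigned long region_size = 512UL << 20;
	unsigned long image_size = 30UL << 20;	/* decompressed kernel */
	unsigned long num;

	/* count the 16M-aligned starts that leave room for the image */
	if (image_size <= ALIGN_16M)
		num = region_size / ALIGN_16M;
	else
		num = (region_size - image_size) / ALIGN_16M + 1;

	printf("addr=0x%lx slots=%lu\n", region_start, num);	/* 31 */
	return 0;
}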

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 51 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 7fc22a6..fa6192b 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -214,6 +214,40 @@ static bool mem_avoid_overlap(struct mem_vector *img)
 	return false;
 }
 
+static unsigned long
+mem_min_overlap(struct mem_vector *img, struct mem_vector *out)
+{
+	int i;
+	struct setup_data *ptr;
+	unsigned long min = img->start + img->size;
+
+	for (i = 0; i < MEM_AVOID_MAX; i++) {
+		if (mem_overlaps(img, &mem_avoid[i]) &&
+			(mem_avoid[i].start < min)) {
+			*out = mem_avoid[i];
+			min = mem_avoid[i].start;
+		}
+	}
+
+	/* Check all entries in the setup_data linked list. */
+	ptr = (struct setup_data *)(unsigned long)real_mode->hdr.setup_data;
+	while (ptr) {
+		struct mem_vector avoid;
+
+		avoid.start = (unsigned long)ptr;
+		avoid.size = sizeof(*ptr) + ptr->len;
+
+		if (mem_overlaps(img, &avoid) && (avoid.start < min)) {
+			*out = avoid;
+			min = avoid.start;
+		}
+
+		ptr = (struct setup_data *)(unsigned long)ptr->next;
+	}
+
+	return min;
+}
+
 static unsigned long slots[CONFIG_RANDOMIZE_BASE_MAX_OFFSET /
 			   CONFIG_PHYSICAL_ALIGN];
 
@@ -230,6 +264,23 @@ static unsigned long slot_max;
 
 static unsigned long slot_area_index;
 
+static void store_slot_info(struct mem_vector *region, unsigned long image_size)
+{
+	struct slot_area slot_area;
+
+	slot_area.addr = region->start;
+	if (image_size <= CONFIG_PHYSICAL_ALIGN)
+		slot_area.num = region->size / CONFIG_PHYSICAL_ALIGN;
+	else
+		slot_area.num = (region->size - image_size) /
+				CONFIG_PHYSICAL_ALIGN + 1;
+
+	if (slot_area.num > 0) {
+		slot_areas[slot_area_index++] = slot_area;
+		slot_max += slot_area.num;
+	}
+}
+
 static void slots_append(unsigned long addr)
 {
 	/* Overflowing the slots list should be impossible. */
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 15/19] x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel text mapping address
  2016-03-04 16:24 [PATCH v3 00/19] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (13 preceding siblings ...)
  2016-03-04 16:25 ` [PATCH v3 14/19] x86, kaslr: Add two functions which will be used later Baoquan He
@ 2016-03-04 16:25 ` Baoquan He
  2016-03-04 16:25 ` [PATCH v3 16/19] x86, kaslr: Randomize physical and virtual address of kernel separately Baoquan He
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Baoquan He @ 2016-03-04 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung, Baoquan He

Kaslr extended the kernel text mapping region size from 512M to 1G,
namely CONFIG_RANDOMIZE_BASE_MAX_OFFSET. This means kernel text can be
mapped into the region below:

[__START_KERNEL_map + LOAD_PHYSICAL_ADDR, __START_KERNEL_map + 1G)

Introduce a function find_random_virt_offset() to get a random value
between LOAD_PHYSICAL_ADDR and CONFIG_RANDOMIZE_BASE_MAX_OFFSET. This
random value is added to __START_KERNEL_map to get the starting
address that kernel text is mapped from. Since a slot can be anywhere
in this region, the whole region acts as one independent slot_area,
and translating the random value into a slot is very simple.
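
A quick sketch of the virtual slot arithmetic; the formula matches the
hunk below, and the image size is a made-up example:

#include <stdio.h>

#define ALIGN	0x1000000UL	/* CONFIG_PHYSICAL_ALIGN = 16M */
#define MAX	(1UL << 30)	/* CONFIG_RANDOMIZE_BASE_MAX_OFFSET = 1G */

int main(void)
{
	unsigned long minimum = 16UL << 20;	/* LOAD_PHYSICAL_ADDR */
	unsigned long image_size = 30UL << 20;	/* hypothetical */
	unsigned long slot_num;

	if (image_size <= ALIGN)
		slot_num = (MAX - minimum) / ALIGN;
	else
		slot_num = (MAX - minimum - image_size) / ALIGN + 1;

	printf("virtual slots: %lu\n", slot_num);	/* 62 */
	return 0;
}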

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fa6192b..3730a33 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -366,6 +366,27 @@ static unsigned long find_random_addr(unsigned long minimum,
 	return slots_fetch_random();
 }
 
+static unsigned long find_random_virt_offset(unsigned long minimum,
+				  unsigned long image_size)
+{
+	unsigned long slot_num, random;
+
+	/* Make sure minimum is aligned. */
+	minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);
+
+	if (image_size <= CONFIG_PHYSICAL_ALIGN)
+		slot_num = (CONFIG_RANDOMIZE_BASE_MAX_OFFSET - minimum) /
+				CONFIG_PHYSICAL_ALIGN;
+	else
+		slot_num = (CONFIG_RANDOMIZE_BASE_MAX_OFFSET -
+				minimum - image_size) /
+				CONFIG_PHYSICAL_ALIGN + 1;
+
+	random = get_random_long() % slot_num;
+
+	return random * CONFIG_PHYSICAL_ALIGN + minimum;
+}
+
 unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 16/19] x86, kaslr: Randomize physical and virtual address of kernel separately
  2016-03-04 16:24 [PATCH v3 00/19] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (14 preceding siblings ...)
  2016-03-04 16:25 ` [PATCH v3 15/19] x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel text mapping address Baoquan He
@ 2016-03-04 16:25 ` Baoquan He
  2016-03-07 23:51   ` Kees Cook
  2016-03-04 16:25 ` [PATCH v3 17/19] x86, kaslr: Add support of kernel physical address randomization above 4G Baoquan He
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Baoquan He @ 2016-03-04 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung, Baoquan He

On x86_64, in the old kaslr implementation only the physical address
where the kernel is loaded is randomized. Then the delta between the
physical address vmlinux was linked to load at and the address it is
finally loaded at is calculated. If the delta is not 0, namely the
kernel is actually decompressed at a new physical address, relocation
handling needs to be done: the delta is added to the offset used for
kernel symbol relocation, which moves the kernel text mapping address
by the same delta.

Here the behavior is changed. Randomize both the physical address
where the kernel is decompressed and the virtual address where kernel
text is mapped, and make relocation handling depend only on the
virtual address randomization. That means if and only if the virtual
address is randomized to a different value do we add the delta to the
offset of the kernel relocs.

Note that up to now neither the virtual offset nor the physical
address randomization can exceed CONFIG_RANDOMIZE_BASE_MAX_OFFSET,
namely 1G.
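
In short, the relocation delta now comes from the virtual side on
x86_64. A condensed restatement of the new rule (not a drop-in
function; see the misc.c hunk below):

unsigned long reloc_delta(int is_x86_64,
			  unsigned long min_addr,	/* phys load addr */
			  unsigned long virt_offset,
			  unsigned long load_physical_addr)
{
	if (is_x86_64)
		/* only virtual randomization triggers relocations */
		return virt_offset - load_physical_addr;

	/* 32-bit: a physical move alone requires relocations */
	return min_addr - load_physical_addr;
}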

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 46 +++++++++++++++++++++--------------------
 arch/x86/boot/compressed/misc.c | 40 +++++++++++++++++++++--------------
 arch/x86/boot/compressed/misc.h | 19 +++++++++--------
 3 files changed, 59 insertions(+), 46 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 3730a33..75ea931 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -349,7 +349,7 @@ static void process_e820_entry(struct e820entry *entry,
 	}
 }
 
-static unsigned long find_random_addr(unsigned long minimum,
+static unsigned long find_random_phy_addr(unsigned long minimum,
 				      unsigned long size)
 {
 	int i;
@@ -387,23 +387,24 @@ static unsigned long find_random_virt_offset(unsigned long minimum,
 	return random * CONFIG_PHYSICAL_ALIGN + minimum;
 }
 
-unsigned char *choose_kernel_location(unsigned char *input,
-				      unsigned long input_size,
-				      unsigned char *output,
-				      unsigned long output_size)
+void choose_kernel_location(unsigned char *input,
+				unsigned long input_size,
+				unsigned char **output,
+				unsigned long output_size,
+				unsigned char **virt_offset)
 {
-	unsigned long choice = (unsigned long)output;
 	unsigned long random;
+	*virt_offset = (unsigned char *)LOAD_PHYSICAL_ADDR;
 
 #ifdef CONFIG_HIBERNATION
 	if (!cmdline_find_option_bool("kaslr")) {
 		debug_putstr("KASLR disabled by default...\n");
-		goto out;
+		return;
 	}
 #else
 	if (cmdline_find_option_bool("nokaslr")) {
 		debug_putstr("KASLR disabled by cmdline...\n");
-		goto out;
+		return;
 	}
 #endif
 
@@ -411,23 +412,24 @@ unsigned char *choose_kernel_location(unsigned char *input,
 
 	/* Record the various known unsafe memory ranges. */
 	mem_avoid_init((unsigned long)input, input_size,
-		       (unsigned long)output);
+		       (unsigned long)*output);
 
 	/* Walk e820 and find a random address. */
-	random = find_random_addr(choice, output_size);
-	if (!random) {
+	random = find_random_phy_addr((unsigned long)*output, output_size);
+	if (!random)
 		debug_putstr("KASLR could not find suitable E820 region...\n");
-		goto out;
+	else {
+		if ((unsigned long)*output != random) {
+			fill_pagetable(random, output_size);
+			switch_pagetable();
+			*output = (unsigned char *)random;
+		}
 	}
 
-	/* Always enforce the minimum. */
-	if (random < choice)
-		goto out;
-
-	choice = random;
-
-	fill_pagetable(choice, output_size);
-	switch_pagetable();
-out:
-	return (unsigned char *)choice;
+	/*
+	 * Get a random address between LOAD_PHYSICAL_ADDR and
+	 * CONFIG_RANDOMIZE_BASE_MAX_OFFSET
+	 */
+	random = find_random_virt_offset(LOAD_PHYSICAL_ADDR, output_size);
+	*virt_offset = (unsigned char *)random;
 }
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 70445c3..362d508 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -251,7 +251,8 @@ void error(char *x)
 }
 
 #if CONFIG_X86_NEED_RELOCS
-static void handle_relocations(void *output, unsigned long output_len)
+static void handle_relocations(void *output, unsigned long output_len,
+			       void *virt_offset)
 {
 	int *reloc;
 	unsigned long delta, map, ptr;
@@ -263,11 +264,6 @@ static void handle_relocations(void *output, unsigned long output_len)
 	 * and where it was actually loaded.
 	 */
 	delta = min_addr - LOAD_PHYSICAL_ADDR;
-	if (!delta) {
-		debug_putstr("No relocation needed... ");
-		return;
-	}
-	debug_putstr("Performing relocations... ");
 
 	/*
 	 * The kernel contains a table of relocation addresses. Those
@@ -278,6 +274,22 @@ static void handle_relocations(void *output, unsigned long output_len)
 	 */
 	map = delta - __START_KERNEL_map;
 
+
+
+	/*
+	 * 32-bit always performs relocations. 64-bit relocations are only
+	 * needed if kASLR has chosen a different starting address offset
+	 * from __START_KERNEL_map.
+	 */
+	if (IS_ENABLED(CONFIG_X86_64))
+		delta = (unsigned long)virt_offset - LOAD_PHYSICAL_ADDR;
+
+	if (!delta) {
+		debug_putstr("No relocation needed... ");
+		return;
+	}
+	debug_putstr("Performing relocations... ");
+
 	/*
 	 * Process relocations: 32 bit relocations first then 64 bit after.
 	 * Three sets of binary relocations are added to the end of the kernel
@@ -331,7 +343,8 @@ static void handle_relocations(void *output, unsigned long output_len)
 #endif
 }
 #else
-static inline void handle_relocations(void *output, unsigned long output_len)
+static inline void handle_relocations(void *output, unsigned long output_len,
+				      void *virt_offset)
 { }
 #endif
 
@@ -397,6 +410,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 {
 	unsigned long run_size = VO__end - VO__text;
 	unsigned char *output_orig = output;
+	unsigned char *virt_offset;
 
 	real_mode = rmode;
 
@@ -434,9 +448,10 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	 * the entire decompressed kernel plus relocation table, or the
 	 * entire decompressed kernel plus .bss and .brk sections.
 	 */
-	output = choose_kernel_location(input_data, input_len, output,
+	choose_kernel_location(input_data, input_len, &output,
 					output_len > run_size ? output_len
-							      : run_size);
+							      : run_size,
+					&virt_offset);
 
 	/* Validate memory location choices. */
 	if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
@@ -457,12 +472,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	__decompress(input_data, input_len, NULL, NULL, output, output_len,
 			NULL, error);
 	parse_elf(output);
-	/*
-	 * 32-bit always performs relocations. 64-bit relocations are only
-	 * needed if kASLR has chosen a different load address.
-	 */
-	if (!IS_ENABLED(CONFIG_X86_64) || output != output_orig)
-		handle_relocations(output, output_len);
+	handle_relocations(output, output_len, virt_offset);
 	debug_putstr("done.\nBooting the kernel.\n");
 	return output;
 }
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 39d0e9a..cde6aec 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -69,20 +69,21 @@ int cmdline_find_option_bool(const char *option);
 
 #if CONFIG_RANDOMIZE_BASE
 /* aslr.c */
-unsigned char *choose_kernel_location(unsigned char *input,
-				      unsigned long input_size,
-				      unsigned char *output,
-				      unsigned long output_size);
+void choose_kernel_location(unsigned char *input,
+				unsigned long input_size,
+				unsigned char **output,
+				unsigned long output_size,
+				unsigned char **virt_offset);
 /* cpuflags.c */
 bool has_cpuflag(int flag);
 #else
 static inline
-unsigned char *choose_kernel_location(unsigned char *input,
-				      unsigned long input_size,
-				      unsigned char *output,
-				      unsigned long output_size)
+void choose_kernel_location(unsigned char *input,
+				unsigned long input_size,
+				unsigned char **output,
+				unsigned long output_size,
+				unsigned char **virt_offset)
 {
-	return output;
 }
 #endif
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 17/19] x86, kaslr: Add support of kernel physical address randomization above 4G
  2016-03-04 16:24 [PATCH v3 00/19] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (15 preceding siblings ...)
  2016-03-04 16:25 ` [PATCH v3 16/19] x86, kaslr: Randomize physical and virtual address of kernel separately Baoquan He
@ 2016-03-04 16:25 ` Baoquan He
  2016-03-04 16:25 ` [PATCH v3 18/19] x86, kaslr: Remove useless codes Baoquan He
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Baoquan He @ 2016-03-04 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung, Baoquan He

In the kaslr implementation, process_e820_entry() and
slots_fetch_random() are the two key functions.

process_e820_entry() parses a passed-in memory region and stores the
available slot area information into the array slot_areas[].
slots_fetch_random() gets a random value and translates it into the
starting address of the corresponding slot.

In this patch, to add support for kernel physical address
randomization above 4G, both functions are changed to work on the new
slot_area data structure.

Now the kernel can be reloaded and decompressed anywhere in the whole
physical memory, up to nearly 64T.
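
A standalone sketch of the slot_areas[] walk that slots_fetch_random()
now performs; same algorithm as the hunk below, with simplified types
and made-up areas:

#include <stdio.h>

struct slot_area { unsigned long addr; int num; };

static unsigned long pick(struct slot_area *areas, int n_areas,
			  unsigned long random, unsigned long align)
{
	int i;

	for (i = 0; i < n_areas; i++) {
		if (random >= (unsigned long)areas[i].num) {
			random -= areas[i].num;	/* skip this area */
			continue;
		}
		return areas[i].addr + random * align;
	}
	return 0;	/* random exceeded the total slot count */
}

int main(void)
{
	struct slot_area areas[] = { { 0x1000000, 3 }, { 0x40000000, 5 } };

	/* slot 4 skips the 3 slots of area 0, lands on area 1, slot 1 */
	printf("0x%lx\n", pick(areas, 2, 4, 0x1000000));	/* 0x41000000 */
	return 0;
}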

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 68 ++++++++++++++++++++++++++++++-----------
 1 file changed, 51 insertions(+), 17 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 75ea931..d5ef3b4 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -293,27 +293,40 @@ static void slots_append(unsigned long addr)
 
 static unsigned long slots_fetch_random(void)
 {
+	unsigned long random;
+	int i;
+
 	/* Handle case of no slots stored. */
 	if (slot_max == 0)
 		return 0;
 
-	return slots[get_random_long() % slot_max];
+	random = get_random_long() % slot_max;
+
+	for (i = 0; i < slot_area_index; i++) {
+		if (random >= slot_areas[i].num) {
+			random -= slot_areas[i].num;
+			continue;
+		}
+		return slot_areas[i].addr + random * CONFIG_PHYSICAL_ALIGN;
+	}
+
+	if (i == slot_area_index)
+		debug_putstr("Something wrong happened in slots_fetch_random()...\n");
+	return 0;
 }
 
 static void process_e820_entry(struct e820entry *entry,
 			       unsigned long minimum,
 			       unsigned long image_size)
 {
-	struct mem_vector region, img;
+	struct mem_vector region, out;
+	struct slot_area slot_area;
+	unsigned long min, start_orig;
 
 	/* Skip non-RAM entries. */
 	if (entry->type != E820_RAM)
 		return;
 
-	/* Ignore entries entirely above our maximum. */
-	if (entry->addr >= CONFIG_RANDOMIZE_BASE_MAX_OFFSET)
-		return;
-
 	/* Ignore entries entirely below our minimum. */
 	if (entry->addr + entry->size < minimum)
 		return;
@@ -321,10 +334,17 @@ static void process_e820_entry(struct e820entry *entry,
 	region.start = entry->addr;
 	region.size = entry->size;
 
+repeat:
+	start_orig = region.start;
+
 	/* Potentially raise address to minimum location. */
 	if (region.start < minimum)
 		region.start = minimum;
 
+	/* Return if slot area array is full */
+	if (slot_area_index == MAX_SLOT_AREA)
+		return;
+
 	/* Potentially raise address to meet alignment requirements. */
 	region.start = ALIGN(region.start, CONFIG_PHYSICAL_ALIGN);
 
@@ -333,20 +353,30 @@ static void process_e820_entry(struct e820entry *entry,
 		return;
 
 	/* Reduce size by any delta from the original address. */
-	region.size -= region.start - entry->addr;
+	region.size -= region.start - start_orig;
 
-	/* Reduce maximum size to fit end of image within maximum limit. */
-	if (region.start + region.size > CONFIG_RANDOMIZE_BASE_MAX_OFFSET)
-		region.size = CONFIG_RANDOMIZE_BASE_MAX_OFFSET - region.start;
+	/* Return if region can't contain decompressed kernel */
+	if (region.size < image_size)
+		return;
 
-	/* Walk each aligned slot and check for avoided areas. */
-	for (img.start = region.start, img.size = image_size ;
-	     mem_contains(&region, &img) ;
-	     img.start += CONFIG_PHYSICAL_ALIGN) {
-		if (mem_avoid_overlap(&img))
-			continue;
-		slots_append(img.start);
+	if (!mem_avoid_overlap(&region)) {
+		store_slot_info(&region, image_size);
+		return;
 	}
+
+	min = mem_min_overlap(&region, &out);
+
+	if (min > region.start + image_size) {
+		struct mem_vector tmp;
+
+		tmp.start = region.start;
+		tmp.size = min - region.start;
+		store_slot_info(&tmp, image_size);
+	}
+
+	region.size -= out.start - region.start + out.size;
+	region.start = out.start + out.size;
+	goto repeat;
 }
 
 static unsigned long find_random_phy_addr(unsigned long minimum,
@@ -361,6 +391,10 @@ static unsigned long find_random_phy_addr(unsigned long minimum,
 	/* Verify potential e820 positions, appending to slots list. */
 	for (i = 0; i < real_mode->e820_entries; i++) {
 		process_e820_entry(&real_mode->e820_map[i], minimum, size);
+		if (slot_area_index == MAX_SLOT_AREA) {
+			debug_putstr("Stop processing e820 since slot_areas is full...\n");
+			break;
+		}
 	}
 
 	return slots_fetch_random();
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 18/19] x86, kaslr: Remove useless codes
  2016-03-04 16:24 [PATCH v3 00/19] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (16 preceding siblings ...)
  2016-03-04 16:25 ` [PATCH v3 17/19] x86, kaslr: Add support of kernel physical address randomization above 4G Baoquan He
@ 2016-03-04 16:25 ` Baoquan He
  2016-03-04 16:25 ` [PATCH v3 19/19] x86, kaslr: Allow random address to be below loaded address Baoquan He
  2016-03-05 11:35 ` [PATCH v3 00/19] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
  19 siblings, 0 replies; 51+ messages in thread
From: Baoquan He @ 2016-03-04 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung, Baoquan He

Several auxiliary functions and the slots[] array are no longer needed
since struct slot_area now stores the kaslr slot info. Remove them in
this patch.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 24 ------------------------
 1 file changed, 24 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index d5ef3b4..d45175d 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -112,17 +112,6 @@ struct mem_vector {
 #define MEM_AVOID_MAX 4
 static struct mem_vector mem_avoid[MEM_AVOID_MAX];
 
-static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
-{
-	/* Item at least partially before region. */
-	if (item->start < region->start)
-		return false;
-	/* Item at least partially after region. */
-	if (item->start + item->size > region->start + region->size)
-		return false;
-	return true;
-}
-
 static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
 {
 	/* Item one is entirely before item two. */
@@ -248,9 +237,6 @@ mem_min_overlap(struct mem_vector *img, struct mem_vector *out)
 	return min;
 }
 
-static unsigned long slots[CONFIG_RANDOMIZE_BASE_MAX_OFFSET /
-			   CONFIG_PHYSICAL_ALIGN];
-
 struct slot_area {
 	unsigned long addr;
 	int num;
@@ -281,16 +267,6 @@ static void store_slot_info(struct mem_vector *region, unsigned long image_size)
 	}
 }
 
-static void slots_append(unsigned long addr)
-{
-	/* Overflowing the slots list should be impossible. */
-	if (slot_max >= CONFIG_RANDOMIZE_BASE_MAX_OFFSET /
-			CONFIG_PHYSICAL_ALIGN)
-		return;
-
-	slots[slot_max++] = addr;
-}
-
 static unsigned long slots_fetch_random(void)
 {
 	unsigned long random;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v3 19/19] x86, kaslr: Allow random address to be below loaded address
  2016-03-04 16:24 [PATCH v3 00/19] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (17 preceding siblings ...)
  2016-03-04 16:25 ` [PATCH v3 18/19] x86, kaslr: Remove useless codes Baoquan He
@ 2016-03-04 16:25 ` Baoquan He
  2016-03-05 11:35 ` [PATCH v3 00/19] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
  19 siblings, 0 replies; 51+ messages in thread
From: Baoquan He @ 2016-03-04 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung

From: Yinghai Lu <yinghai@kernel.org>

Now the randomized output can only be chosen from regions above the
loaded address. In this case, for bootloaders like kexec which always
load the kernel near the end of RAM, no randomization happens at all.
Likewise, if the kernel is loaded at a very high starting address, the
whole area below that address is given up. This is not reasonable.

With correct tracking in mem_avoid we can allow the random output to
be below the loaded address. With this change, even kexec can get a
random output below the address its kernel was loaded at.

We simply pick 512M as min_addr: if the kernel loaded address is
bigger than 512M, e.g. 8G, then [512M, 8G) is added to the random
output candidate area.
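
A condensed sketch of the min_addr clamp this adds (same logic as the
hunk below; the 8G figure is just an example):

unsigned long pick_min_addr(unsigned long loaded_addr)
{
	unsigned long min_addr = loaded_addr;

	/* never start the search above 512M */
	if (min_addr > (512UL << 20))
		min_addr = 512UL << 20;

	return min_addr;	/* loaded at 8G -> search from 512M */
}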

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/aslr.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index d45175d..45e820d 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -403,7 +403,8 @@ void choose_kernel_location(unsigned char *input,
 				unsigned long output_size,
 				unsigned char **virt_offset)
 {
-	unsigned long random;
+	unsigned long random, min_addr;
+
 	*virt_offset = (unsigned char *)LOAD_PHYSICAL_ADDR;
 
 #ifdef CONFIG_HIBERNATION
@@ -424,8 +425,13 @@ void choose_kernel_location(unsigned char *input,
 	mem_avoid_init((unsigned long)input, input_size,
 		       (unsigned long)*output);
 
+	/* start from 512M */
+	min_addr = (unsigned long)*output;
+	if (min_addr > (512UL<<20))
+		min_addr = 512UL<<20;
+
 	/* Walk e820 and find a random address. */
-	random = find_random_phy_addr((unsigned long)*output, output_size);
+	random = find_random_phy_addr(min_addr, output_size);
 	if (!random)
 		debug_putstr("KASLR could not find suitable E820 region...\n");
 	else {
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 00/19] x86, boot: kaslr cleanup and 64bit kaslr support
  2016-03-04 16:24 [PATCH v3 00/19] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (18 preceding siblings ...)
  2016-03-04 16:25 ` [PATCH v3 19/19] x86, kaslr: Allow random address to be below loaded address Baoquan He
@ 2016-03-05 11:35 ` Baoquan He
  19 siblings, 0 replies; 51+ messages in thread
From: Baoquan He @ 2016-03-05 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, vgoyal, mingo, bp, luto, lasse.collin,
	akpm, dyoung

Forgot to mention: this patchset is based on v4.5-rc6 of Linus's tree.

On 03/05/16 at 12:24am, Baoquan He wrote:
> ***Background:
> Previously a bug is reported that kdump didn't work when kaslr is enabled. During
> discussing that bug fix, we found current kaslr has a limilation that it can
> only randomize in 1GB region.
> 
> This is because in curent kaslr implementaion only physical address of kernel
> loading is randomized. Then calculate the delta of physical address where
> vmlinux was linked to load and where it is finally loaded. If delta is not
> equal to 0, namely there's a new physical address where kernel is actually
> decompressed, relocation handling need be done. Then delta is added to offset
> of kernel symbol relocation, this makes the address of kernel text mapping move
> delta long. Though in principle kernel can be randomized to any physical address,
> kernel text mapping address space is limited and only 1G, namely as follows on
> x86_64:
> 	[0xffffffff80000000, 0xffffffffc0000000)
> 
> In one word, kernel text physical address and virtual address randomization is
> coupled. This causes the limitation.
> 
> Then hpa and Vivek suggested we should change this. To decouple the physical
> address and virtual address randomization of kernel text and let them work
> separately. Then kernel text physical address can be randomized in region
> [16M, 64T), and kernel text virtual address can be randomized in region
> [0xffffffff80000000, 0xffffffffc0000000).
> 
> ***Problems we need solved:
>   - For kernel boot from startup_32 case, only 0~4G identity mapping is built.
>     If kernel will be randomly put anywhere from 16M to 64T at most, the price
>     to build all region of identity mapping is too high. We need build the
>     identity mapping on demand, not covering all physical address space.
> 
>   - Decouple the physical address and virtual address randomization of kernel
>     text and let them work separately.
> 
> ***Parts:
>    - The 1st part is Yinghai's identity mapping building on demand patches.
>      This is used to solve the first problem mentioned above.
>      (Patch 09-10/19)
>    - The 2nd part is decoupling the physical address and virtual address
>      randomization of kernel text and letting them work separately patches
>      based on Yinghai's ident mapping patches.
>      (Patch 12-19/19)
>    - The 3rd part is some clean up patches which Yinghai found when he reviewed
>      my patches and the related code around.
>      (Patch 01-08/19)
> 
> ***Patch status:
> This patchset went through several rounds of review.
> 
>    - The first round can be found here:
> 	https://lwn.net/Articles/637115/
> 
>    - In 2nd round Yinghai made a big patchset including this kaslr fix and another
>      setup_data related fix. The link is here:
>       http://lists-archives.com/linux-kernel/28346903-x86-updated-patches-for-kaslr-and-setup_data-etc-for-v4-3.html
>      You can get the code from Yinghai's git branch:
>      git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-v4.3-next
> 
>    - This post is the 3rd round. It only takes care of the kaslr related patches.
>      For reviewers it's better to discuss only one issue in one thread.
>        * I take off one patch as follows from Yinghai's because I think it's unnecessary.
>           - Patch 05/19 x86, kaslr: rename output_size to output_run_size
>          output_size is enough to represent the value:
> 		output_len > run_size ? output_len : run_size
>       
>        * I added Patch 04/19, a comment update patch. For the other patches I
>          just adjusted the patch logs and changed several places compared with
>          the 2nd round. Please check the change log under each patch's log for
>          details.
> 
>        * I adjusted the sequence of several patches to make review easier. It
>          doesn't affect the code.
> 
>    - You can also get this patchset from my github:
>      https://github.com/baoquan-he/linux.git kaslr-above-4G
> 
> Any comments and suggestions are welcome. If anything in the code changes,
> code comments or patch logs is unclear, please add your comment.
> 
> Baoquan He (8):
>   x86, kaslr: Update the description for decompressor worst case
>   x86, kaslr: Fix a bug that relocation can not be handled when kernel
>     is loaded above 2G
>   x86, kaslr: Introduce struct slot_area to manage randomization slot
>     info
>   x86, kaslr: Add two functions which will be used later
>   x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel
>     text mapping address
>   x86, kaslr: Randomize physical and virtual address of kernel
>     separately
>   x86, kaslr: Add support of kernel physical address randomization above
>     4G
>   x86, kaslr: Remove useless codes
> 
> Yinghai Lu (11):
>   x86, kaslr: Remove not needed parameter for choose_kernel_location
>   x86, boot: Move compressed kernel to end of buffer before
>     decompressing
>   x86, boot: Move z_extract_offset calculation to header.S
>   x86, boot: Fix run_size calculation
>   x86, kaslr: Clean up useless code related to run_size.
>   x86, kaslr: Get correct max_addr for relocs pointer
>   x86, kaslr: Consolidate mem_avoid array filling
>   x86, boot: Split kernel_ident_mapping_init to another file
>   x86, 64bit: Set ident_mapping for kaslr
>   x86, boot: Add checking for memcpy
>   x86, kaslr: Allow random address to be below loaded address
> 
>  arch/x86/boot/Makefile                 |  13 +-
>  arch/x86/boot/compressed/Makefile      |  19 ++-
>  arch/x86/boot/compressed/aslr.c        | 258 +++++++++++++++++++++++----------
>  arch/x86/boot/compressed/head_32.S     |  14 +-
>  arch/x86/boot/compressed/head_64.S     |  15 +-
>  arch/x86/boot/compressed/misc.c        |  94 +++++++-----
>  arch/x86/boot/compressed/misc.h        |  34 +++--
>  arch/x86/boot/compressed/misc_pgt.c    |  91 ++++++++++++
>  arch/x86/boot/compressed/mkpiggy.c     |  28 +---
>  arch/x86/boot/compressed/string.c      |  29 +++-
>  arch/x86/boot/compressed/vmlinux.lds.S |   1 +
>  arch/x86/boot/header.S                 |  22 ++-
>  arch/x86/include/asm/boot.h            |  19 +++
>  arch/x86/include/asm/page.h            |   5 +
>  arch/x86/kernel/asm-offsets.c          |   1 +
>  arch/x86/kernel/vmlinux.lds.S          |   1 +
>  arch/x86/mm/ident_map.c                |  74 ++++++++++
>  arch/x86/mm/init_64.c                  |  74 +---------
>  arch/x86/tools/calc_run_size.sh        |  42 ------
>  19 files changed, 543 insertions(+), 291 deletions(-)
>  create mode 100644 arch/x86/boot/compressed/misc_pgt.c
>  create mode 100644 arch/x86/mm/ident_map.c
>  delete mode 100644 arch/x86/tools/calc_run_size.sh
> 
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 01/19] x86, kaslr: Remove not needed parameter for choose_kernel_location
  2016-03-04 16:24 ` [PATCH v3 01/19] x86, kaslr: Remove not needed parameter for choose_kernel_location Baoquan He
@ 2016-03-07 22:28   ` Kees Cook
  0 siblings, 0 replies; 51+ messages in thread
From: Kees Cook @ 2016-03-07 22:28 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On Fri, Mar 4, 2016 at 8:24 AM, Baoquan He <bhe@redhat.com> wrote:
> From: Yinghai Lu <yinghai@kernel.org>
>
> real_mode is global variable, so we do not need to pass it around.
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>

Cool, this makes things cleaner.

Acked-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
>  arch/x86/boot/compressed/aslr.c | 5 ++---
>  arch/x86/boot/compressed/misc.c | 2 +-
>  arch/x86/boot/compressed/misc.h | 6 ++----
>  3 files changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
> index 6a9b96b..622aa88 100644
> --- a/arch/x86/boot/compressed/aslr.c
> +++ b/arch/x86/boot/compressed/aslr.c
> @@ -295,8 +295,7 @@ static unsigned long find_random_addr(unsigned long minimum,
>         return slots_fetch_random();
>  }
>
> -unsigned char *choose_kernel_location(struct boot_params *boot_params,
> -                                     unsigned char *input,
> +unsigned char *choose_kernel_location(unsigned char *input,
>                                       unsigned long input_size,
>                                       unsigned char *output,
>                                       unsigned long output_size)
> @@ -316,7 +315,7 @@ unsigned char *choose_kernel_location(struct boot_params *boot_params,
>         }
>  #endif
>
> -       boot_params->hdr.loadflags |= KASLR_FLAG;
> +       real_mode->hdr.loadflags |= KASLR_FLAG;
>
>         /* Record the various known unsafe memory ranges. */
>         mem_avoid_init((unsigned long)input, input_size,
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index 79dac17..f35ad9e 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -428,7 +428,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
>          * the entire decompressed kernel plus relocation table, or the
>          * entire decompressed kernel plus .bss and .brk sections.
>          */
> -       output = choose_kernel_location(real_mode, input_data, input_len, output,
> +       output = choose_kernel_location(input_data, input_len, output,
>                                         output_len > run_size ? output_len
>                                                               : run_size);
>
> diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> index 3783dc3..dcf01c2 100644
> --- a/arch/x86/boot/compressed/misc.h
> +++ b/arch/x86/boot/compressed/misc.h
> @@ -67,8 +67,7 @@ int cmdline_find_option_bool(const char *option);
>
>  #if CONFIG_RANDOMIZE_BASE
>  /* aslr.c */
> -unsigned char *choose_kernel_location(struct boot_params *boot_params,
> -                                     unsigned char *input,
> +unsigned char *choose_kernel_location(unsigned char *input,
>                                       unsigned long input_size,
>                                       unsigned char *output,
>                                       unsigned long output_size);
> @@ -76,8 +75,7 @@ unsigned char *choose_kernel_location(struct boot_params *boot_params,
>  bool has_cpuflag(int flag);
>  #else
>  static inline
> -unsigned char *choose_kernel_location(struct boot_params *boot_params,
> -                                     unsigned char *input,
> +unsigned char *choose_kernel_location(unsigned char *input,
>                                       unsigned long input_size,
>                                       unsigned char *output,
>                                       unsigned long output_size)
> --
> 2.5.0
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 02/19] x86, boot: Move compressed kernel to end of buffer before decompressing
  2016-03-04 16:25 ` [PATCH v3 02/19] x86, boot: Move compressed kernel to end of buffer before decompressing Baoquan He
@ 2016-03-07 22:35   ` Kees Cook
  2016-03-08  4:50     ` Baoquan He
  0 siblings, 1 reply; 51+ messages in thread
From: Kees Cook @ 2016-03-07 22:35 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> From: Yinghai Lu <yinghai@kernel.org>
>
> For better understanding, first clarify what VO and ZO are. They were
> introduced in the below commits by hpa.
> 77d1a49 x86, boot: make symbols from the main vmlinux available
> 37ba7ab x86, boot: make kernel_alignment adjustable; new bzImage fields
>
> VO:
> - uncompressed kernel
> - size: VO__end - VO__text
>
> ZO:
> - boot/compressed/vmlinux
> - head text + compressed (VO+relocs) + decompressor code
> - size: ZO__end - ZO_startup_32
>
> Also copy the INIT_SIZE definition here for reference. From the below
> macros we can see INIT_SIZE is either bigger than or equal to
> VO_INIT_SIZE.
>
>  #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
>  #define VO_INIT_SIZE    (VO__end - VO__text)
>  #if ZO_INIT_SIZE > VO_INIT_SIZE
>  #define INIT_SIZE ZO_INIT_SIZE
>  #else
>  #define INIT_SIZE VO_INIT_SIZE
>  #endif
>
> Current code uses extract_offset to decide the copied ZO position: it
> puts ZO starting at extract_offset. When INIT_SIZE is bigger than
> VO_INIT_SIZE, the copied ZO occupies space from extract_offset to the
> end of the decompressing buffer, just like below.
>
> 0                       extract_offset                       init_size
> |-----------|---------------|------------------------|---------|
>             |               |                        |         |
>           VO__text      startup_32 of ZO         VO__end     ZO__end
>
> However when INIT_SIZE is euqal to VO_INIT_SIZE there's still space left

typo: "equal"

> from the end of ZO to the end of the decompressing buffer, like below.
>
> 0                       extract_offset                       init_size
> |-----------|---------------|------------------------|---------|
>             |               |                        |         |
>           VO__text      startup_32 of ZO         ZO__end     VO__end
>
> It's better to always put ZO right at the end of the decompressing
> buffer. Anyway, who doesn't like it to be better?

It looks to me like it's just cleaner to always have ZO__end ==
VO__end? Is that the motivation for moving it? What problem is solved
with this change?

-Kees

>
> So in this patch, add BP_init_size to asm-offsets.c to make it visible
> to asm modules, and find the starting position of the copied ZO from
> init_size and the ZO run size, namely (ZO__end - startup_32), when
> moving ZO.
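
To restate the new placement arithmetic as a tiny C sketch (the
variable names here are illustrative, not symbols from the boot code):

        /* ZO's run image is (ZO__end - startup_32) bytes long; copy it
         * so that it ends exactly at output + init_size */
        unsigned long zo_size  = zo_end - zo_startup_32;
        unsigned long zo_start = output + init_size - zo_size;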
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
> v2->v3:
>     Adjust the patch log.
>
>  arch/x86/boot/compressed/head_32.S     | 11 +++++++++--
>  arch/x86/boot/compressed/head_64.S     |  8 ++++++--
>  arch/x86/boot/compressed/mkpiggy.c     |  3 ---
>  arch/x86/boot/compressed/vmlinux.lds.S |  1 +
>  arch/x86/kernel/asm-offsets.c          |  1 +
>  arch/x86/kernel/vmlinux.lds.S          |  1 +
>  6 files changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
> index 8ef964d..0c140f9 100644
> --- a/arch/x86/boot/compressed/head_32.S
> +++ b/arch/x86/boot/compressed/head_32.S
> @@ -148,7 +148,9 @@ preferred_addr:
>  1:
>
>         /* Target address to relocate to for decompression */
> -       addl    $z_extract_offset, %ebx
> +       movl    BP_init_size(%esi), %eax
> +       subl    $_end, %eax
> +       addl    %eax, %ebx
>
>         /* Set up the stack */
>         leal    boot_stack_end(%ebx), %esp
> @@ -210,8 +212,13 @@ relocated:
>                                 /* push arguments for decompress_kernel: */
>         pushl   $z_run_size     /* size of kernel with .bss and .brk */
>         pushl   $z_output_len   /* decompressed length, end of relocs */
> -       leal    z_extract_offset_negative(%ebx), %ebp
> +
> +       movl    BP_init_size(%esi), %eax
> +       subl    $_end, %eax
> +       movl    %ebx, %ebp
> +       subl    %eax, %ebp
>         pushl   %ebp            /* output address */
> +
>         pushl   $z_input_len    /* input_len */
>         leal    input_data(%ebx), %eax
>         pushl   %eax            /* input_data */
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index b0c0d16..67dd8d3 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -102,7 +102,9 @@ ENTRY(startup_32)
>  1:
>
>         /* Target address to relocate to for decompression */
> -       addl    $z_extract_offset, %ebx
> +       movl    BP_init_size(%esi), %eax
> +       subl    $_end, %eax
> +       addl    %eax, %ebx
>
>  /*
>   * Prepare for entering 64 bit mode
> @@ -330,7 +332,9 @@ preferred_addr:
>  1:
>
>         /* Target address to relocate to for decompression */
> -       leaq    z_extract_offset(%rbp), %rbx
> +       movl    BP_init_size(%rsi), %ebx
> +       subl    $_end, %ebx
> +       addq    %rbp, %rbx
>
>         /* Set up the stack */
>         leaq    boot_stack_end(%rbx), %rsp
> diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
> index d8222f2..b980046 100644
> --- a/arch/x86/boot/compressed/mkpiggy.c
> +++ b/arch/x86/boot/compressed/mkpiggy.c
> @@ -85,9 +85,6 @@ int main(int argc, char *argv[])
>         printf("z_output_len = %lu\n", (unsigned long)olen);
>         printf(".globl z_extract_offset\n");
>         printf("z_extract_offset = 0x%lx\n", offs);
> -       /* z_extract_offset_negative allows simplification of head_32.S */
> -       printf(".globl z_extract_offset_negative\n");
> -       printf("z_extract_offset_negative = -0x%lx\n", offs);
>         printf(".globl z_run_size\n");
>         printf("z_run_size = %lu\n", run_size);
>
> diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> index 34d047c..e24e0a0 100644
> --- a/arch/x86/boot/compressed/vmlinux.lds.S
> +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> @@ -70,5 +70,6 @@ SECTIONS
>                 _epgtable = . ;
>         }
>  #endif
> +       . = ALIGN(PAGE_SIZE);   /* keep ZO size page aligned */
>         _end = .;
>  }
> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
> index 84a7524..506de14 100644
> --- a/arch/x86/kernel/asm-offsets.c
> +++ b/arch/x86/kernel/asm-offsets.c
> @@ -81,6 +81,7 @@ void common(void) {
>         OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch);
>         OFFSET(BP_version, boot_params, hdr.version);
>         OFFSET(BP_kernel_alignment, boot_params, hdr.kernel_alignment);
> +       OFFSET(BP_init_size, boot_params, hdr.init_size);
>         OFFSET(BP_pref_address, boot_params, hdr.pref_address);
>         OFFSET(BP_code32_start, boot_params, hdr.code32_start);
>
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 74e4bf1..217d5b1 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -325,6 +325,7 @@ SECTIONS
>                 __brk_limit = .;
>         }
>
> +       . = ALIGN(PAGE_SIZE);           /* keep VO_INIT_SIZE page aligned */
>         _end = .;
>
>          STABS_DEBUG
> --
> 2.5.0
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 05/19] x86, boot: Fix run_size calculation
  2016-03-04 16:25 ` [PATCH v3 05/19] x86, boot: Fix run_size calculation Baoquan He
@ 2016-03-07 23:10   ` Kees Cook
  2016-03-08  4:57     ` Baoquan He
  0 siblings, 1 reply; 51+ messages in thread
From: Kees Cook @ 2016-03-07 23:10 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young, Junjie Mao, Josh Triplett, Matt Fleming

On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> From: Yinghai Lu <yinghai@kernel.org>
>
> Firstly, current run_size is calculated via the shell script
> arch/x86/tools/calc_run_size.sh. It gets the file offset and mem size of
> the .bss and .brk sections in vmlinux, then adds them as follows:
>
> run_size=$(( $offsetA + $sizeA + $sizeB ))
>
> However this is completely wrong. The offset is the starting address of
> a section or segment in the ELF file. Below is a vmlinux I compiled:
>
> [bhe@x1 linux]$ objdump -h vmlinux
>
> vmlinux:     file format elf64-x86-64
>
> Sections:
> Idx Name          Size      VMA               LMA               File off  Algn
>  27 .bss          00170000  ffffffff81ec8000  0000000001ec8000  012c8000  2**12
>                   ALLOC
>  28 .brk          00027000  ffffffff82038000  0000000002038000  012c8000  2**0
>                   ALLOC
>
> Here we can get run_size is 0x145f000.
> 0x012c8000+0x012c8000+0x00027000=0x145f000

This example calculation looks wrong to me. run_size is offset + size
+ size (not offset + offset + size):

0x12c8000+0x17000+0x27000 = 0x1306000

> [bhe@x1 linux]$ readelf -l vmlinux
>
> Elf file type is EXEC (Executable file)
> Entry point 0x1000000
> There are 5 program headers, starting at offset 64
>
> Program Headers:
>   Type           Offset             VirtAddr           PhysAddr
>                  FileSiz            MemSiz              Flags  Align
>   LOAD           0x0000000000200000 0xffffffff81000000 0x0000000001000000
>                  0x0000000000b5e000 0x0000000000b5e000  R E    200000
>   LOAD           0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
>                  0x0000000000145000 0x0000000000145000  RW     200000
>   LOAD           0x0000000001000000 0x0000000000000000 0x0000000001d45000
>                  0x0000000000018158 0x0000000000018158  RW     200000
>   LOAD           0x000000000115e000 0xffffffff81d5e000 0x0000000001d5e000
>                  0x000000000016a000 0x0000000000301000  RWE    200000
>   NOTE           0x000000000099bcac 0xffffffff8179bcac 0x000000000179bcac
>                  0x00000000000001bc 0x00000000000001bc         4
>
>  Section to Segment mapping:
>   Segment Sections...
>    00     .text .notes __ex_table .rodata __bug_table .pci_fixup .tracedata __ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param __modver
>    01     .data .vvar
>    02     .data..percpu
>    03     .init.text .init.data .x86_cpu_dev.init .parainstructions .altinstructions .altinstr_replacement .iommu_table .apicdrivers .exit.text .smp_locks .bss .brk
>    04     .notes
>
> Here we can get the same value as current run_size if we add p_offset
> and p_memsz.
> 0x000000000115e000+0x0000000000301000=0x145f000
>
> But is it right? Obviously not. We should calculate it using the last LOAD
> program segment like this:
> run_size = phdr->p_paddr + phdr->p_memsz - physical load addr of kernel
> run_size=0x0000000001d5e000+0x0000000000301000-0x0000000001000000=0x105f000

Segment 03 ends at 0xffffffff81d5e000 + 0x301000 = 0xffffffff8205f000,
which does match where .brk ends (0xffffffff82038000 + 0x27000 =
0xffffffff8205f000).

>
> It's equal to VO__end - VO__text, and is certainly simpler to compute.
> _end: 0xffffffff8205f000
> _text:0xffffffff81000000
> run_size = 0xffffffff8205f000-0xffffffff81000000=0x105f000
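
Spelled out as a small user-space sketch (using <elf.h> structure and
constant names; this is not code from the patch, and error handling is
omitted):

        #include <elf.h>

        /* run_size = end of the last PT_LOAD segment minus the
         * physical load address of the kernel */
        unsigned long run_size(const Elf64_Phdr *phdr, int count,
                               unsigned long load_addr)
        {
                const Elf64_Phdr *last = NULL;
                int i;

                for (i = 0; i < count; i++)
                        if (phdr[i].p_type == PT_LOAD)
                                last = &phdr[i];

                return last->p_paddr + last->p_memsz - load_addr;
        }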

I would agree, it would seem like the existing run_size calculation is
0x247000 too high in this example.

-Kees

> Secondly, run_size is a simple constant; we don't need to pass it
> around, and we already have voffset.h for that. We can share voffset.h
> between misc.c and header.S instead of getting run_size in other ways.
> Hence in this patch, we move the voffset.h creation code to
> boot/compressed/Makefile.
>
> The dependency was:
> boot/header.S ==> boot/voffset.h ==> vmlinux
> boot/header.S ==> compressed/vmlinux ==> compressed/misc.c
> Now it becomes:
> boot/header.S ==> compressed/vmlinux ==> compressed/misc.c ==> boot/voffset.h ==> vmlinux
>
> Use the macro in misc.c to replace the passed-in run_size.
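
For reference, the generated voffset.h then contains roughly the
following (addresses taken from the objdump example above, so treat
this as an illustration rather than verbatim build output):

        #define VO__text _AC(0xffffffff81000000,UL)
        #define VO__end _AC(0xffffffff8205f000,UL)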
>
> Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
> Cc: Junjie Mao <eternal.n08@gmail.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Matt Fleming <matt.fleming@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
> v2->v3:
>     Adjust the patch log.
>
>  arch/x86/boot/Makefile            | 11 +----------
>  arch/x86/boot/compressed/Makefile | 12 ++++++++++++
>  arch/x86/boot/compressed/misc.c   |  3 +++
>  3 files changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index bd021e5..40f0ae2 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -78,15 +78,6 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE
>
>  SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
>
> -sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 0x\1/p'
> -
> -quiet_cmd_voffset = VOFFSET $@
> -      cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
> -
> -targets += voffset.h
> -$(obj)/voffset.h: vmlinux FORCE
> -       $(call if_changed,voffset)
> -
>  sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'
>
>  quiet_cmd_zoffset = ZOFFSET $@
> @@ -98,7 +89,7 @@ $(obj)/zoffset.h: $(obj)/compressed/vmlinux FORCE
>
>
>  AFLAGS_header.o += -I$(obj)
> -$(obj)/header.o: $(obj)/voffset.h $(obj)/zoffset.h
> +$(obj)/header.o: $(obj)/zoffset.h
>
>  LDFLAGS_setup.elf      := -T
>  $(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index f9ce75d..d9dedd9 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -41,6 +41,18 @@ LDFLAGS_vmlinux := -T
>  hostprogs-y    := mkpiggy
>  HOST_EXTRACFLAGS += -I$(srctree)/tools/include
>
> +sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
> +
> +quiet_cmd_voffset = VOFFSET $@
> +      cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
> +
> +targets += ../voffset.h
> +
> +$(obj)/../voffset.h: vmlinux FORCE
> +       $(call if_changed,voffset)
> +
> +$(obj)/misc.o: $(obj)/../voffset.h
> +
>  vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
>         $(obj)/string.o $(obj)/cmdline.o \
>         $(obj)/piggy.o $(obj)/cpuflags.o
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index 7a65a8a..42e76c2 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -11,6 +11,7 @@
>
>  #include "misc.h"
>  #include "../string.h"
> +#include "../voffset.h"
>
>  /* WARNING!!
>   * This code is compiled with -fPIC and it is relocated dynamically
> @@ -415,6 +416,8 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
>         lines = real_mode->screen_info.orig_video_lines;
>         cols = real_mode->screen_info.orig_video_cols;
>
> +       run_size = VO__end - VO__text;
> +
>         console_init();
>         debug_putstr("early console in decompress_kernel\n");
>
> --
> 2.5.0
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 06/19] x86, kaslr: Clean up useless code related to run_size.
  2016-03-04 16:25 ` [PATCH v3 06/19] x86, kaslr: Clean up useless code related to run_size Baoquan He
@ 2016-03-07 23:12   ` Kees Cook
  2016-03-08  5:00     ` Baoquan He
  0 siblings, 1 reply; 51+ messages in thread
From: Kees Cook @ 2016-03-07 23:12 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young, Josh Triplett, Matt Fleming, Ard Biesheuvel,
	Junjie Mao

On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> From: Yinghai Lu <yinghai@kernel.org>
>
> So now we use the below formula to get run_size.
>
>         run_size = VO__end - VO__text
>
> Let's remove the code for the old run_size calculation and clean up the
> places where run_size needs to be passed in and out.
>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Matt Fleming <matt.fleming@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Junjie Mao <eternal.n08@gmail.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
> v2->v3:
>     Adjust the patch log.
>
>  arch/x86/boot/compressed/Makefile  |  4 +---
>  arch/x86/boot/compressed/head_32.S |  3 +--
>  arch/x86/boot/compressed/head_64.S |  3 ---
>  arch/x86/boot/compressed/misc.c    |  6 ++----
>  arch/x86/boot/compressed/mkpiggy.c | 10 ++-------
>  arch/x86/tools/calc_run_size.sh    | 42 --------------------------------------
>  6 files changed, 6 insertions(+), 62 deletions(-)
>  delete mode 100644 arch/x86/tools/calc_run_size.sh
>
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index d9dedd9..fef80fa 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -105,10 +105,8 @@ suffix-$(CONFIG_KERNEL_XZ) := xz
>  suffix-$(CONFIG_KERNEL_LZO)    := lzo
>  suffix-$(CONFIG_KERNEL_LZ4)    := lz4
>
> -RUN_SIZE = $(shell $(OBJDUMP) -h vmlinux | \
> -            $(CONFIG_SHELL) $(srctree)/arch/x86/tools/calc_run_size.sh)
>  quiet_cmd_mkpiggy = MKPIGGY $@
> -      cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )
> +      cmd_mkpiggy = $(obj)/mkpiggy $< > $@ || ( rm -f $@ ; false )
>
>  targets += piggy.S
>  $(obj)/piggy.S: $(obj)/vmlinux.bin.$(suffix-y) $(obj)/mkpiggy FORCE
> diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
> index 0c140f9..122b32f 100644
> --- a/arch/x86/boot/compressed/head_32.S
> +++ b/arch/x86/boot/compressed/head_32.S
> @@ -210,7 +210,6 @@ relocated:
>   * Do the decompression, and jump to the new kernel..
>   */
>                                 /* push arguments for decompress_kernel: */
> -       pushl   $z_run_size     /* size of kernel with .bss and .brk */
>         pushl   $z_output_len   /* decompressed length, end of relocs */
>
>         movl    BP_init_size(%esi), %eax
> @@ -226,7 +225,7 @@ relocated:
>         pushl   %eax            /* heap area */
>         pushl   %esi            /* real mode pointer */
>         call    decompress_kernel /* returns kernel location in %eax */
> -       addl    $28, %esp
> +       addl    $24, %esp
>
>  /*
>   * Jump to the decompressed kernel.
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 67dd8d3..3691451 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -407,8 +407,6 @@ relocated:
>   * Do the decompression, and jump to the new kernel..
>   */
>         pushq   %rsi                    /* Save the real mode argument */
> -       movq    $z_run_size, %r9        /* size of kernel with .bss and .brk */
> -       pushq   %r9
>         movq    %rsi, %rdi              /* real mode address */
>         leaq    boot_heap(%rip), %rsi   /* malloc area for uncompression */
>         leaq    input_data(%rip), %rdx  /* input_data */
> @@ -416,7 +414,6 @@ relocated:
>         movq    %rbp, %r8               /* output target address */
>         movq    $z_output_len, %r9      /* decompressed length, end of relocs */
>         call    decompress_kernel       /* returns kernel location in %rax */
> -       popq    %r9
>         popq    %rsi
>
>  /*
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index 42e76c2..069120e 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -393,9 +393,9 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
>                                   unsigned char *input_data,
>                                   unsigned long input_len,
>                                   unsigned char *output,
> -                                 unsigned long output_len,
> -                                 unsigned long run_size)
> +                                 unsigned long output_len)
>  {
> +       unsigned long run_size = VO__end - VO__text;

I think this can be "const" now, yes?

-Kees

>         unsigned char *output_orig = output;
>
>         real_mode = rmode;
> @@ -416,8 +416,6 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
>         lines = real_mode->screen_info.orig_video_lines;
>         cols = real_mode->screen_info.orig_video_cols;
>
> -       run_size = VO__end - VO__text;
> -
>         console_init();
>         debug_putstr("early console in decompress_kernel\n");
>
> diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
> index a613c84..c5148642 100644
> --- a/arch/x86/boot/compressed/mkpiggy.c
> +++ b/arch/x86/boot/compressed/mkpiggy.c
> @@ -34,13 +34,11 @@ int main(int argc, char *argv[])
>  {
>         uint32_t olen;
>         long ilen;
> -       unsigned long run_size;
>         FILE *f = NULL;
>         int retval = 1;
>
> -       if (argc < 3) {
> -               fprintf(stderr, "Usage: %s compressed_file run_size\n",
> -                               argv[0]);
> +       if (argc < 2) {
> +               fprintf(stderr, "Usage: %s compressed_file\n", argv[0]);
>                 goto bail;
>         }
>
> @@ -65,15 +63,11 @@ int main(int argc, char *argv[])
>         ilen = ftell(f);
>         olen = get_unaligned_le32(&olen);
>
> -       run_size = atoi(argv[2]);
> -
>         printf(".section \".rodata..compressed\",\"a\",@progbits\n");
>         printf(".globl z_input_len\n");
>         printf("z_input_len = %lu\n", ilen);
>         printf(".globl z_output_len\n");
>         printf("z_output_len = %lu\n", (unsigned long)olen);
> -       printf(".globl z_run_size\n");
> -       printf("z_run_size = %lu\n", run_size);
>
>         printf(".globl input_data, input_data_end\n");
>         printf("input_data:\n");
> diff --git a/arch/x86/tools/calc_run_size.sh b/arch/x86/tools/calc_run_size.sh
> deleted file mode 100644
> index 1a4c17b..0000000
> --- a/arch/x86/tools/calc_run_size.sh
> +++ /dev/null
> @@ -1,42 +0,0 @@
> -#!/bin/sh
> -#
> -# Calculate the amount of space needed to run the kernel, including room for
> -# the .bss and .brk sections.
> -#
> -# Usage:
> -# objdump -h a.out | sh calc_run_size.sh
> -
> -NUM='\([0-9a-fA-F]*[ \t]*\)'
> -OUT=$(sed -n 's/^[ \t0-9]*.b[sr][sk][ \t]*'"$NUM$NUM$NUM$NUM"'.*/\1\4/p')
> -if [ -z "$OUT" ] ; then
> -       echo "Never found .bss or .brk file offset" >&2
> -       exit 1
> -fi
> -
> -OUT=$(echo ${OUT# })
> -sizeA=$(printf "%d" 0x${OUT%% *})
> -OUT=${OUT#* }
> -offsetA=$(printf "%d" 0x${OUT%% *})
> -OUT=${OUT#* }
> -sizeB=$(printf "%d" 0x${OUT%% *})
> -OUT=${OUT#* }
> -offsetB=$(printf "%d" 0x${OUT%% *})
> -
> -run_size=$(( $offsetA + $sizeA + $sizeB ))
> -
> -# BFD linker shows the same file offset in ELF.
> -if [ "$offsetA" -ne "$offsetB" ] ; then
> -       # Gold linker shows them as consecutive.
> -       endB=$(( $offsetB + $sizeB ))
> -       if [ "$endB" != "$run_size" ] ; then
> -               printf "sizeA: 0x%x\n" $sizeA >&2
> -               printf "offsetA: 0x%x\n" $offsetA >&2
> -               printf "sizeB: 0x%x\n" $sizeB >&2
> -               printf "offsetB: 0x%x\n" $offsetB >&2
> -               echo ".bss and .brk are non-contiguous" >&2
> -               exit 1
> -       fi
> -fi
> -
> -printf "%d\n" $run_size
> -exit 0
> --
> 2.5.0
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 07/19] x86, kaslr: Get correct max_addr for relocs pointer
  2016-03-04 16:25 ` [PATCH v3 07/19] x86, kaslr: Get correct max_addr for relocs pointer Baoquan He
@ 2016-03-07 23:16   ` Kees Cook
  2016-03-08  5:13     ` Baoquan He
  0 siblings, 1 reply; 51+ messages in thread
From: Kees Cook @ 2016-03-07 23:16 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> From: Yinghai Lu <yinghai@kernel.org>
>
> There is boundary checking for the relocs pointer in the kaslr
> relocation handling.
>
> Current code is using output_len, which is the VO (vmlinux after objcopy)
> file size plus the vmlinux.relocs file size.
>
> That is not right, as we should use the address range valid for the
> loaded, running kernel.

I think this needs clarification. If I'm understanding correctly, the
max_addr check should stop at the end of VO, and not include bss, brk,
etc. which follows. In which case, this change is correct: the bounds
checking needed to be tightened.

-Kees

>
> By that time parse_elf has already moved the sections according to the
> ELF headers.
>
> The valid range should be VO's [_text, __bss_start) loaded physical
> addresses.
>
> In this patch, also export __bss_start to voffset.h and use it to get
> max_addr, as sketched below.
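
A hedged before/after of the bound, with identifiers as in the hunk
below:

        /* old: covers the VO file size plus the appended relocs */
        max_addr = min_addr + output_len;
        /* new: only the VO image that parse_elf actually placed */
        max_addr = min_addr + (VO___bss_start - VO__text);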
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
> v2->v3:
>     Tune the patch log.
>
>  arch/x86/boot/compressed/Makefile | 2 +-
>  arch/x86/boot/compressed/misc.c   | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index fef80fa..2e7c0ce 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -41,7 +41,7 @@ LDFLAGS_vmlinux := -T
>  hostprogs-y    := mkpiggy
>  HOST_EXTRACFLAGS += -I$(srctree)/tools/include
>
> -sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
> +sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
>
>  quiet_cmd_voffset = VOFFSET $@
>        cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index 069120e..dd7ed8a 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -259,7 +259,7 @@ static void handle_relocations(void *output, unsigned long output_len)
>         int *reloc;
>         unsigned long delta, map, ptr;
>         unsigned long min_addr = (unsigned long)output;
> -       unsigned long max_addr = min_addr + output_len;
> +       unsigned long max_addr = min_addr + (VO___bss_start - VO__text);
>
>         /*
>          * Calculate the delta between where vmlinux was linked to load
> --
> 2.5.0
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 08/19] x86, kaslr: Consolidate mem_avoid array filling
  2016-03-04 16:25 ` [PATCH v3 08/19] x86, kaslr: Consolidate mem_avoid array filling Baoquan He
@ 2016-03-07 23:28   ` Kees Cook
  2016-03-08  5:21     ` Baoquan He
  0 siblings, 1 reply; 51+ messages in thread
From: Kees Cook @ 2016-03-07 23:28 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> From: Yinghai Lu <yinghai@kernel.org>
>
> We are going to support kaslr on 64bit above 4G, and the new random
> output could be anywhere. The mem_avoid array is used by kaslr to search
> for a new output address. Current code only tracks the range after
> output+output_size, so we need to track all ranges instead.
>
> Current code's first entry is the extra bytes before input+input_size,
> sized according to output_size. The other entries are for the initrd,
> the cmdline, and the heap/stack for ZO running.
>
> First, let's check the first entry that should be in the mem_avoid
> array. Now that ZO always sits at the end of the buffer, we can find out
> where the text and data/bss etc. of ZO are.
>
> Since init_size >= run_size, and input+input_len >= output+output_len,
> we make several assumptions here so the graph below is simpler:
>  - init_size > run_size
>  - input+input_len > output+output_len
>  - run_size > output_len

I would like to see each of these assumptions justified. Why is
init_size > run_size, etc?
choose_kernel_location's "output_size" is calculated as max(run_size,
output_len), so run_size may not be > output_len...

>
> 0   output                       input             input+input_len          output+init_size
> |     |                            |                       |                       |
> |-----|-------------------|--------|------------------|----|------------|----------|
>                           |                           |                 |
>              output+init_size-ZO_INIT_SIZE    output+output_len    output+run_size
>
> [output, output+init_size) is the decompressing buffer for the
> compressed kernel.
>
> [output, output+run_size) is for VO run size.
> [output, output+output_len) is (VO (vmlinux after objcopy) plus relocs)
>
> [output+init_size-ZO_INIT_SIZE, output+init_size) is copied ZO.
> [input, input+input_len) is copied compressed (VO (vmlinux after objcopy)
> plus relocs), not the ZO.
>
> [input+input_len, output+init_size) is [_text, _end) for ZO. That could
> be the first range in mem_avoid. The new first entry already includes
> the heap and stack for ZO running, so there is no need to put them
> separately into the mem_avoid array.
>
> Also [input, input+input_size) needs to be put in the mem_avoid array.
> It is adjacent to the new first entry, so merge them, as shown in the
> sketch below.
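
In code, the merge collapses to the two assignments from the hunk
further down, shown here next to the diagram for convenience:

        mem_avoid[0].start = input;
        mem_avoid[0].size  = (output + init_size) - input;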

I wonder if this diagram and description should live in a comment with the code.

-Kees

>
> At last we need to put boot_params into mem_avoid too, since with a
> 64bit bootloader it could be put anywhere.
>
> After these changes, the mem_avoid array holds all the ranges that need
> to be avoided.
>
> Cc: Kees Cook <keescook@chromium.org>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
> v2->v3:
>     Adjust the patch log.
>
>  arch/x86/boot/compressed/aslr.c | 29 +++++++++++++----------------
>  1 file changed, 13 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
> index 622aa88..b93be03 100644
> --- a/arch/x86/boot/compressed/aslr.c
> +++ b/arch/x86/boot/compressed/aslr.c
> @@ -109,7 +109,7 @@ struct mem_vector {
>         unsigned long size;
>  };
>
> -#define MEM_AVOID_MAX 5
> +#define MEM_AVOID_MAX 4
>  static struct mem_vector mem_avoid[MEM_AVOID_MAX];
>
>  static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
> @@ -135,21 +135,22 @@ static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
>  }
>
>  static void mem_avoid_init(unsigned long input, unsigned long input_size,
> -                          unsigned long output, unsigned long output_size)
> +                          unsigned long output)
>  {
> +       unsigned long init_size = real_mode->hdr.init_size;
>         u64 initrd_start, initrd_size;
>         u64 cmd_line, cmd_line_size;
> -       unsigned long unsafe, unsafe_len;
>         char *ptr;
>
>         /*
>          * Avoid the region that is unsafe to overlap during
> -        * decompression (see calculations at top of misc.c).
> +        * decompression.
> +        * As we already move ZO (arch/x86/boot/compressed/vmlinux)
> +        * to the end of buffer, [input+input_size, output+init_size)
> +        * has [_text, _end) for ZO.
>          */
> -       unsafe_len = (output_size >> 12) + 32768 + 18;
> -       unsafe = (unsigned long)input + input_size - unsafe_len;
> -       mem_avoid[0].start = unsafe;
> -       mem_avoid[0].size = unsafe_len;
> +       mem_avoid[0].start = input;
> +       mem_avoid[0].size = (output + init_size) - input;
>
>         /* Avoid initrd. */
>         initrd_start  = (u64)real_mode->ext_ramdisk_image << 32;
> @@ -169,13 +170,9 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
>         mem_avoid[2].start = cmd_line;
>         mem_avoid[2].size = cmd_line_size;
>
> -       /* Avoid heap memory. */
> -       mem_avoid[3].start = (unsigned long)free_mem_ptr;
> -       mem_avoid[3].size = BOOT_HEAP_SIZE;
> -
> -       /* Avoid stack memory. */
> -       mem_avoid[4].start = (unsigned long)free_mem_end_ptr;
> -       mem_avoid[4].size = BOOT_STACK_SIZE;
> +       /* Avoid params */
> +       mem_avoid[3].start = (unsigned long)real_mode;
> +       mem_avoid[3].size = sizeof(*real_mode);
>  }
>
>  /* Does this memory vector overlap a known avoided area? */
> @@ -319,7 +316,7 @@ unsigned char *choose_kernel_location(unsigned char *input,
>
>         /* Record the various known unsafe memory ranges. */
>         mem_avoid_init((unsigned long)input, input_size,
> -                      (unsigned long)output, output_size);
> +                      (unsigned long)output);
>
>         /* Walk e820 and find a random address. */
>         random = find_random_addr(choice, output_size);
> --
> 2.5.0
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 12/19] x86, kaslr: Fix a bug that relocation can not be handled when kernel is loaded above 2G
  2016-03-04 16:25 ` [PATCH v3 12/19] x86, kaslr: Fix a bug that relocation can not be handled when kernel is loaded above 2G Baoquan He
@ 2016-03-07 23:30   ` Kees Cook
  2016-03-08  5:22     ` Baoquan He
  0 siblings, 1 reply; 51+ messages in thread
From: Kees Cook @ 2016-03-07 23:30 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> When processing the 32 bit relocation tables, a local variable
> 'extended' is defined to calculate the physical address of a relocs
> entry. However its type is 'int', which is enough for i386 but not for
> x86_64. That's why relocation can only be handled below 2G; otherwise
> an overflow happens and hangs the system.
>
> Here change it to 'long', as the 32 bit inverse relocation processing
> already does; this change is safe for i386 relocation handling.
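
A hedged illustration of the truncation, with made-up numbers (this
snippet is not from the patch):

        unsigned long map = 0x880000000UL;  /* a load address above 2G */
        int  bad  = 0x1000000;              /* a relocs entry */
        long good = 0x1000000;

        bad  += map;    /* sum truncated to 32 bits */
        good += map;    /* full 64-bit sum preserved */
        /* (unsigned long)bad != (unsigned long)good, so the 'int'
         * version yields a bogus pointer once the sum exceeds 32 bits */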
>
> Signed-off-by: Baoquan He <bhe@redhat.com>

Acked-by: Kees Cook <keescook@chromium.org>

This patch should probably be moved up into the start of this series
instead of living in the middle here, since it's a clear stand-alone
bug fix.

-Kees

> ---
>  arch/x86/boot/compressed/misc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index 4b2cd0c..70445c3 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -298,7 +298,7 @@ static void handle_relocations(void *output, unsigned long output_len)
>          * So we work backwards from the end of the decompressed image.
>          */
>         for (reloc = output + output_len - sizeof(*reloc); *reloc; reloc--) {
> -               int extended = *reloc;
> +               long extended = *reloc;
>                 extended += map;
>
>                 ptr = (unsigned long)extended;
> --
> 2.5.0
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 10/19] x86, 64bit: Set ident_mapping for kaslr
  2016-03-04 16:25 ` [PATCH v3 10/19] x86, 64bit: Set ident_mapping for kaslr Baoquan He
@ 2016-03-07 23:34   ` Kees Cook
  2016-03-08  5:25     ` Baoquan He
  2016-03-21  7:50     ` Baoquan He
  0 siblings, 2 replies; 51+ messages in thread
From: Kees Cook @ 2016-03-07 23:34 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young, Jiri Kosina, Borislav Petkov, Matt Fleming

On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> From: Yinghai Lu <yinghai@kernel.org>
>
> Current aslr only supports randomization in a small range, from 16M to
> 1G. And the new range still uses the old mapping. Also it does not
> support a new range above 4G.
>
> We need to have an ident mapping for the new range before we can
> decompress to the new output, and later run from it.
>
> In this patch, we add ident mappings for all needed ranges.
>
> First, to support aslr putting the random VO above 4G, we must set up
> an ident mapping for the new range when we come via the startup_32
> path.
>
> Secondly, when booting from a 64bit bootloader, the bootloader sets up
> the ident mapping and boots via ZO (arch/x86/boot/compressed/vmlinux)
> startup_64. Those pagetable pages need to be avoided when we select the
> new random VO (vmlinux) base; otherwise the decompressor would
> overwrite them during decompressing.
> One way would be: walk through the pagetable and find every page that
> is used by the pagetable for every mem_avoid check, but that needs
> extra code and may require a bigger mem_avoid array to hold them.
> The other way would be: create a new ident mapping instead, so the
> pagetable pages come from the _pgtable section of ZO, which is in the
> mem_avoid array already. In this way, we can reuse the code for ident
> mapping.
>
> The _pgtable will be shared by the 32bit and 64bit paths to reduce
> init_size, as now ZO's _rodata to _end contributes to init_size.
>
> We need to increase the pgt buffer size.
> When booting via startup_64, we need to cover the old VO, params,
> cmdline and new VO; in the extreme case all of them cross a 512G
> boundary, which needs (2+2)*4 pages with 2M mappings. We also need 2
> pages for the first 2M for vga ram, plus one for the level4 table,
> for a total of 19 pages (see the accounting sketch below).
> When booting via startup_32, aslr may move the new VO above 4G, so we
> need to set an extra ident mapping for the new VO; the pgt buffer comes
> from _pgtable offset 6 pages, and should need (2+2) pages at most even
> when crossing a 512G boundary.
> So 19 pages makes both paths happy.
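
A hedged accounting of the 19 pages (the macro names below are made up
for illustration; reading "(2+2)" as 2 PUD plus 2 PMD pages per region
is my interpretation):

        #define PGT_REGIONS    4        /* old VO, params, cmdline, new VO */
        #define PGT_PER_REGION (2 + 2)  /* PUD + PMD pages if a region
                                           crosses a 512G boundary */
        #define PGT_VGA_RAM    2        /* first 2M for vga ram */
        #define PGT_LEVEL4     1        /* top-level table */
        /* 4 * (2 + 2) + 2 + 1 = 19 pages */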
>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Jiri Kosina <jkosina@suse.cz>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Matt Fleming <matt.fleming@intel.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/boot/compressed/Makefile   |  3 ++
>  arch/x86/boot/compressed/aslr.c     | 14 ++++++
>  arch/x86/boot/compressed/head_64.S  |  4 +-
>  arch/x86/boot/compressed/misc.h     | 11 +++++
>  arch/x86/boot/compressed/misc_pgt.c | 91 +++++++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/boot.h         | 19 ++++++++
>  6 files changed, 140 insertions(+), 2 deletions(-)
>  create mode 100644 arch/x86/boot/compressed/misc_pgt.c
>
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 2e7c0ce..229604d 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -59,6 +59,9 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
>
>  vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
>  vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/aslr.o
> +ifdef CONFIG_X86_64
> +       vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/misc_pgt.o
> +endif
>
>  $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
>
> diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
> index b93be03..35f2eac 100644
> --- a/arch/x86/boot/compressed/aslr.c
> +++ b/arch/x86/boot/compressed/aslr.c
> @@ -151,6 +151,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
>          */
>         mem_avoid[0].start = input;
>         mem_avoid[0].size = (output + init_size) - input;
> +       fill_pagetable(input, (output + init_size) - input);

These calls should just use the .start and .size arguments instead of
the copy/paste.
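
i.e., something like this (just sketching the suggestion, untested):

        fill_pagetable(mem_avoid[0].start, mem_avoid[0].size);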

>
>         /* Avoid initrd. */
>         initrd_start  = (u64)real_mode->ext_ramdisk_image << 32;
> @@ -159,6 +160,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
>         initrd_size |= real_mode->hdr.ramdisk_size;
>         mem_avoid[1].start = initrd_start;
>         mem_avoid[1].size = initrd_size;
> +       /* don't need to set mapping for initrd */
>
>         /* Avoid kernel command line. */
>         cmd_line  = (u64)real_mode->ext_cmd_line_ptr << 32;
> @@ -169,10 +171,19 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
>                 ;
>         mem_avoid[2].start = cmd_line;
>         mem_avoid[2].size = cmd_line_size;
> +       fill_pagetable(cmd_line, cmd_line_size);
>
>         /* Avoid params */
>         mem_avoid[3].start = (unsigned long)real_mode;
>         mem_avoid[3].size = sizeof(*real_mode);
> +       fill_pagetable((unsigned long)real_mode, sizeof(*real_mode));
> +
> +       /* don't need to set mapping for setup_data */
> +
> +#ifdef CONFIG_X86_VERBOSE_BOOTUP
> +       /* for video ram */
> +       fill_pagetable(0, PMD_SIZE);
> +#endif
>  }
>
>  /* Does this memory vector overlap a known avoided area? */
> @@ -330,6 +341,9 @@ unsigned char *choose_kernel_location(unsigned char *input,
>                 goto out;
>
>         choice = random;
> +
> +       fill_pagetable(choice, output_size);
> +       switch_pagetable();
>  out:
>         return (unsigned char *)choice;
>  }
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 3691451..075bb15 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -126,7 +126,7 @@ ENTRY(startup_32)
>         /* Initialize Page tables to 0 */
>         leal    pgtable(%ebx), %edi
>         xorl    %eax, %eax
> -       movl    $((4096*6)/4), %ecx
> +       movl    $(BOOT_INIT_PGT_SIZE/4), %ecx
>         rep     stosl
>
>         /* Build Level 4 */
> @@ -478,4 +478,4 @@ boot_stack_end:
>         .section ".pgtable","a",@nobits
>         .balign 4096
>  pgtable:
> -       .fill 6*4096, 1, 0
> +       .fill BOOT_PGT_SIZE, 1, 0
> diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> index dcf01c2..11736a6 100644
> --- a/arch/x86/boot/compressed/misc.h
> +++ b/arch/x86/boot/compressed/misc.h
> @@ -84,6 +84,17 @@ unsigned char *choose_kernel_location(unsigned char *input,
>  }
>  #endif
>
> +#ifdef CONFIG_X86_64
> +void fill_pagetable(unsigned long start, unsigned long size);
> +void switch_pagetable(void);
> +extern unsigned char _pgtable[];
> +#else
> +static inline void fill_pagetable(unsigned long start, unsigned long size)
> +{ }
> +static inline void switch_pagetable(void)
> +{ }
> +#endif
> +
>  #ifdef CONFIG_EARLY_PRINTK
>  /* early_serial_console.c */
>  extern int early_serial_base;
> diff --git a/arch/x86/boot/compressed/misc_pgt.c b/arch/x86/boot/compressed/misc_pgt.c
> new file mode 100644
> index 0000000..954811e
> --- /dev/null
> +++ b/arch/x86/boot/compressed/misc_pgt.c
> @@ -0,0 +1,91 @@
> +#define __pa(x)  ((unsigned long)(x))
> +#define __va(x)  ((void *)((unsigned long)(x)))
> +
> +#include "misc.h"
> +
> +#include <asm/init.h>
> +#include <asm/pgtable.h>
> +
> +#include "../../mm/ident_map.c"
> +#include "../string.h"
> +
> +struct alloc_pgt_data {
> +       unsigned char *pgt_buf;
> +       unsigned long pgt_buf_size;
> +       unsigned long pgt_buf_offset;
> +};
> +
> +static void *alloc_pgt_page(void *context)
> +{
> +       struct alloc_pgt_data *d = (struct alloc_pgt_data *)context;
> +       unsigned char *p = (unsigned char *)d->pgt_buf;
> +
> +       if (d->pgt_buf_offset >= d->pgt_buf_size) {
> +               debug_putstr("out of pgt_buf in misc.c\n");

This line probably isn't needed any more. Or, if it is, maybe it could
include more details from the context?

-Kees

> +               return NULL;
> +       }
> +
> +       p += d->pgt_buf_offset;
> +       d->pgt_buf_offset += PAGE_SIZE;
> +
> +       return p;
> +}
> +
> +/*
> + * Use a normal definition of memset() from string.c. There are already
> + * included header files which expect a definition of memset() and by
> + * the time we define memset macro, it is too late.
> + */
> +#undef memset
> +
> +unsigned long __force_order;
> +static struct alloc_pgt_data pgt_data;
> +static struct x86_mapping_info mapping_info;
> +static pgd_t *level4p;
> +
> +void fill_pagetable(unsigned long start, unsigned long size)
> +{
> +       unsigned long end = start + size;
> +
> +       if (!level4p) {
> +               pgt_data.pgt_buf_offset = 0;
> +               mapping_info.alloc_pgt_page = alloc_pgt_page;
> +               mapping_info.context = &pgt_data;
> +               mapping_info.pmd_flag = __PAGE_KERNEL_LARGE_EXEC;
> +
> +               /*
> +                * come from startup_32 ?
> +                * then cr3 is _pgtable, we can reuse it.
> +                */
> +               level4p = (pgd_t *)read_cr3();
> +               if ((unsigned long)level4p == (unsigned long)_pgtable) {
> +                       pgt_data.pgt_buf = (unsigned char *)_pgtable +
> +                                                BOOT_INIT_PGT_SIZE;
> +                       pgt_data.pgt_buf_size = BOOT_PGT_SIZE -
> +                                                BOOT_INIT_PGT_SIZE;
> +                       memset((unsigned char *)pgt_data.pgt_buf, 0,
> +                               pgt_data.pgt_buf_size);
> +                       debug_putstr("boot via startup_32\n");
> +               } else {
> +                       pgt_data.pgt_buf = (unsigned char *)_pgtable;
> +                       pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
> +                       memset((unsigned char *)pgt_data.pgt_buf, 0,
> +                               pgt_data.pgt_buf_size);
> +                       debug_putstr("boot via startup_64\n");
> +                       level4p = (pgd_t *)alloc_pgt_page(&pgt_data);
> +               }
> +       }
> +
> +       /* align boundary to 2M */
> +       start = round_down(start, PMD_SIZE);
> +       end = round_up(end, PMD_SIZE);
> +       if (start >= end)
> +               return;
> +
> +       kernel_ident_mapping_init(&mapping_info, level4p, start, end);
> +}
> +
> +void switch_pagetable(void)
> +{
> +       write_cr3((unsigned long)level4p);
> +}
> diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
> index 6b8d6e8..52a9cbc 100644
> --- a/arch/x86/include/asm/boot.h
> +++ b/arch/x86/include/asm/boot.h
> @@ -32,7 +32,26 @@
>  #endif /* !CONFIG_KERNEL_BZIP2 */
>
>  #ifdef CONFIG_X86_64
> +
>  #define BOOT_STACK_SIZE        0x4000
> +
> +#define BOOT_INIT_PGT_SIZE (6*4096)
> +#ifdef CONFIG_RANDOMIZE_BASE
> +/*
> + * 1 page for level4, 2 pages for first 2M.
> + * (2+2)*4 pages for kernel, param, cmd_line, random kernel
> + * if all cross 512G boundary.
> + * So total will be 19 pages.
> + */
> +#ifdef CONFIG_X86_VERBOSE_BOOTUP
> +#define BOOT_PGT_SIZE (19*4096)
> +#else
> +#define BOOT_PGT_SIZE (17*4096)
> +#endif
> +#else
> +#define BOOT_PGT_SIZE BOOT_INIT_PGT_SIZE
> +#endif
> +
>  #else
>  #define BOOT_STACK_SIZE        0x1000
>  #endif
> --
> 2.5.0
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 11/19] x86, boot: Add checking for memcpy
  2016-03-04 16:25 ` [PATCH v3 11/19] x86, boot: Add checking for memcpy Baoquan He
@ 2016-03-07 23:36   ` Kees Cook
  2016-03-08  5:30     ` Baoquan He
  0 siblings, 1 reply; 51+ messages in thread
From: Kees Cook @ 2016-03-07 23:36 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> From: Yinghai Lu <yinghai@kernel.org>
>
> parse_elf is using the local memcpy to move sections to their running
> positions. That memcpy actually only supports the non-overlapping case
> or overlap with dest < src.
>
> Add checking in memcpy to catch the wrong case for future use; at that
> point we will need a backward memcpy for it.
>
> Also add comments in parse_elf about this fact.

Seems like it would be better to just fix the memcpy to handle the overlap?
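
Something along these lines, perhaps (an overlap-safe copy with
memmove semantics, reusing the __memcpy from this patch; untested
sketch):

        void *memcpy(void *dest, const void *src, size_t n)
        {
                unsigned char *d = dest;
                const unsigned char *s = src;

                /* forward copy is safe when dest is below src, or when
                 * the ranges do not overlap at all */
                if (d <= s || d >= s + n)
                        return __memcpy(dest, src, n);

                /* otherwise copy backwards, last byte first */
                while (n--)
                        d[n] = s[n];
                return dest;
        }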

-Kees

>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
> v2->v3:
>     Add a declaration for error() since its declaration is in misc.h.
>     But it's not included in compressed/string.c.
>
>  arch/x86/boot/compressed/misc.c   | 14 +++++++-------
>  arch/x86/boot/compressed/misc.h   |  2 ++
>  arch/x86/boot/compressed/string.c | 29 +++++++++++++++++++++++++++--
>  3 files changed, 36 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index dd7ed8a..4b2cd0c 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -114,9 +114,6 @@
>  #undef memset
>  #define memzero(s, n)  memset((s), 0, (n))
>
> -
> -static void error(char *m);
> -
>  /*
>   * This is set up by the setup-routine at boot-time
>   */
> @@ -243,7 +240,7 @@ void __puthex(unsigned long value)
>         }
>  }
>
> -static void error(char *x)
> +void error(char *x)
>  {
>         error_putstr("\n\n");
>         error_putstr(x);
> @@ -378,9 +375,12 @@ static void parse_elf(void *output)
>  #else
>                         dest = (void *)(phdr->p_paddr);
>  #endif
> -                       memcpy(dest,
> -                              output + phdr->p_offset,
> -                              phdr->p_filesz);
> +                       /*
> +                        * simple version memcpy only can work when dest is
> +                        *   smaller than src or no overlapping.
> +                        * Here dest is smaller than src always.
> +                        */
> +                       memcpy(dest, output + phdr->p_offset, phdr->p_filesz);
>                         break;
>                 default: /* Ignore other PT_* */ break;
>                 }
> diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> index 11736a6..39d0e9a 100644
> --- a/arch/x86/boot/compressed/misc.h
> +++ b/arch/x86/boot/compressed/misc.h
> @@ -38,6 +38,8 @@ void __puthex(unsigned long value);
>  #define error_putstr(__x)  __putstr(__x)
>  #define error_puthex(__x)  __puthex(__x)
>
> +void error(char *x);
> +
>  #ifdef CONFIG_X86_VERBOSE_BOOTUP
>
>  #define debug_putstr(__x)  __putstr(__x)
> diff --git a/arch/x86/boot/compressed/string.c b/arch/x86/boot/compressed/string.c
> index 00e788b..3a935d0 100644
> --- a/arch/x86/boot/compressed/string.c
> +++ b/arch/x86/boot/compressed/string.c
> @@ -1,7 +1,7 @@
>  #include "../string.c"
>
>  #ifdef CONFIG_X86_32
> -void *memcpy(void *dest, const void *src, size_t n)
> +void *__memcpy(void *dest, const void *src, size_t n)
>  {
>         int d0, d1, d2;
>         asm volatile(
> @@ -15,7 +15,7 @@ void *memcpy(void *dest, const void *src, size_t n)
>         return dest;
>  }
>  #else
> -void *memcpy(void *dest, const void *src, size_t n)
> +void *__memcpy(void *dest, const void *src, size_t n)
>  {
>         long d0, d1, d2;
>         asm volatile(
> @@ -30,6 +30,31 @@ void *memcpy(void *dest, const void *src, size_t n)
>  }
>  #endif
>
> +extern void error(char *x);
> +void *memcpy(void *dest, const void *src, size_t n)
> +{
> +       unsigned long start_dest, end_dest;
> +       unsigned long start_src, end_src;
> +       unsigned long max_start, min_end;
> +
> +       if (dest < src)
> +               return __memcpy(dest, src, n);
> +
> +       start_dest = (unsigned long)dest;
> +       end_dest = (unsigned long)dest + n;
> +       start_src = (unsigned long)src;
> +       end_src = (unsigned long)src + n;
> +       max_start = (start_dest > start_src) ?  start_dest : start_src;
> +       min_end = (end_dest < end_src) ? end_dest : end_src;
> +
> +       if (max_start >= min_end)
> +               return __memcpy(dest, src, n);
> +
> +       error("memcpy does not support overlapping with dest > src!\n");
> +
> +       return dest;
> +}
> +
>  void *memset(void *s, int c, size_t n)
>  {
>         int i;
> --
> 2.5.0
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 16/19] x86, kaslr: Randomize physical and virtual address of kernel separately
  2016-03-04 16:25 ` [PATCH v3 16/19] x86, kaslr: Randomize physical and virtual address of kernel separately Baoquan He
@ 2016-03-07 23:51   ` Kees Cook
  2016-03-08  5:34     ` Baoquan He
  0 siblings, 1 reply; 51+ messages in thread
From: Kees Cook @ 2016-03-07 23:51 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> On x86_64, in the old kaslr implementation only the physical address
> where the kernel is loaded is randomized. Then we calculate the delta
> between the physical address where vmlinux was linked to load and where
> it is finally loaded. If the delta is not 0, namely there's a new
> physical address where the kernel is actually decompressed, relocation
> handling needs to be done. The delta is then added to the offset of each
> kernel symbol relocation, which moves the kernel text mapping address by
> delta.
>
> Here the behavior is changed: randomize both the physical address where
> the kernel is decompressed and the virtual address where the kernel
> text is mapped, and make relocation handling depend only on the virtual
> address randomization. That means we add the delta to the offset of the
> kernel relocs if and only if the virtual address is randomized to a
> different value.
>
> Note that up to now neither the virtual offset nor the physical address
> randomization can exceed CONFIG_RANDOMIZE_BASE_MAX_OFFSET, namely 1G.
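
A hedged summary of the decoupled flow, using the identifiers this
patch introduces:

        /* physical: *output may move to any suitable e820 region */
        /* virtual:  *virt_offset alone drives the relocation delta */
        choose_kernel_location(input, input_size, &output, output_size,
                               &virt_offset);
        handle_relocations(output, output_len, virt_offset);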

It seems like CONFIG_RANDOMIZE_BASE_MAX_OFFSET should have been
eliminated when the on-demand page table code was added. Once that was
added, there's no physical max any more. And virtual randomization
should have no max at all.

-Kees

>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  arch/x86/boot/compressed/aslr.c | 46 +++++++++++++++++++++--------------------
>  arch/x86/boot/compressed/misc.c | 40 +++++++++++++++++++++--------------
>  arch/x86/boot/compressed/misc.h | 19 +++++++++--------
>  3 files changed, 59 insertions(+), 46 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
> index 3730a33..75ea931 100644
> --- a/arch/x86/boot/compressed/aslr.c
> +++ b/arch/x86/boot/compressed/aslr.c
> @@ -349,7 +349,7 @@ static void process_e820_entry(struct e820entry *entry,
>         }
>  }
>
> -static unsigned long find_random_addr(unsigned long minimum,
> +static unsigned long find_random_phy_addr(unsigned long minimum,
>                                       unsigned long size)
>  {
>         int i;
> @@ -387,23 +387,24 @@ static unsigned long find_random_virt_offset(unsigned long minimum,
>         return random * CONFIG_PHYSICAL_ALIGN + minimum;
>  }
>
> -unsigned char *choose_kernel_location(unsigned char *input,
> -                                     unsigned long input_size,
> -                                     unsigned char *output,
> -                                     unsigned long output_size)
> +void choose_kernel_location(unsigned char *input,
> +                               unsigned long input_size,
> +                               unsigned char **output,
> +                               unsigned long output_size,
> +                               unsigned char **virt_offset)
>  {
> -       unsigned long choice = (unsigned long)output;
>         unsigned long random;
> +       *virt_offset = (unsigned char *)LOAD_PHYSICAL_ADDR;
>
>  #ifdef CONFIG_HIBERNATION
>         if (!cmdline_find_option_bool("kaslr")) {
>                 debug_putstr("KASLR disabled by default...\n");
> -               goto out;
> +               return;
>         }
>  #else
>         if (cmdline_find_option_bool("nokaslr")) {
>                 debug_putstr("KASLR disabled by cmdline...\n");
> -               goto out;
> +               return;
>         }
>  #endif
>
> @@ -411,23 +412,24 @@ unsigned char *choose_kernel_location(unsigned char *input,
>
>         /* Record the various known unsafe memory ranges. */
>         mem_avoid_init((unsigned long)input, input_size,
> -                      (unsigned long)output);
> +                      (unsigned long)*output);
>
>         /* Walk e820 and find a random address. */
> -       random = find_random_addr(choice, output_size);
> -       if (!random) {
> +       random = find_random_phy_addr((unsigned long)*output, output_size);
> +       if (!random)
>                 debug_putstr("KASLR could not find suitable E820 region...\n");
> -               goto out;
> +       else {
> +               if ((unsigned long)*output != random) {
> +                       fill_pagetable(random, output_size);
> +                       switch_pagetable();
> +                       *output = (unsigned char *)random;
> +               }
>         }
>
> -       /* Always enforce the minimum. */
> -       if (random < choice)
> -               goto out;
> -
> -       choice = random;
> -
> -       fill_pagetable(choice, output_size);
> -       switch_pagetable();
> -out:
> -       return (unsigned char *)choice;
> +       /*
> +        * Get a random address between LOAD_PHYSICAL_ADDR and
> +        * CONFIG_RANDOMIZE_BASE_MAX_OFFSET
> +        */
> +       random = find_random_virt_offset(LOAD_PHYSICAL_ADDR, output_size);
> +       *virt_offset = (unsigned char *)random;
>  }
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index 70445c3..362d508 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -251,7 +251,8 @@ void error(char *x)
>  }
>
>  #if CONFIG_X86_NEED_RELOCS
> -static void handle_relocations(void *output, unsigned long output_len)
> +static void handle_relocations(void *output, unsigned long output_len,
> +                              void *virt_offset)
>  {
>         int *reloc;
>         unsigned long delta, map, ptr;
> @@ -263,11 +264,6 @@ static void handle_relocations(void *output, unsigned long output_len)
>          * and where it was actually loaded.
>          */
>         delta = min_addr - LOAD_PHYSICAL_ADDR;
> -       if (!delta) {
> -               debug_putstr("No relocation needed... ");
> -               return;
> -       }
> -       debug_putstr("Performing relocations... ");
>
>         /*
>          * The kernel contains a table of relocation addresses. Those
> @@ -278,6 +274,22 @@ static void handle_relocations(void *output, unsigned long output_len)
>          */
>         map = delta - __START_KERNEL_map;
>
> +
> +
> +       /*
> +        * 32-bit always performs relocations. 64-bit relocations are only
> +        * needed if kASLR has chosen a different starting address offset
> +        * from __START_KERNEL_map.
> +        */
> +       if (IS_ENABLED(CONFIG_X86_64))
> +               delta = (unsigned long)virt_offset - LOAD_PHYSICAL_ADDR;
> +
> +       if (!delta) {
> +               debug_putstr("No relocation needed... ");
> +               return;
> +       }
> +       debug_putstr("Performing relocations... ");
> +
>         /*
>          * Process relocations: 32 bit relocations first then 64 bit after.
>          * Three sets of binary relocations are added to the end of the kernel
> @@ -331,7 +343,8 @@ static void handle_relocations(void *output, unsigned long output_len)
>  #endif
>  }
>  #else
> -static inline void handle_relocations(void *output, unsigned long output_len)
> +static inline void handle_relocations(void *output, unsigned long output_len,
> +                                     void *virt_offset)
>  { }
>  #endif
>
> @@ -397,6 +410,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
>  {
>         unsigned long run_size = VO__end - VO__text;
>         unsigned char *output_orig = output;
> +       unsigned char *virt_offset;
>
>         real_mode = rmode;
>
> @@ -434,9 +448,10 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
>          * the entire decompressed kernel plus relocation table, or the
>          * entire decompressed kernel plus .bss and .brk sections.
>          */
> -       output = choose_kernel_location(input_data, input_len, output,
> +       choose_kernel_location(input_data, input_len, &output,
>                                         output_len > run_size ? output_len
> -                                                             : run_size);
> +                                                             : run_size,
> +                                       &virt_offset);
>
>         /* Validate memory location choices. */
>         if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
> @@ -457,12 +472,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
>         __decompress(input_data, input_len, NULL, NULL, output, output_len,
>                         NULL, error);
>         parse_elf(output);
> -       /*
> -        * 32-bit always performs relocations. 64-bit relocations are only
> -        * needed if kASLR has chosen a different load address.
> -        */
> -       if (!IS_ENABLED(CONFIG_X86_64) || output != output_orig)
> -               handle_relocations(output, output_len);
> +       handle_relocations(output, output_len, virt_offset);
>         debug_putstr("done.\nBooting the kernel.\n");
>         return output;
>  }
> diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> index 39d0e9a..cde6aec 100644
> --- a/arch/x86/boot/compressed/misc.h
> +++ b/arch/x86/boot/compressed/misc.h
> @@ -69,20 +69,21 @@ int cmdline_find_option_bool(const char *option);
>
>  #if CONFIG_RANDOMIZE_BASE
>  /* aslr.c */
> -unsigned char *choose_kernel_location(unsigned char *input,
> -                                     unsigned long input_size,
> -                                     unsigned char *output,
> -                                     unsigned long output_size);
> +void choose_kernel_location(unsigned char *input,
> +                               unsigned long input_size,
> +                               unsigned char **output,
> +                               unsigned long output_size,
> +                               unsigned char **virt_offset);
>  /* cpuflags.c */
>  bool has_cpuflag(int flag);
>  #else
>  static inline
> -unsigned char *choose_kernel_location(unsigned char *input,
> -                                     unsigned long input_size,
> -                                     unsigned char *output,
> -                                     unsigned long output_size)
> +void choose_kernel_location(unsigned char *input,
> +                               unsigned long input_size,
> +                               unsigned char **output,
> +                               unsigned long output_size,
> +                               unsigned char **virt_offset)
>  {
> -       return output;
>  }
>  #endif
>
> --
> 2.5.0
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 02/19] x86, boot: Move compressed kernel to end of buffer before decompressing
  2016-03-07 22:35   ` Kees Cook
@ 2016-03-08  4:50     ` Baoquan He
  0 siblings, 0 replies; 51+ messages in thread
From: Baoquan He @ 2016-03-08  4:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

Hi Kees,

Many thanks for your review and suggestions!

On 03/07/16 at 02:35pm, Kees Cook wrote:
> > Current code uses extract_offset to decide the copied ZO position; it
> > puts ZO starting from extract_offset. When INIT_SIZE is bigger than
> > VO_INIT_SIZE, the copied ZO occupies the space from extract_offset to
> > the end of the decompressing buffer, just like below.
> >
> > 0                       extract_offset                       init_size
> > |-----------|---------------|------------------------|---------|
> >             |               |                        |         |
> >           VO__text      startup_32 of ZO         VO__end     ZO__end
> >
> > However when INIT_SIZE is euqal to VO_INIT_SIZE there's still space left
> 
> typo: "equal"

Yes, will change it in next post.

> 
> > from the end of ZO to the end of the decompressing buffer, like below.
> >
> > 0                       extract_offset                       init_size
> > |-----------|---------------|------------------------|---------|
> >             |               |                        |         |
> >           VO__text      startup_32 of ZO         ZO__end     VO__end
> >
> > It's better to always put ZO right at the end of the decompressing
> > buffer. Anyway, who doesn't like it to be better?
> 
> It looks to me like it's just cleaner to always have ZO__end ==
> VO__end? Is that the motivation for moving it? What problem is solved
> with this change?

As you can see there are 2 cases. The first case is depicted in the 1st
graph; in that case ZO__end is after VO__end. In the 2nd case, yes, we
move ZO so that ZO__end == VO__end. In fact this patch is not trying to
make them equal; what it is trying to do is adjust the behaviour: we
always put ZO right at the end of the decompressing buffer, computed
from init_size (the size of the decompressing buffer) and the ZO run
size, instead of putting ZO at extract_offset as before. That is, we
changed the way the starting position of ZO is calculated.
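
In other words (a sketch with assumed variable names, mirroring the new
head_32.S/head_64.S arithmetic):

	/* ZO is copied so that it ends exactly at output + init_size. */
	unsigned long zo_run_size = zo_end - zo_startup_32; /* _end - startup_32 */
	unsigned long zo_start    = output + init_size - zo_run_size;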
> 
> -Kees
> 
> >
> > So in this patch, add BP_init_size to asm-offsets.c to make it
> > visible to asm modules, and find the starting position of the copied
> > ZO from init_size and the ZO run size, namely (ZO__end - startup_32),
> > when moving ZO.
> >
> > Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> > ---
> > v2->v3:
> >     Adjust the patch log.
> >
> >  arch/x86/boot/compressed/head_32.S     | 11 +++++++++--
> >  arch/x86/boot/compressed/head_64.S     |  8 ++++++--
> >  arch/x86/boot/compressed/mkpiggy.c     |  3 ---
> >  arch/x86/boot/compressed/vmlinux.lds.S |  1 +
> >  arch/x86/kernel/asm-offsets.c          |  1 +
> >  arch/x86/kernel/vmlinux.lds.S          |  1 +
> >  6 files changed, 18 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
> > index 8ef964d..0c140f9 100644
> > --- a/arch/x86/boot/compressed/head_32.S
> > +++ b/arch/x86/boot/compressed/head_32.S
> > @@ -148,7 +148,9 @@ preferred_addr:
> >  1:
> >
> >         /* Target address to relocate to for decompression */
> > -       addl    $z_extract_offset, %ebx
> > +       movl    BP_init_size(%esi), %eax
> > +       subl    $_end, %eax
> > +       addl    %eax, %ebx
> >
> >         /* Set up the stack */
> >         leal    boot_stack_end(%ebx), %esp
> > @@ -210,8 +212,13 @@ relocated:
> >                                 /* push arguments for decompress_kernel: */
> >         pushl   $z_run_size     /* size of kernel with .bss and .brk */
> >         pushl   $z_output_len   /* decompressed length, end of relocs */
> > -       leal    z_extract_offset_negative(%ebx), %ebp
> > +
> > +       movl    BP_init_size(%esi), %eax
> > +       subl    $_end, %eax
> > +       movl    %ebx, %ebp
> > +       subl    %eax, %ebp
> >         pushl   %ebp            /* output address */
> > +
> >         pushl   $z_input_len    /* input_len */
> >         leal    input_data(%ebx), %eax
> >         pushl   %eax            /* input_data */
> > diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> > index b0c0d16..67dd8d3 100644
> > --- a/arch/x86/boot/compressed/head_64.S
> > +++ b/arch/x86/boot/compressed/head_64.S
> > @@ -102,7 +102,9 @@ ENTRY(startup_32)
> >  1:
> >
> >         /* Target address to relocate to for decompression */
> > -       addl    $z_extract_offset, %ebx
> > +       movl    BP_init_size(%esi), %eax
> > +       subl    $_end, %eax
> > +       addl    %eax, %ebx
> >
> >  /*
> >   * Prepare for entering 64 bit mode
> > @@ -330,7 +332,9 @@ preferred_addr:
> >  1:
> >
> >         /* Target address to relocate to for decompression */
> > -       leaq    z_extract_offset(%rbp), %rbx
> > +       movl    BP_init_size(%rsi), %ebx
> > +       subl    $_end, %ebx
> > +       addq    %rbp, %rbx
> >
> >         /* Set up the stack */
> >         leaq    boot_stack_end(%rbx), %rsp
> > diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
> > index d8222f2..b980046 100644
> > --- a/arch/x86/boot/compressed/mkpiggy.c
> > +++ b/arch/x86/boot/compressed/mkpiggy.c
> > @@ -85,9 +85,6 @@ int main(int argc, char *argv[])
> >         printf("z_output_len = %lu\n", (unsigned long)olen);
> >         printf(".globl z_extract_offset\n");
> >         printf("z_extract_offset = 0x%lx\n", offs);
> > -       /* z_extract_offset_negative allows simplification of head_32.S */
> > -       printf(".globl z_extract_offset_negative\n");
> > -       printf("z_extract_offset_negative = -0x%lx\n", offs);
> >         printf(".globl z_run_size\n");
> >         printf("z_run_size = %lu\n", run_size);
> >
> > diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> > index 34d047c..e24e0a0 100644
> > --- a/arch/x86/boot/compressed/vmlinux.lds.S
> > +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> > @@ -70,5 +70,6 @@ SECTIONS
> >                 _epgtable = . ;
> >         }
> >  #endif
> > +       . = ALIGN(PAGE_SIZE);   /* keep ZO size page aligned */
> >         _end = .;
> >  }
> > diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
> > index 84a7524..506de14 100644
> > --- a/arch/x86/kernel/asm-offsets.c
> > +++ b/arch/x86/kernel/asm-offsets.c
> > @@ -81,6 +81,7 @@ void common(void) {
> >         OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch);
> >         OFFSET(BP_version, boot_params, hdr.version);
> >         OFFSET(BP_kernel_alignment, boot_params, hdr.kernel_alignment);
> > +       OFFSET(BP_init_size, boot_params, hdr.init_size);
> >         OFFSET(BP_pref_address, boot_params, hdr.pref_address);
> >         OFFSET(BP_code32_start, boot_params, hdr.code32_start);
> >
> > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> > index 74e4bf1..217d5b1 100644
> > --- a/arch/x86/kernel/vmlinux.lds.S
> > +++ b/arch/x86/kernel/vmlinux.lds.S
> > @@ -325,6 +325,7 @@ SECTIONS
> >                 __brk_limit = .;
> >         }
> >
> > +       . = ALIGN(PAGE_SIZE);           /* keep VO_INIT_SIZE page aligned */
> >         _end = .;
> >
> >          STABS_DEBUG
> > --
> > 2.5.0
> >
> 
> 
> 
> -- 
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 05/19] x86, boot: Fix run_size calculation
  2016-03-07 23:10   ` Kees Cook
@ 2016-03-08  4:57     ` Baoquan He
  2016-03-08 18:05       ` Kees Cook
  0 siblings, 1 reply; 51+ messages in thread
From: Baoquan He @ 2016-03-08  4:57 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young, Junjie Mao, Josh Triplett, Matt Fleming

On 03/07/16 at 03:10pm, Kees Cook wrote:
> On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> > From: Yinghai Lu <yinghai@kernel.org>
> >
> > Firstly, the current run_size is calculated via the shell script
> > arch/x86/tools/calc_run_size.sh. It gets the file offset and mem size
> > of the .bss and .brk sections in vmlinux, then adds them as follows:
> >
> > run_size=$(( $offsetA + $sizeA + $sizeB ))
> >
> > However this is completely wrong. The offset is the starting offset of
> > the section or segment in the elf file. Below is a vmlinux I compiled:
> >
> > [bhe@x1 linux]$ objdump -h vmlinux
> >
> > vmlinux:     file format elf64-x86-64
> >
> > Sections:
> > Idx Name          Size      VMA               LMA               File off  Algn
> >  27 .bss          00170000  ffffffff81ec8000  0000000001ec8000  012c8000  2**12
> >                   ALLOC
> >  28 .brk          00027000  ffffffff82038000  0000000002038000  012c8000  2**0
> >                   ALLOC
> >
> > Here we can get run_size is 0x145f000.
> > 0x012c8000+0x012c8000+0x00027000=0x145f000
> 
> This example calculation looks wrong to me. run_size is offset + size
> + size (not offset + offset + size):
> 
> 0x12c8000+0x17000+0x27000 = 0x1306000

Yeah, please forgive my carelessness. I copied the wrong size of .bss.
But you also typed the wrong value for the .bss size; it should be
0x170000. So the result is still right.

0x12c8000+0x170000+0x27000 = 0x145f000

> 
> > [bhe@x1 linux]$ readelf -l vmlinux
> >
> > Elf file type is EXEC (Executable file)
> > Entry point 0x1000000
> > There are 5 program headers, starting at offset 64
> >
> > Program Headers:
> >   Type           Offset             VirtAddr           PhysAddr
> >                  FileSiz            MemSiz              Flags  Align
> >   LOAD           0x0000000000200000 0xffffffff81000000 0x0000000001000000
> >                  0x0000000000b5e000 0x0000000000b5e000  R E    200000
> >   LOAD           0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
> >                  0x0000000000145000 0x0000000000145000  RW     200000
> >   LOAD           0x0000000001000000 0x0000000000000000 0x0000000001d45000
> >                  0x0000000000018158 0x0000000000018158  RW     200000
> >   LOAD           0x000000000115e000 0xffffffff81d5e000 0x0000000001d5e000
> >                  0x000000000016a000 0x0000000000301000  RWE    200000
> >   NOTE           0x000000000099bcac 0xffffffff8179bcac 0x000000000179bcac
> >                  0x00000000000001bc 0x00000000000001bc         4
> >
> >  Section to Segment mapping:
> >   Segment Sections...
> >    00     .text .notes __ex_table .rodata __bug_table .pci_fixup .tracedata __ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param __modver
> >    01     .data .vvar
> >    02     .data..percpu
> >    03     .init.text .init.data .x86_cpu_dev.init .parainstructions .altinstructions .altinstr_replacement .iommu_table .apicdrivers .exit.text .smp_locks .bss .brk
> >    04     .notes
> >
> > Here we can get the same value as current run_size if we add p_offset
> > and p_memsz.
> > 0x000000000115e000+0x0000000000301000=0x145f000
> >
> > But is it right? Obviously not. We should calculate it using the last LOAD
> > program segment like this:
> > run_size = phdr->p_paddr + phdr->p_memsz - physical load addr of kernel
> > run_size=0x0000000001d5e000+0x0000000000301000-0x0000000001000000=0x105f000
> 
> Segment 03 ends at 0xffffffff81d5e000 + 0x301000 = 0xffffffff8205f000,
> which does match where .brk ends (0xffffffff82038000 + 0x27000 =
> 0xffffffff8205f000).

Ah, yes, exactly. That proves it in a different way.

> 
> >
> > It's equal to VO_end-VO_text and certainly it's simpler to do.
> > _end: 0xffffffff8205f000
> > _text:0xffffffff81000000
> > run_size = 0xffffffff8205f000-0xffffffff81000000=0x105f000
> 
> I would agree, it would seem like the existing run_size calculation is
> 0x247000 too high in this example.

It should be 0x400000 too high, since you mistakenly typed the size of
.bss ^_^.
0x145f000 - 0x105f000 = 0x400000

An extra 4M is added in this example.
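
In code, the correct calculation is simply this (a hypothetical helper;
phdrs and phnum as seen by parse_elf):

	static unsigned long calc_run_size(Elf64_Phdr *phdrs, int phnum,
					   unsigned long load_addr)
	{
		unsigned long end = 0;
		int i;

		/* run_size ends at the highest PT_LOAD segment end */
		for (i = 0; i < phnum; i++)
			if (phdrs[i].p_type == PT_LOAD &&
			    phdrs[i].p_paddr + phdrs[i].p_memsz > end)
				end = phdrs[i].p_paddr + phdrs[i].p_memsz;

		return end - load_addr;	/* == _end - _text for vmlinux */
	}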

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 06/19] x86, kaslr: Clean up useless code related to run_size.
  2016-03-07 23:12   ` Kees Cook
@ 2016-03-08  5:00     ` Baoquan He
  0 siblings, 0 replies; 51+ messages in thread
From: Baoquan He @ 2016-03-08  5:00 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young, Josh Triplett, Matt Fleming, Ard Biesheuvel,
	Junjie Mao

On 03/07/16 at 03:12pm, Kees Cook wrote:
> > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > index 42e76c2..069120e 100644
> > --- a/arch/x86/boot/compressed/misc.c
> > +++ b/arch/x86/boot/compressed/misc.c
> > @@ -393,9 +393,9 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
> >                                   unsigned char *input_data,
> >                                   unsigned long input_len,
> >                                   unsigned char *output,
> > -                                 unsigned long output_len,
> > -                                 unsigned long run_size)
> > +                                 unsigned long output_len)
> >  {
> > +       unsigned long run_size = VO__end - VO__text;
> 
> I think this can be "const" now, yes?

Yes, agreed. No one can change it.
Will update it.
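
I.e. it will become:

	const unsigned long run_size = VO__end - VO__text;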

Thanks

> 
> >         unsigned char *output_orig = output;
> >
> >         real_mode = rmode;
> > @@ -416,8 +416,6 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
> >         lines = real_mode->screen_info.orig_video_lines;
> >         cols = real_mode->screen_info.orig_video_cols;
> >
> > -       run_size = VO__end - VO__text;
> > -
> >         console_init();
> >         debug_putstr("early console in decompress_kernel\n");
> >
> > diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
> > index a613c84..c5148642 100644
> > --- a/arch/x86/boot/compressed/mkpiggy.c
> > +++ b/arch/x86/boot/compressed/mkpiggy.c
> > @@ -34,13 +34,11 @@ int main(int argc, char *argv[])
> >  {
> >         uint32_t olen;
> >         long ilen;
> > -       unsigned long run_size;
> >         FILE *f = NULL;
> >         int retval = 1;
> >
> > -       if (argc < 3) {
> > -               fprintf(stderr, "Usage: %s compressed_file run_size\n",
> > -                               argv[0]);
> > +       if (argc < 2) {
> > +               fprintf(stderr, "Usage: %s compressed_file\n", argv[0]);
> >                 goto bail;
> >         }
> >
> > @@ -65,15 +63,11 @@ int main(int argc, char *argv[])
> >         ilen = ftell(f);
> >         olen = get_unaligned_le32(&olen);
> >
> > -       run_size = atoi(argv[2]);
> > -
> >         printf(".section \".rodata..compressed\",\"a\",@progbits\n");
> >         printf(".globl z_input_len\n");
> >         printf("z_input_len = %lu\n", ilen);
> >         printf(".globl z_output_len\n");
> >         printf("z_output_len = %lu\n", (unsigned long)olen);
> > -       printf(".globl z_run_size\n");
> > -       printf("z_run_size = %lu\n", run_size);
> >
> >         printf(".globl input_data, input_data_end\n");
> >         printf("input_data:\n");
> > diff --git a/arch/x86/tools/calc_run_size.sh b/arch/x86/tools/calc_run_size.sh
> > deleted file mode 100644
> > index 1a4c17b..0000000
> > --- a/arch/x86/tools/calc_run_size.sh
> > +++ /dev/null
> > @@ -1,42 +0,0 @@
> > -#!/bin/sh
> > -#
> > -# Calculate the amount of space needed to run the kernel, including room for
> > -# the .bss and .brk sections.
> > -#
> > -# Usage:
> > -# objdump -h a.out | sh calc_run_size.sh
> > -
> > -NUM='\([0-9a-fA-F]*[ \t]*\)'
> > -OUT=$(sed -n 's/^[ \t0-9]*.b[sr][sk][ \t]*'"$NUM$NUM$NUM$NUM"'.*/\1\4/p')
> > -if [ -z "$OUT" ] ; then
> > -       echo "Never found .bss or .brk file offset" >&2
> > -       exit 1
> > -fi
> > -
> > -OUT=$(echo ${OUT# })
> > -sizeA=$(printf "%d" 0x${OUT%% *})
> > -OUT=${OUT#* }
> > -offsetA=$(printf "%d" 0x${OUT%% *})
> > -OUT=${OUT#* }
> > -sizeB=$(printf "%d" 0x${OUT%% *})
> > -OUT=${OUT#* }
> > -offsetB=$(printf "%d" 0x${OUT%% *})
> > -
> > -run_size=$(( $offsetA + $sizeA + $sizeB ))
> > -
> > -# BFD linker shows the same file offset in ELF.
> > -if [ "$offsetA" -ne "$offsetB" ] ; then
> > -       # Gold linker shows them as consecutive.
> > -       endB=$(( $offsetB + $sizeB ))
> > -       if [ "$endB" != "$run_size" ] ; then
> > -               printf "sizeA: 0x%x\n" $sizeA >&2
> > -               printf "offsetA: 0x%x\n" $offsetA >&2
> > -               printf "sizeB: 0x%x\n" $sizeB >&2
> > -               printf "offsetB: 0x%x\n" $offsetB >&2
> > -               echo ".bss and .brk are non-contiguous" >&2
> > -               exit 1
> > -       fi
> > -fi
> > -
> > -printf "%d\n" $run_size
> > -exit 0
> > --
> > 2.5.0
> >
> 
> 
> 
> -- 
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 07/19] x86, kaslr: Get correct max_addr for relocs pointer
  2016-03-07 23:16   ` Kees Cook
@ 2016-03-08  5:13     ` Baoquan He
  2016-03-08 18:16       ` Kees Cook
  0 siblings, 1 reply; 51+ messages in thread
From: Baoquan He @ 2016-03-08  5:13 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On 03/07/16 at 03:16pm, Kees Cook wrote:
> On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> > From: Yinghai Lu <yinghai@kernel.org>
> >
> > There is boundary checking for pointer in kaslr relocation handling.
> >
> > Current code is using output_len, and that is VO (vmlinux after objcopy)
> > file size plus vmlinux.relocs file size.
> >
> > That is not right, as we should use loaded address for running.
> 
> I think this needs clarification. If I'm understanding correctly, the
> max_addr check should stop at the end of VO, and not include bss, brk,
etc., which follow. In which case, this change is correct: the bounds
> checking needed to be tightened.

Yes, exactly as you say.

> 
> 
> >
> > At that time parse_elf already move the sections according to ELF headers.
> >
> > The valid range should be VO [_text, __bss_start) loaded physical addresses.

I think it is expressed here, but maybe we can stress it in the patch
log; after all, it's the main idea of this patch.

How about rewriting the patch log like this:

********************
There is boundary checking for the relocs pointer in kaslr relocation
handling. However, current code uses output_len, which is the VO
(vmlinux after objcopy) file size plus the vmlinux.relocs file size, as
the valid boundary.

That is not right, since the max_addr check should stop at the end of
VO and exclude bss, brk etc., which follow. That means the valid range
should be VO [_text, __bss_start) in the loaded physical address space.

In this patch, add an export for __bss_start to voffset.h and use it to
get max_addr.
********************
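
With that, the bounds in handle_relocations become (a sketch; the error
message mirrors the one in misc.c):

	unsigned long min_addr = (unsigned long)output;
	unsigned long max_addr = min_addr + (VO___bss_start - VO__text);

	/* later, for each relocation entry: */
	if (ptr < min_addr || ptr > max_addr)
		error("32-bit relocation outside of kernel!\n");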

> >
> > In the patch, add export for __bss_start to voffset.h and use it to get
> > max_addr.
> >
> > Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> > ---
> > v2->v3:
> >     Tune the patch log.
> >
> >  arch/x86/boot/compressed/Makefile | 2 +-
> >  arch/x86/boot/compressed/misc.c   | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index fef80fa..2e7c0ce 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -41,7 +41,7 @@ LDFLAGS_vmlinux := -T
> >  hostprogs-y    := mkpiggy
> >  HOST_EXTRACFLAGS += -I$(srctree)/tools/include
> >
> > -sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
> > +sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
> >
> >  quiet_cmd_voffset = VOFFSET $@
> >        cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
> > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > index 069120e..dd7ed8a 100644
> > --- a/arch/x86/boot/compressed/misc.c
> > +++ b/arch/x86/boot/compressed/misc.c
> > @@ -259,7 +259,7 @@ static void handle_relocations(void *output, unsigned long output_len)
> >         int *reloc;
> >         unsigned long delta, map, ptr;
> >         unsigned long min_addr = (unsigned long)output;
> > -       unsigned long max_addr = min_addr + output_len;
> > +       unsigned long max_addr = min_addr + (VO___bss_start - VO__text);
> >
> >         /*
> >          * Calculate the delta between where vmlinux was linked to load
> > --
> > 2.5.0
> >
> 
> 
> 
> -- 
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 08/19] x86, kaslr: Consolidate mem_avoid array filling
  2016-03-07 23:28   ` Kees Cook
@ 2016-03-08  5:21     ` Baoquan He
  2016-03-08 18:17       ` Kees Cook
  0 siblings, 1 reply; 51+ messages in thread
From: Baoquan He @ 2016-03-08  5:21 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On 03/07/16 at 03:28pm, Kees Cook wrote:
> On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> > From: Yinghai Lu <yinghai@kernel.org>
> >
> > We are going to support kaslr on 64bit above 4G, and the new random
> > output could be anywhere. The mem_avoid array is used by kaslr when
> > searching for a new output address, but current code only tracks the
> > range after output+output_size. We need to track all ranges instead.
> >
> > In current code the first entry is the extra bytes before
> > input+input_size, sized according to output_size. The other entries
> > are for the initrd, the cmdline, and the heap/stack for running ZO.
> >
> > First, let's check what the first entry in the mem_avoid array should
> > be. Now that ZO always sits at the end of the buffer, we can find out
> > where the text and data/bss etc. of ZO are.
> >
> > Since init_size >= run_size, and input+input_len >= output+output_len,
> > we make several assumptions here for a better presentation in the
> > graph:
> >  - init_size > run_size
> >  - input+input_len > output+output_len
> >  - run_size > output_len
> 
> I would like to see each of these assumptions justified. Why is
> init_size > run_size, etc?
> choose_kernel_location's "output_size" is calculated as max(run_size,
> output_len), so run_size may not be > output_len...

Sure. I will add this case in next post. Thanks a lot.
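
For reference, boot/header.S already picks init_size as the larger of
the two (paraphrased, not the exact source):

	#define VO_INIT_SIZE	(VO__end - VO__text)
	#define ZO_INIT_SIZE	(ZO__end - ZO_startup_32 + ZO_z_extract_offset)
	#if ZO_INIT_SIZE > VO_INIT_SIZE
	# define INIT_SIZE	ZO_INIT_SIZE
	#else
	# define INIT_SIZE	VO_INIT_SIZE
	#endif

so init_size >= run_size holds by construction.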
> 
> >
> > 0   output                       input             input+input_len          output+init_size
> > |     |                            |                       |                       |
> > |-----|-------------------|--------|------------------|----|------------|----------|
> >                           |                           |                 |
> >              output+init_size-ZO_INIT_SIZE    output+output_len    output+run_size
> >
> > [output, output+init_size) is the decompressing buffer for the
> > compressed kernel.
> >
> > [output, output+run_size) is for VO run size.
> > [output, output+output_len) is (VO (vmlinux after objcopy) plus relocs)
> >
> > [output+init_size-ZO_INIT_SIZE, output+init_size) is copied ZO.
> > [input, input+input_len) is copied compressed (VO (vmlinux after objcopy)
> > plus relocs), not the ZO.
> >
> > [input+input_len, output+init_size) is [_text, _end) for ZO; that
> > could be the first range in mem_avoid. The new first entry already
> > includes the heap and stack for running ZO, so there is no need to put
> > them separately into the mem_avoid array.
> >
> > Also [input, input+input_size) needs to be put in the mem_avoid
> > array. It is adjacent to the new first entry, so merge them.
> 
> I wonder if this diagram and description should live in a comment with the code.

I think it would be very helpful for people interested in this process.
Do you think it's ok to put it where init_size is calculated in
boot/header.S? Or some other suitable place?
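
For context, the consumer of the array looks roughly like this
(simplified from aslr.c; the real code may check more than the array):

	/* Does this memory vector overlap a known avoided area? */
	static bool mem_avoid_overlap(struct mem_vector *img)
	{
		int i;

		for (i = 0; i < MEM_AVOID_MAX; i++) {
			if (mem_overlaps(img, &mem_avoid[i]))
				return true;
		}
		return false;
	}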
> 
> 
> >
> > Lastly we need to put boot_params into mem_avoid too, since with a
> > 64bit bootloader it could be anywhere.
> >
> > After these changes, we have all the ranges which need to be avoided
> > in the mem_avoid array.
> >
> > Cc: Kees Cook <keescook@chromium.org>
> > Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> > ---
> > v2->v3:
> >     Adjust the patch log.
> >
> >  arch/x86/boot/compressed/aslr.c | 29 +++++++++++++----------------
> >  1 file changed, 13 insertions(+), 16 deletions(-)
> >
> > diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
> > index 622aa88..b93be03 100644
> > --- a/arch/x86/boot/compressed/aslr.c
> > +++ b/arch/x86/boot/compressed/aslr.c
> > @@ -109,7 +109,7 @@ struct mem_vector {
> >         unsigned long size;
> >  };
> >
> > -#define MEM_AVOID_MAX 5
> > +#define MEM_AVOID_MAX 4
> >  static struct mem_vector mem_avoid[MEM_AVOID_MAX];
> >
> >  static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
> > @@ -135,21 +135,22 @@ static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
> >  }
> >
> >  static void mem_avoid_init(unsigned long input, unsigned long input_size,
> > -                          unsigned long output, unsigned long output_size)
> > +                          unsigned long output)
> >  {
> > +       unsigned long init_size = real_mode->hdr.init_size;
> >         u64 initrd_start, initrd_size;
> >         u64 cmd_line, cmd_line_size;
> > -       unsigned long unsafe, unsafe_len;
> >         char *ptr;
> >
> >         /*
> >          * Avoid the region that is unsafe to overlap during
> > -        * decompression (see calculations at top of misc.c).
> > +        * decompression.
> > +        * As we already move ZO (arch/x86/boot/compressed/vmlinux)
> > +        * to the end of buffer, [input+input_size, output+init_size)
> > +        * has [_text, _end) for ZO.
> >          */
> > -       unsafe_len = (output_size >> 12) + 32768 + 18;
> > -       unsafe = (unsigned long)input + input_size - unsafe_len;
> > -       mem_avoid[0].start = unsafe;
> > -       mem_avoid[0].size = unsafe_len;
> > +       mem_avoid[0].start = input;
> > +       mem_avoid[0].size = (output + init_size) - input;
> >
> >         /* Avoid initrd. */
> >         initrd_start  = (u64)real_mode->ext_ramdisk_image << 32;
> > @@ -169,13 +170,9 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
> >         mem_avoid[2].start = cmd_line;
> >         mem_avoid[2].size = cmd_line_size;
> >
> > -       /* Avoid heap memory. */
> > -       mem_avoid[3].start = (unsigned long)free_mem_ptr;
> > -       mem_avoid[3].size = BOOT_HEAP_SIZE;
> > -
> > -       /* Avoid stack memory. */
> > -       mem_avoid[4].start = (unsigned long)free_mem_end_ptr;
> > -       mem_avoid[4].size = BOOT_STACK_SIZE;
> > +       /* Avoid params */
> > +       mem_avoid[3].start = (unsigned long)real_mode;
> > +       mem_avoid[3].size = sizeof(*real_mode);
> >  }
> >
> >  /* Does this memory vector overlap a known avoided area? */
> > @@ -319,7 +316,7 @@ unsigned char *choose_kernel_location(unsigned char *input,
> >
> >         /* Record the various known unsafe memory ranges. */
> >         mem_avoid_init((unsigned long)input, input_size,
> > -                      (unsigned long)output, output_size);
> > +                      (unsigned long)output);
> >
> >         /* Walk e820 and find a random address. */
> >         random = find_random_addr(choice, output_size);
> > --
> > 2.5.0
> >
> 
> 
> 
> -- 
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 12/19] x86, kaslr: Fix a bug that relocation can not be handled when kernel is loaded above 2G
  2016-03-07 23:30   ` Kees Cook
@ 2016-03-08  5:22     ` Baoquan He
  0 siblings, 0 replies; 51+ messages in thread
From: Baoquan He @ 2016-03-08  5:22 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On 03/07/16 at 03:30pm, Kees Cook wrote:
> On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> > When processing 32-bit relocation tables, a local variable 'extended'
> > is defined to calculate the physical address of a relocs entry.
> > However its type is 'int', which is enough for i386 but not for
> > x86_64. That's why relocation can only be handled below 2G; otherwise
> > an overflow happens and causes a system hang.
> >
> > Here change it to 'long', as the 32-bit inverse relocation processing
> > does; this change is safe for i386 relocation handling.
> >
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> 
> Acked-by: Kees Cook <keescook@chromium.org>
> 
> This patch should probably be moved up into the start of this series
> instead of living in the middle here, since it's a clear stand-alone
> bug fix.

Of course. Will put it at the start of this series.

Thanks.
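
For the record, the failure mode is plain integer truncation (a minimal
sketch; 'entry' and 'map' are stand-ins for *reloc and the 64-bit bias):

	int  extended32 = entry;
	long extended64 = entry;

	extended32 += map;	/* above 2G this wraps negative; the later */
				/* (unsigned long) cast then sign-extends */
				/* to a bogus 0xffffffff... pointer       */
	extended64 += map;	/* full-width arithmetic, correct          */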

> 
> -Kees
> 
> > ---
> >  arch/x86/boot/compressed/misc.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > index 4b2cd0c..70445c3 100644
> > --- a/arch/x86/boot/compressed/misc.c
> > +++ b/arch/x86/boot/compressed/misc.c
> > @@ -298,7 +298,7 @@ static void handle_relocations(void *output, unsigned long output_len)
> >          * So we work backwards from the end of the decompressed image.
> >          */
> >         for (reloc = output + output_len - sizeof(*reloc); *reloc; reloc--) {
> > -               int extended = *reloc;
> > +               long extended = *reloc;
> >                 extended += map;
> >
> >                 ptr = (unsigned long)extended;
> > --
> > 2.5.0
> >
> 
> 
> 
> -- 
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 10/19] x86, 64bit: Set ident_mapping for kaslr
  2016-03-07 23:34   ` Kees Cook
@ 2016-03-08  5:25     ` Baoquan He
  2016-03-21  7:50     ` Baoquan He
  1 sibling, 0 replies; 51+ messages in thread
From: Baoquan He @ 2016-03-08  5:25 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young, Jiri Kosina, Borislav Petkov, Matt Fleming

On 03/07/16 at 03:34pm, Kees Cook wrote:
> On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> > From: Yinghai Lu <yinghai@kernel.org>
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index 2e7c0ce..229604d 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -59,6 +59,9 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
> >
> >  vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
> >  vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/aslr.o
> > +ifdef CONFIG_X86_64
> > +       vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/misc_pgt.o
> > +endif
> >
> >  $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
> >
> > diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
> > index b93be03..35f2eac 100644
> > --- a/arch/x86/boot/compressed/aslr.c
> > +++ b/arch/x86/boot/compressed/aslr.c
> > @@ -151,6 +151,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
> >          */
> >         mem_avoid[0].start = input;
> >         mem_avoid[0].size = (output + init_size) - input;
> > +       fill_pagetable(input, (output + init_size) - input);
> 
> These calls should just use the .start and .size arguments instead of
> the copy/paste.

OK, will change it as follows:

	fill_pagetable(mem_avoid[0].start, mem_avoid[0].size);
> 
> >
> >         /* Avoid initrd. */
> >         initrd_start  = (u64)real_mode->ext_ramdisk_image << 32;
> > @@ -159,6 +160,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
> >         initrd_size |= real_mode->hdr.ramdisk_size;
> >         mem_avoid[1].start = initrd_start;
> >         mem_avoid[1].size = initrd_size;
> > +       /* don't need to set mapping for initrd */
> >
> >         /* Avoid kernel command line. */
> >         cmd_line  = (u64)real_mode->ext_cmd_line_ptr << 32;
> > @@ -169,10 +171,19 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
> >                 ;
> >         mem_avoid[2].start = cmd_line;
> >         mem_avoid[2].size = cmd_line_size;
> > +       fill_pagetable(cmd_line, cmd_line_size);
> >
> >         /* Avoid params */
> >         mem_avoid[3].start = (unsigned long)real_mode;
> >         mem_avoid[3].size = sizeof(*real_mode);
> > +       fill_pagetable((unsigned long)real_mode, sizeof(*real_mode));
> > +
> > +       /* don't need to set mapping for setup_data */
> > +
> > +#ifdef CONFIG_X86_VERBOSE_BOOTUP
> > +       /* for video ram */
> > +       fill_pagetable(0, PMD_SIZE);
> > +#endif
> >  }
> >
> >  /* Does this memory vector overlap a known avoided area? */
> > @@ -330,6 +341,9 @@ unsigned char *choose_kernel_location(unsigned char *input,
> >                 goto out;
> >
> >         choice = random;
> > +
> > +       fill_pagetable(choice, output_size);
> > +       switch_pagetable();
> >  out:
> >         return (unsigned char *)choice;
> >  }
> > diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> > index 3691451..075bb15 100644
> > --- a/arch/x86/boot/compressed/head_64.S
> > +++ b/arch/x86/boot/compressed/head_64.S
> > @@ -126,7 +126,7 @@ ENTRY(startup_32)
> >         /* Initialize Page tables to 0 */
> >         leal    pgtable(%ebx), %edi
> >         xorl    %eax, %eax
> > -       movl    $((4096*6)/4), %ecx
> > +       movl    $(BOOT_INIT_PGT_SIZE/4), %ecx
> >         rep     stosl
> >
> >         /* Build Level 4 */
> > @@ -478,4 +478,4 @@ boot_stack_end:
> >         .section ".pgtable","a",@nobits
> >         .balign 4096
> >  pgtable:
> > -       .fill 6*4096, 1, 0
> > +       .fill BOOT_PGT_SIZE, 1, 0
> > diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> > index dcf01c2..11736a6 100644
> > --- a/arch/x86/boot/compressed/misc.h
> > +++ b/arch/x86/boot/compressed/misc.h
> > @@ -84,6 +84,17 @@ unsigned char *choose_kernel_location(unsigned char *input,
> >  }
> >  #endif
> >
> > +#ifdef CONFIG_X86_64
> > +void fill_pagetable(unsigned long start, unsigned long size);
> > +void switch_pagetable(void);
> > +extern unsigned char _pgtable[];
> > +#else
> > +static inline void fill_pagetable(unsigned long start, unsigned long size)
> > +{ }
> > +static inline void switch_pagetable(void)
> > +{ }
> > +#endif
> > +
> >  #ifdef CONFIG_EARLY_PRINTK
> >  /* early_serial_console.c */
> >  extern int early_serial_base;
> > diff --git a/arch/x86/boot/compressed/misc_pgt.c b/arch/x86/boot/compressed/misc_pgt.c
> > new file mode 100644
> > index 0000000..954811e
> > --- /dev/null
> > +++ b/arch/x86/boot/compressed/misc_pgt.c
> > @@ -0,0 +1,91 @@
> > +#define __pa(x)  ((unsigned long)(x))
> > +#define __va(x)  ((void *)((unsigned long)(x)))
> > +
> > +#include "misc.h"
> > +
> > +#include <asm/init.h>
> > +#include <asm/pgtable.h>
> > +
> > +#include "../../mm/ident_map.c"
> > +#include "../string.h"
> > +
> > +struct alloc_pgt_data {
> > +       unsigned char *pgt_buf;
> > +       unsigned long pgt_buf_size;
> > +       unsigned long pgt_buf_offset;
> > +};
> > +
> > +static void *alloc_pgt_page(void *context)
> > +{
> > +       struct alloc_pgt_data *d = (struct alloc_pgt_data *)context;
> > +       unsigned char *p = (unsigned char *)d->pgt_buf;
> > +
> > +       if (d->pgt_buf_offset >= d->pgt_buf_size) {
> > +               debug_putstr("out of pgt_buf in misc.c\n");
> 
> This line probably isn't needed any more. Or, if it is, maybe it could
> include more details from the context?
> 
> -Kees
> 
> > +               return NULL;
> > +       }
> > +
> > +       p += d->pgt_buf_offset;
> > +       d->pgt_buf_offset += PAGE_SIZE;
> > +
> > +       return p;
> > +}
> > +
> > +/*
> > + * Use a normal definition of memset() from string.c. There are already
> > + * included header files which expect a definition of memset() and by
> > + * the time we define memset macro, it is too late.
> > + */
> > +#undef memset
> > +
> > +unsigned long __force_order;
> > +static struct alloc_pgt_data pgt_data;
> > +static struct x86_mapping_info mapping_info;
> > +static pgd_t *level4p;
> > +
> > +void fill_pagetable(unsigned long start, unsigned long size)
> > +{
> > +       unsigned long end = start + size;
> > +
> > +       if (!level4p) {
> > +               pgt_data.pgt_buf_offset = 0;
> > +               mapping_info.alloc_pgt_page = alloc_pgt_page;
> > +               mapping_info.context = &pgt_data;
> > +               mapping_info.pmd_flag = __PAGE_KERNEL_LARGE_EXEC;
> > +
> > +               /*
> > +                * come from startup_32 ?
> > +                * then cr3 is _pgtable, we can reuse it.
> > +                */
> > +               level4p = (pgd_t *)read_cr3();
> > +               if ((unsigned long)level4p == (unsigned long)_pgtable) {
> > +                       pgt_data.pgt_buf = (unsigned char *)_pgtable +
> > +                                                BOOT_INIT_PGT_SIZE;
> > +                       pgt_data.pgt_buf_size = BOOT_PGT_SIZE -
> > +                                                BOOT_INIT_PGT_SIZE;
> > +                       memset((unsigned char *)pgt_data.pgt_buf, 0,
> > +                               pgt_data.pgt_buf_size);
> > +                       debug_putstr("boot via startup_32\n");
> > +               } else {
> > +                       pgt_data.pgt_buf = (unsigned char *)_pgtable;
> > +                       pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
> > +                       memset((unsigned char *)pgt_data.pgt_buf, 0,
> > +                               pgt_data.pgt_buf_size);
> > +                       debug_putstr("boot via startup_64\n");
> > +                       level4p = (pgd_t *)alloc_pgt_page(&pgt_data);
> > +               }
> > +       }
> > +
> > +       /* align boundary to 2M */
> > +       start = round_down(start, PMD_SIZE);
> > +       end = round_up(end, PMD_SIZE);
> > +       if (start >= end)
> > +               return;
> > +
> > +       kernel_ident_mapping_init(&mapping_info, level4p, start, end);
> > +}
> > +
> > +void switch_pagetable(void)
> > +{
> > +       write_cr3((unsigned long)level4p);
> > +}
> > diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
> > index 6b8d6e8..52a9cbc 100644
> > --- a/arch/x86/include/asm/boot.h
> > +++ b/arch/x86/include/asm/boot.h
> > @@ -32,7 +32,26 @@
> >  #endif /* !CONFIG_KERNEL_BZIP2 */
> >
> >  #ifdef CONFIG_X86_64
> > +
> >  #define BOOT_STACK_SIZE        0x4000
> > +
> > +#define BOOT_INIT_PGT_SIZE (6*4096)
> > +#ifdef CONFIG_RANDOMIZE_BASE
> > +/*
> > + * 1 page for level4, 2 pages for first 2M.
> > + * (2+2)*4 pages for kernel, param, cmd_line, random kernel
> > + * if all cross 512G boundary.
> > + * So total will be 19 pages.
> > + */
> > +#ifdef CONFIG_X86_VERBOSE_BOOTUP
> > +#define BOOT_PGT_SIZE (19*4096)
> > +#else
> > +#define BOOT_PGT_SIZE (17*4096)
> > +#endif
> > +#else
> > +#define BOOT_PGT_SIZE BOOT_INIT_PGT_SIZE
> > +#endif
> > +
> >  #else
> >  #define BOOT_STACK_SIZE        0x1000
> >  #endif
> > --
> > 2.5.0
> >
> 
> 
> 
> -- 
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 11/19] x86, boot: Add checking for memcpy
  2016-03-07 23:36   ` Kees Cook
@ 2016-03-08  5:30     ` Baoquan He
  0 siblings, 0 replies; 51+ messages in thread
From: Baoquan He @ 2016-03-08  5:30 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On 03/07/16 at 03:36pm, Kees Cook wrote:
> On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> > From: Yinghai Lu <yinghai@kernel.org>
> >
> > parse_elf uses a local memcpy to move sections to their running
> > positions. That memcpy actually only supports the non-overlapping
> > case, or dest < src.
> >
> > Add checking to memcpy to catch the wrong case for future use; at
> > that point we will need a backward memcpy for it.
> >
> > Also add comments about this fact in parse_elf.
> 
> Seems like it would be better to just fix the memcpy to handle the overlap?

Yeah, agree. I will remove the code comment.
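
An overlap-safe copy would look something like this (a sketch,
essentially memmove):

	static void *copy_overlap_safe(void *dest, const void *src, size_t n)
	{
		unsigned char *d = dest;
		const unsigned char *s = src;

		if (d <= s) {			/* forward copy is safe */
			while (n--)
				*d++ = *s++;
		} else {			/* copy backwards */
			d += n;
			s += n;
			while (n--)
				*--d = *--s;
		}
		return dest;
	}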

> 
> -Kees
> 
> >
> > Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> > ---
> > v2->v3:
> >     Add a declaration for error() since its declaration is in misc.h.
> >     But it's not included in compressed/string.c.
> >
> >  arch/x86/boot/compressed/misc.c   | 14 +++++++-------
> >  arch/x86/boot/compressed/misc.h   |  2 ++
> >  arch/x86/boot/compressed/string.c | 29 +++++++++++++++++++++++++++--
> >  3 files changed, 36 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > index dd7ed8a..4b2cd0c 100644
> > --- a/arch/x86/boot/compressed/misc.c
> > +++ b/arch/x86/boot/compressed/misc.c
> > @@ -114,9 +114,6 @@
> >  #undef memset
> >  #define memzero(s, n)  memset((s), 0, (n))
> >
> > -
> > -static void error(char *m);
> > -
> >  /*
> >   * This is set up by the setup-routine at boot-time
> >   */
> > @@ -243,7 +240,7 @@ void __puthex(unsigned long value)
> >         }
> >  }
> >
> > -static void error(char *x)
> > +void error(char *x)
> >  {
> >         error_putstr("\n\n");
> >         error_putstr(x);
> > @@ -378,9 +375,12 @@ static void parse_elf(void *output)
> >  #else
> >                         dest = (void *)(phdr->p_paddr);
> >  #endif
> > -                       memcpy(dest,
> > -                              output + phdr->p_offset,
> > -                              phdr->p_filesz);
> > +                       /*
> > +                        * simple version memcpy only can work when dest is
> > +                        *   smaller than src or no overlapping.
> > +                        * Here dest is smaller than src always.
> > +                        */
> > +                       memcpy(dest, output + phdr->p_offset, phdr->p_filesz);
> >                         break;
> >                 default: /* Ignore other PT_* */ break;
> >                 }
> > diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> > index 11736a6..39d0e9a 100644
> > --- a/arch/x86/boot/compressed/misc.h
> > +++ b/arch/x86/boot/compressed/misc.h
> > @@ -38,6 +38,8 @@ void __puthex(unsigned long value);
> >  #define error_putstr(__x)  __putstr(__x)
> >  #define error_puthex(__x)  __puthex(__x)
> >
> > +void error(char *x);
> > +
> >  #ifdef CONFIG_X86_VERBOSE_BOOTUP
> >
> >  #define debug_putstr(__x)  __putstr(__x)
> > diff --git a/arch/x86/boot/compressed/string.c b/arch/x86/boot/compressed/string.c
> > index 00e788b..3a935d0 100644
> > --- a/arch/x86/boot/compressed/string.c
> > +++ b/arch/x86/boot/compressed/string.c
> > @@ -1,7 +1,7 @@
> >  #include "../string.c"
> >
> >  #ifdef CONFIG_X86_32
> > -void *memcpy(void *dest, const void *src, size_t n)
> > +void *__memcpy(void *dest, const void *src, size_t n)
> >  {
> >         int d0, d1, d2;
> >         asm volatile(
> > @@ -15,7 +15,7 @@ void *memcpy(void *dest, const void *src, size_t n)
> >         return dest;
> >  }
> >  #else
> > -void *memcpy(void *dest, const void *src, size_t n)
> > +void *__memcpy(void *dest, const void *src, size_t n)
> >  {
> >         long d0, d1, d2;
> >         asm volatile(
> > @@ -30,6 +30,31 @@ void *memcpy(void *dest, const void *src, size_t n)
> >  }
> >  #endif
> >
> > +extern void error(char *x);
> > +void *memcpy(void *dest, const void *src, size_t n)
> > +{
> > +       unsigned long start_dest, end_dest;
> > +       unsigned long start_src, end_src;
> > +       unsigned long max_start, min_end;
> > +
> > +       if (dest < src)
> > +               return __memcpy(dest, src, n);
> > +
> > +       start_dest = (unsigned long)dest;
> > +       end_dest = (unsigned long)dest + n;
> > +       start_src = (unsigned long)src;
> > +       end_src = (unsigned long)src + n;
> > +       max_start = (start_dest > start_src) ?  start_dest : start_src;
> > +       min_end = (end_dest < end_src) ? end_dest : end_src;
> > +
> > +       if (max_start >= min_end)
> > +               return __memcpy(dest, src, n);
> > +
> > +       error("memcpy does not support overlapping with dest > src!\n");
> > +
> > +       return dest;
> > +}
> > +
> >  void *memset(void *s, int c, size_t n)
> >  {
> >         int i;
> > --
> > 2.5.0
> >
> 
> 
> 
> -- 
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 16/19] x86, kaslr: Randomize physical and virtual address of kernel separately
  2016-03-07 23:51   ` Kees Cook
@ 2016-03-08  5:34     ` Baoquan He
  2016-03-08 18:24       ` Kees Cook
  0 siblings, 1 reply; 51+ messages in thread
From: Baoquan He @ 2016-03-08  5:34 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On 03/07/16 at 03:51pm, Kees Cook wrote:
> On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> > On x86_64, in the old kaslr implementation only the physical address
> > at which the kernel is loaded is randomized. The delta between the
> > physical address vmlinux was linked to load at and the address where
> > it is finally loaded is then calculated. If the delta is not zero,
> > i.e. there is a new physical address where the kernel is actually
> > decompressed, relocation handling needs to be done: the delta is added
> > to the offset used for kernel symbol relocation, which moves the
> > kernel text mapping address by the same delta.
> >
> > Here the behavior is changed. Randomize both the physical address
> > where the kernel is decompressed and the virtual address where the
> > kernel text is mapped, and make relocation handling depend only on the
> > virtual address randomization: if and only if the virtual address is
> > randomized to a different value do we add the delta to the offset of
> > the kernel relocs.
> >
> > Note that for now neither the virtual offset nor the physical address
> > randomization can exceed CONFIG_RANDOMIZE_BASE_MAX_OFFSET, namely 1G.
> 
> It seems like CONFIG_RANDOMIZE_BASE_MAX_OFFSET should have been
> eliminated when the on-demand page table code was added. Once that was
> added, there's no physical max any more. And virtual randomization
> should have no max at all.
For the physical randomization, yes, CONFIG_RANDOMIZE_BASE_MAX_OFFSET is
not needed anymore. But for the virtual randomization it is still
mandatory, since the kernel text mapping and the kernel module mapping
share the 2G virtual address space shown below. Though kaslr increases
the kernel text mapping from 512M to 1G, it is still limited and can't
exceed CONFIG_RANDOMIZE_BASE_MAX_OFFSET:

[0xffffffff80000000, 0xffffffffffffffff] 
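
Roughly, the virtual offset is then drawn from that window like this (a
sketch of find_random_virt_offset; the slot math is simplified):

	static unsigned long find_random_virt_offset(unsigned long minimum,
						     unsigned long image_size)
	{
		unsigned long slots;

		/* offsets keeping [off, off + image_size) under the 1G cap */
		slots = (CONFIG_RANDOMIZE_BASE_MAX_OFFSET - minimum -
			 image_size) / CONFIG_PHYSICAL_ALIGN + 1;

		return (get_random_long() % slots) * CONFIG_PHYSICAL_ALIGN +
			minimum;
	}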

But now, as you suggested, I would like to rename
CONFIG_RANDOMIZE_BASE_MAX_OFFSET, because it's only relevant to the
virtual randomization; a more specific name is better.

> 
> -Kees
> 
> >
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> >  arch/x86/boot/compressed/aslr.c | 46 +++++++++++++++++++++--------------------
> >  arch/x86/boot/compressed/misc.c | 40 +++++++++++++++++++++--------------
> >  arch/x86/boot/compressed/misc.h | 19 +++++++++--------
> >  3 files changed, 59 insertions(+), 46 deletions(-)
> >
> > diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
> > index 3730a33..75ea931 100644
> > --- a/arch/x86/boot/compressed/aslr.c
> > +++ b/arch/x86/boot/compressed/aslr.c
> > @@ -349,7 +349,7 @@ static void process_e820_entry(struct e820entry *entry,
> >         }
> >  }
> >
> > -static unsigned long find_random_addr(unsigned long minimum,
> > +static unsigned long find_random_phy_addr(unsigned long minimum,
> >                                       unsigned long size)
> >  {
> >         int i;
> > @@ -387,23 +387,24 @@ static unsigned long find_random_virt_offset(unsigned long minimum,
> >         return random * CONFIG_PHYSICAL_ALIGN + minimum;
> >  }
> >
> > -unsigned char *choose_kernel_location(unsigned char *input,
> > -                                     unsigned long input_size,
> > -                                     unsigned char *output,
> > -                                     unsigned long output_size)
> > +void choose_kernel_location(unsigned char *input,
> > +                               unsigned long input_size,
> > +                               unsigned char **output,
> > +                               unsigned long output_size,
> > +                               unsigned char **virt_offset)
> >  {
> > -       unsigned long choice = (unsigned long)output;
> >         unsigned long random;
> > +       *virt_offset = (unsigned char *)LOAD_PHYSICAL_ADDR;
> >
> >  #ifdef CONFIG_HIBERNATION
> >         if (!cmdline_find_option_bool("kaslr")) {
> >                 debug_putstr("KASLR disabled by default...\n");
> > -               goto out;
> > +               return;
> >         }
> >  #else
> >         if (cmdline_find_option_bool("nokaslr")) {
> >                 debug_putstr("KASLR disabled by cmdline...\n");
> > -               goto out;
> > +               return;
> >         }
> >  #endif
> >
> > @@ -411,23 +412,24 @@ unsigned char *choose_kernel_location(unsigned char *input,
> >
> >         /* Record the various known unsafe memory ranges. */
> >         mem_avoid_init((unsigned long)input, input_size,
> > -                      (unsigned long)output);
> > +                      (unsigned long)*output);
> >
> >         /* Walk e820 and find a random address. */
> > -       random = find_random_addr(choice, output_size);
> > -       if (!random) {
> > +       random = find_random_phy_addr((unsigned long)*output, output_size);
> > +       if (!random)
> >                 debug_putstr("KASLR could not find suitable E820 region...\n");
> > -               goto out;
> > +       else {
> > +               if ((unsigned long)*output != random) {
> > +                       fill_pagetable(random, output_size);
> > +                       switch_pagetable();
> > +                       *output = (unsigned char *)random;
> > +               }
> >         }
> >
> > -       /* Always enforce the minimum. */
> > -       if (random < choice)
> > -               goto out;
> > -
> > -       choice = random;
> > -
> > -       fill_pagetable(choice, output_size);
> > -       switch_pagetable();
> > -out:
> > -       return (unsigned char *)choice;
> > +       /*
> > +        * Get a random address between LOAD_PHYSICAL_ADDR and
> > +        * CONFIG_RANDOMIZE_BASE_MAX_OFFSET
> > +        */
> > +       random = find_random_virt_offset(LOAD_PHYSICAL_ADDR, output_size);
> > +       *virt_offset = (unsigned char *)random;
> >  }
> > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > index 70445c3..362d508 100644
> > --- a/arch/x86/boot/compressed/misc.c
> > +++ b/arch/x86/boot/compressed/misc.c
> > @@ -251,7 +251,8 @@ void error(char *x)
> >  }
> >
> >  #if CONFIG_X86_NEED_RELOCS
> > -static void handle_relocations(void *output, unsigned long output_len)
> > +static void handle_relocations(void *output, unsigned long output_len,
> > +                              void *virt_offset)
> >  {
> >         int *reloc;
> >         unsigned long delta, map, ptr;
> > @@ -263,11 +264,6 @@ static void handle_relocations(void *output, unsigned long output_len)
> >          * and where it was actually loaded.
> >          */
> >         delta = min_addr - LOAD_PHYSICAL_ADDR;
> > -       if (!delta) {
> > -               debug_putstr("No relocation needed... ");
> > -               return;
> > -       }
> > -       debug_putstr("Performing relocations... ");
> >
> >         /*
> >          * The kernel contains a table of relocation addresses. Those
> > @@ -278,6 +274,22 @@ static void handle_relocations(void *output, unsigned long output_len)
> >          */
> >         map = delta - __START_KERNEL_map;
> >
> > +
> > +
> > +       /*
> > +        * 32-bit always performs relocations. 64-bit relocations are only
> > +        * needed if kASLR has chosen a different starting address offset
> > +        * from __START_KERNEL_map.
> > +        */
> > +       if (IS_ENABLED(CONFIG_X86_64))
> > +               delta = (unsigned long)virt_offset - LOAD_PHYSICAL_ADDR;
> > +
> > +       if (!delta) {
> > +               debug_putstr("No relocation needed... ");
> > +               return;
> > +       }
> > +       debug_putstr("Performing relocations... ");
> > +
> >         /*
> >          * Process relocations: 32 bit relocations first then 64 bit after.
> >          * Three sets of binary relocations are added to the end of the kernel
> > @@ -331,7 +343,8 @@ static void handle_relocations(void *output, unsigned long output_len)
> >  #endif
> >  }
> >  #else
> > -static inline void handle_relocations(void *output, unsigned long output_len)
> > +static inline void handle_relocations(void *output, unsigned long output_len,
> > +                                     void *virt_offset)
> >  { }
> >  #endif
> >
> > @@ -397,6 +410,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
> >  {
> >         unsigned long run_size = VO__end - VO__text;
> >         unsigned char *output_orig = output;
> > +       unsigned char *virt_offset;
> >
> >         real_mode = rmode;
> >
> > @@ -434,9 +448,10 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
> >          * the entire decompressed kernel plus relocation table, or the
> >          * entire decompressed kernel plus .bss and .brk sections.
> >          */
> > -       output = choose_kernel_location(input_data, input_len, output,
> > +       choose_kernel_location(input_data, input_len, &output,
> >                                         output_len > run_size ? output_len
> > -                                                             : run_size);
> > +                                                             : run_size,
> > +                                       &virt_offset);
> >
> >         /* Validate memory location choices. */
> >         if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
> > @@ -457,12 +472,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
> >         __decompress(input_data, input_len, NULL, NULL, output, output_len,
> >                         NULL, error);
> >         parse_elf(output);
> > -       /*
> > -        * 32-bit always performs relocations. 64-bit relocations are only
> > -        * needed if kASLR has chosen a different load address.
> > -        */
> > -       if (!IS_ENABLED(CONFIG_X86_64) || output != output_orig)
> > -               handle_relocations(output, output_len);
> > +       handle_relocations(output, output_len, virt_offset);
> >         debug_putstr("done.\nBooting the kernel.\n");
> >         return output;
> >  }
> > diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> > index 39d0e9a..cde6aec 100644
> > --- a/arch/x86/boot/compressed/misc.h
> > +++ b/arch/x86/boot/compressed/misc.h
> > @@ -69,20 +69,21 @@ int cmdline_find_option_bool(const char *option);
> >
> >  #if CONFIG_RANDOMIZE_BASE
> >  /* aslr.c */
> > -unsigned char *choose_kernel_location(unsigned char *input,
> > -                                     unsigned long input_size,
> > -                                     unsigned char *output,
> > -                                     unsigned long output_size);
> > +void choose_kernel_location(unsigned char *input,
> > +                               unsigned long input_size,
> > +                               unsigned char **output,
> > +                               unsigned long output_size,
> > +                               unsigned char **virt_offset);
> >  /* cpuflags.c */
> >  bool has_cpuflag(int flag);
> >  #else
> >  static inline
> > -unsigned char *choose_kernel_location(unsigned char *input,
> > -                                     unsigned long input_size,
> > -                                     unsigned char *output,
> > -                                     unsigned long output_size)
> > +void choose_kernel_location(unsigned char *input,
> > +                               unsigned long input_size,
> > +                               unsigned char **output,
> > +                               unsigned long output_size,
> > +                               unsigned char **virt_offset)
> >  {
> > -       return output;
> >  }
> >  #endif
> >
> > --
> > 2.5.0
> >
> 
> 
> 
> -- 
> Kees Cook
> Chrome OS & Brillo Security
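
To summarize the hunks above: on x86_64 the relocation decision now keys
off the virtual offset alone, so a purely physical move needs no
relocation handling. Condensed from the quoted diff (a paraphrase, not the
literal patched function):

        delta = min_addr - LOAD_PHYSICAL_ADDR;          /* physical move */
        map = delta - __START_KERNEL_map;

        /* 64-bit: only the virtual randomization decides. */
        if (IS_ENABLED(CONFIG_X86_64))
                delta = (unsigned long)virt_offset - LOAD_PHYSICAL_ADDR;

        if (!delta) {
                debug_putstr("No relocation needed... ");
                return;
        }
        debug_putstr("Performing relocations... ");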

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 05/19] x86, boot: Fix run_size calculation
  2016-03-08  4:57     ` Baoquan He
@ 2016-03-08 18:05       ` Kees Cook
  0 siblings, 0 replies; 51+ messages in thread
From: Kees Cook @ 2016-03-08 18:05 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young, Junjie Mao, Josh Triplett, Matt Fleming

On Mon, Mar 7, 2016 at 8:57 PM, Baoquan He <bhe@redhat.com> wrote:
> On 03/07/16 at 03:10pm, Kees Cook wrote:
>> On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
>> > From: Yinghai Lu <yinghai@kernel.org>
>> >
>> > Firstly, run_size is currently calculated via the shell script
>> > arch/x86/tools/calc_run_size.sh. It gets the file offset and mem sizes
>> > of the .bss and .brk sections in vmlinux, then adds them as follows:
>> >
>> > run_size=$(( $offsetA + $sizeA + $sizeB ))
>> >
>> > However, this is completely wrong: the offset is where the section or
>> > segment starts within the ELF file. Below is a vmlinux I compiled:
>> >
>> > [bhe@x1 linux]$ objdump -h vmlinux
>> >
>> > vmlinux:     file format elf64-x86-64
>> >
>> > Sections:
>> > Idx Name          Size      VMA               LMA               File off  Algn
>> >  27 .bss          00170000  ffffffff81ec8000  0000000001ec8000  012c8000  2**12
>> >                   ALLOC
>> >  28 .brk          00027000  ffffffff82038000  0000000002038000  012c8000  2**0
>> >                   ALLOC
>> >
>> > Here we can get run_size is 0x145f000.
>> > 0x012c8000+0x012c8000+0x00027000=0x145f000
>>
>> This example calculation looks wrong to me. run_size is offset + size
>> + size (not offset + offset + size):
>>
>> 0x12c8000+0x17000+0x27000 = 0x1306000
>
> Yeah, please forgive my carelessness. I copied the wrong size of .bss.
> But you also typed the wrong value for the .bss size; it should be 0x170000.
> So the result is still right.
>
> 0x12c8000+0x170000+0x27000 = 0x145f000

Hah, whoops. Yeah. Math is hard! :)

>> > [bhe@x1 linux]$ readelf -l vmlinux
>> >
>> > Elf file type is EXEC (Executable file)
>> > Entry point 0x1000000
>> > There are 5 program headers, starting at offset 64
>> >
>> > Program Headers:
>> >   Type           Offset             VirtAddr           PhysAddr
>> >                  FileSiz            MemSiz              Flags  Align
>> >   LOAD           0x0000000000200000 0xffffffff81000000 0x0000000001000000
>> >                  0x0000000000b5e000 0x0000000000b5e000  R E    200000
>> >   LOAD           0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
>> >                  0x0000000000145000 0x0000000000145000  RW     200000
>> >   LOAD           0x0000000001000000 0x0000000000000000 0x0000000001d45000
>> >                  0x0000000000018158 0x0000000000018158  RW     200000
>> >   LOAD           0x000000000115e000 0xffffffff81d5e000 0x0000000001d5e000
>> >                  0x000000000016a000 0x0000000000301000  RWE    200000
>> >   NOTE           0x000000000099bcac 0xffffffff8179bcac 0x000000000179bcac
>> >                  0x00000000000001bc 0x00000000000001bc         4
>> >
>> >  Section to Segment mapping:
>> >   Segment Sections...
>> >    00     .text .notes __ex_table .rodata __bug_table .pci_fixup .tracedata __ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param __modver
>> >    01     .data .vvar
>> >    02     .data..percpu
>> >    03     .init.text .init.data .x86_cpu_dev.init .parainstructions .altinstructions .altinstr_replacement .iommu_table .apicdrivers .exit.text .smp_locks .bss .brk
>> >    04     .notes
>> >
>> > Here we can get the same value as current run_size if we add p_offset
>> > and p_memsz.
>> > 0x000000000115e000+0x0000000000301000=0x145f000
>> >
>> > But is it right? Obviously not. We should calculate it using the last LOAD
>> > program segment like this:
>> > run_size = phdr->p_paddr + phdr->p_memsz - physical load addr of kernel
>> > run_size=0x0000000001d5e000+0x0000000000301000-0x0000000001000000=0x105f000
>>
>> Segment 03 ends at 0xffffffff81d5e000 + 0x301000 = 0xffffffff8205f000,
>> which does match where .brk ends (0xffffffff82038000 + 0x27000 =
>> 0xffffffff8205f000).
>
> Ah, yes, exactly. They prove it in different ways.

Yeah, I'm satisfied that this change is correct. I just got confused
by the earlier example. :)

>
>>
>> >
>> > It's equal to VO_end-VO_text and certainly it's simpler to do.
>> > _end: 0xffffffff8205f000
>> > _text:0xffffffff81000000
>> > run_size = 0xffffffff8205f000-0xffffffff81000000=0x105f000
>>
>> I would agree, it would seem like the existing run_size calculation is
>> 0x247000 too high in this example.
>
> It should be 0x400000 too high, as you mistakenly input the size of .bss ^_^.
> 0x145f000 - 0x105f000 = 0x400000
>
> An extra 4M is added in this example.

Right. Thanks!

-Kees


-- 
Kees Cook
Chrome OS & Brillo Security
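
A standalone sketch of the corrected calculation discussed above. The
patchset ends up using VO__end - VO__text; the helper below only
demonstrates the per-segment arithmetic from the log (the function name
and structure are illustrative, not the patch):

#include <elf.h>

/*
 * run_size spans from the kernel load address to the in-memory end of the
 * farthest LOAD segment, so it must use p_paddr + p_memsz, never file
 * offsets.
 */
static unsigned long calc_run_size(const Elf64_Phdr *phdr, int phnum,
                                   unsigned long load_addr)
{
        unsigned long end, run_size = 0;
        int i;

        for (i = 0; i < phnum; i++) {
                if (phdr[i].p_type != PT_LOAD)
                        continue;
                end = phdr[i].p_paddr + phdr[i].p_memsz;
                if (end - load_addr > run_size)
                        run_size = end - load_addr;
        }
        /* For the vmlinux above: 0x1d5e000 + 0x301000 - 0x1000000 = 0x105f000 */
        return run_size;
}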

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 07/19] x86, kaslr: Get correct max_addr for relocs pointer
  2016-03-08  5:13     ` Baoquan He
@ 2016-03-08 18:16       ` Kees Cook
  2016-03-09 13:46         ` Baoquan He
  0 siblings, 1 reply; 51+ messages in thread
From: Kees Cook @ 2016-03-08 18:16 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On Mon, Mar 7, 2016 at 9:13 PM, Baoquan He <bhe@redhat.com> wrote:
> On 03/07/16 at 03:16pm, Kees Cook wrote:
>> On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
>> > From: Yinghai Lu <yinghai@kernel.org>
>> >
>> > There is boundary checking for pointer in kaslr relocation handling.
>> >
>> > Current code is using output_len, and that is VO (vmlinux after objcopy)
>> > file size plus vmlinux.relocs file size.
>> >
>> > That is not right, as we should use loaded address for running.
>>
>> I think this needs clarification. If I'm understanding correctly, the
>> max_addr check should stop at the end of VO, and not include bss, brk,
>> etc. which follows. In which case, this change is correct: the bounds
>> checking needed to be tightened.
>
> Yes, exactly as you mean.
>
>>
>>
>> >
>> > By that time parse_elf has already moved the sections according to the ELF headers.
>> >
>> > The valid range should be VO [_text, __bss_start) loaded physical addresses.
>
> I think it is expressed here. Maybe we can stress it in this patch log.
> After all it's the main idea of this patch.
>
> How about rewriting the patch log like this:
>
> ********************
> There is boundary checking for pointer in kaslr relocation handling.
> However current code is using output_len that is VO (vmlinux after objcopy)
> file size plus vmlinux.relocs file size to make the valid boundary.
>
> That is not right since the max_addr check should stop at the end of VO
> and excludes bss, brk etc, which follows. Means the valid range should
> be VO [_text, __bss_start] in loaded physical address space.
>
> In this patch, add export for __bss_start to voffset.h and use it to get
> max_addr.
> ********************

I would say:

Relocation handling performs bounds checking on the relocated
addresses. The existing code uses output_len (VO size plus relocs
size) as the max address. This is not right since the max_addr check
should stop at the end of VO and exclude bss, brk, etc, which follows.
The valid range should be VO [_text, __bss_start] in the loaded
physical address space.

In this patch, add export for __bss_start to voffset.h and use it to
get the correct max_addr.


Thanks!

-Kees

>
>> >
>> > In the patch, add export for __bss_start to voffset.h and use it to get
>> > max_addr.
>> >
>> > Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> > ---
>> > v2->v3:
>> >     Tune the patch log.
>> >
>> >  arch/x86/boot/compressed/Makefile | 2 +-
>> >  arch/x86/boot/compressed/misc.c   | 2 +-
>> >  2 files changed, 2 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
>> > index fef80fa..2e7c0ce 100644
>> > --- a/arch/x86/boot/compressed/Makefile
>> > +++ b/arch/x86/boot/compressed/Makefile
>> > @@ -41,7 +41,7 @@ LDFLAGS_vmlinux := -T
>> >  hostprogs-y    := mkpiggy
>> >  HOST_EXTRACFLAGS += -I$(srctree)/tools/include
>> >
>> > -sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
>> > +sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
>> >
>> >  quiet_cmd_voffset = VOFFSET $@
>> >        cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
>> > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
>> > index 069120e..dd7ed8a 100644
>> > --- a/arch/x86/boot/compressed/misc.c
>> > +++ b/arch/x86/boot/compressed/misc.c
>> > @@ -259,7 +259,7 @@ static void handle_relocations(void *output, unsigned long output_len)
>> >         int *reloc;
>> >         unsigned long delta, map, ptr;
>> >         unsigned long min_addr = (unsigned long)output;
>> > -       unsigned long max_addr = min_addr + output_len;
>> > +       unsigned long max_addr = min_addr + (VO___bss_start - VO__text);
>> >
>> >         /*
>> >          * Calculate the delta between where vmlinux was linked to load
>> > --
>> > 2.5.0
>> >
>>
>>
>>
>> --
>> Kees Cook
>> Chrome OS & Brillo Security



-- 
Kees Cook
Chrome OS & Brillo Security
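
In context, the change tightens the bounds every relocation entry is
checked against. A short sketch (the min/max lines follow the quoted diff;
the check itself paraphrases the existing misc.c logic):

        unsigned long min_addr = (unsigned long)output;
        /* VO [_text, __bss_start): .bss/.brk contain no relocated pointers */
        unsigned long max_addr = min_addr + (VO___bss_start - VO__text);

        /* Every relocated address must land inside the loaded VO image. */
        if (ptr < min_addr || ptr > max_addr)
                error("32-bit relocation outside of kernel!\n");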

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 08/19] x86, kaslr: Consolidate mem_avoid array filling
  2016-03-08  5:21     ` Baoquan He
@ 2016-03-08 18:17       ` Kees Cook
  2016-03-09 13:42         ` Baoquan He
  0 siblings, 1 reply; 51+ messages in thread
From: Kees Cook @ 2016-03-08 18:17 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On Mon, Mar 7, 2016 at 9:21 PM, Baoquan He <bhe@redhat.com> wrote:
> On 03/07/16 at 03:28pm, Kees Cook wrote:
>> On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
>> > From: Yinghai Lu <yinghai@kernel.org>
>> >
>> > We are going to support kaslr with 64bit above 4G, and the new random
>> > output could be anywhere. The mem_avoid array is used by kaslr to
>> > search for a new output address. Current code only tracks the range
>> > after output+output_size, so we need to track all ranges instead.
>> >
>> > Current code's first entry is the extra bytes before input+input_size,
>> > sized according to output_size. The other entries are for the initrd,
>> > the cmdline, and the heap/stack for ZO running.
>> >
>> > At first, let's check the first entry that should be in the mem_avoid
>> > array. Now that ZO always sits at the end of the buffer, we can find
>> > out where the text and data/bss etc. of ZO are.
>> >
>> > Since init_size >= run_size, and input+input_len >= output+output_len,
>> > here we make several assumptions for a better graphical presentation:
>> >  - init_size > run_size
>> >  - input+input_len > output+output_len
>> >  - run_size > output_len
>>
>> I would like to see each of these assumptions justified. Why is
>> init_size > run_size, etc?
>> choose_kernel_location's "output_size" is calculated as max(run_size,
>> output_len), so run_size may not be > output_len...
>
> Sure. I will add this case in next post. Thanks a lot.
>>
>> >
>> > 0   output                       input             input+input_len          output+init_size
>> > |     |                            |                       |                       |
>> > |-----|-------------------|--------|------------------|----|------------|----------|
>> >                           |                           |                 |
>> >              output+init_size-ZO_INIT_SIZE    output+output_len    output+run_size
>> >
>> > [output, output+init_size) is the decompression buffer for the
>> > compressed kernel.
>> >
>> > [output, output+run_size) is for VO run size.
>> > [output, output+output_len) is (VO (vmlinux after objcopy) plus relocs)
>> >
>> > [output+init_size-ZO_INIT_SIZE, output+init_size) is the copied ZO.
>> > [input, input+input_len) is the copied compressed (VO (vmlinux after
>> > objcopy) plus relocs), not the ZO.
>> >
>> > [input+input_len, output+init_size) is [_text, _end) for ZO. That could
>> > be the first range in mem_avoid. The new first entry already includes
>> > the heap and stack for ZO running, so there is no need to put them
>> > separately into the mem_avoid array.
>> >
>> > Also, [input, input+input_size) needs to be put in the mem_avoid array.
>> > It is adjacent to the new first entry, so merge them.
>>
>> I wonder if this diagram and description should live in a comment with the code.
>
> I think it would be very helpful for people interested in this process.
> Do you think it's ok to put it where init_size is calculated in
> boot/header.S?  Or other suitable places?

Let's put it in aslr.c since it's the biggest user of these
calculations? I would tend to avoid putting large comments in a .S
file, but that's just personal preference.

-Kees

>>
>>
>> >
>> > At last we need to put boot_params into mem_avoid too, since with a
>> > 64bit bootloader it could be anywhere.
>> >
>> > After these changes, we have all the ranges which need to be avoided
>> > in the mem_avoid array.
>> >
>> > Cc: Kees Cook <keescook@chromium.org>
>> > Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> > ---
>> > v2->v3:
>> >     Adjust the patch log.
>> >
>> >  arch/x86/boot/compressed/aslr.c | 29 +++++++++++++----------------
>> >  1 file changed, 13 insertions(+), 16 deletions(-)
>> >
>> > diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
>> > index 622aa88..b93be03 100644
>> > --- a/arch/x86/boot/compressed/aslr.c
>> > +++ b/arch/x86/boot/compressed/aslr.c
>> > @@ -109,7 +109,7 @@ struct mem_vector {
>> >         unsigned long size;
>> >  };
>> >
>> > -#define MEM_AVOID_MAX 5
>> > +#define MEM_AVOID_MAX 4
>> >  static struct mem_vector mem_avoid[MEM_AVOID_MAX];
>> >
>> >  static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
>> > @@ -135,21 +135,22 @@ static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
>> >  }
>> >
>> >  static void mem_avoid_init(unsigned long input, unsigned long input_size,
>> > -                          unsigned long output, unsigned long output_size)
>> > +                          unsigned long output)
>> >  {
>> > +       unsigned long init_size = real_mode->hdr.init_size;
>> >         u64 initrd_start, initrd_size;
>> >         u64 cmd_line, cmd_line_size;
>> > -       unsigned long unsafe, unsafe_len;
>> >         char *ptr;
>> >
>> >         /*
>> >          * Avoid the region that is unsafe to overlap during
>> > -        * decompression (see calculations at top of misc.c).
>> > +        * decompression.
>> > +        * As we already move ZO (arch/x86/boot/compressed/vmlinux)
>> > +        * to the end of buffer, [input+input_size, output+init_size)
>> > +        * has [_text, _end) for ZO.
>> >          */
>> > -       unsafe_len = (output_size >> 12) + 32768 + 18;
>> > -       unsafe = (unsigned long)input + input_size - unsafe_len;
>> > -       mem_avoid[0].start = unsafe;
>> > -       mem_avoid[0].size = unsafe_len;
>> > +       mem_avoid[0].start = input;
>> > +       mem_avoid[0].size = (output + init_size) - input;
>> >
>> >         /* Avoid initrd. */
>> >         initrd_start  = (u64)real_mode->ext_ramdisk_image << 32;
>> > @@ -169,13 +170,9 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
>> >         mem_avoid[2].start = cmd_line;
>> >         mem_avoid[2].size = cmd_line_size;
>> >
>> > -       /* Avoid heap memory. */
>> > -       mem_avoid[3].start = (unsigned long)free_mem_ptr;
>> > -       mem_avoid[3].size = BOOT_HEAP_SIZE;
>> > -
>> > -       /* Avoid stack memory. */
>> > -       mem_avoid[4].start = (unsigned long)free_mem_end_ptr;
>> > -       mem_avoid[4].size = BOOT_STACK_SIZE;
>> > +       /* Avoid params */
>> > +       mem_avoid[3].start = (unsigned long)real_mode;
>> > +       mem_avoid[3].size = sizeof(*real_mode);
>> >  }
>> >
>> >  /* Does this memory vector overlap a known avoided area? */
>> > @@ -319,7 +316,7 @@ unsigned char *choose_kernel_location(unsigned char *input,
>> >
>> >         /* Record the various known unsafe memory ranges. */
>> >         mem_avoid_init((unsigned long)input, input_size,
>> > -                      (unsigned long)output, output_size);
>> > +                      (unsigned long)output);
>> >
>> >         /* Walk e820 and find a random address. */
>> >         random = find_random_addr(choice, output_size);
>> > --
>> > 2.5.0
>> >
>>
>>
>>
>> --
>> Kees Cook
>> Chrome OS & Brillo Security



-- 
Kees Cook
Chrome OS & Brillo Security
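
For context, the consumer of the array filled above: a candidate slot is
rejected when it hits any recorded range. A sketch of the overlap walk (a
paraphrase of the aslr.c helper, not quoted from the patch):

/* Does this memory vector overlap a known avoided area? */
static bool mem_avoid_overlap(struct mem_vector *img)
{
        int i;

        for (i = 0; i < MEM_AVOID_MAX; i++) {
                if (mem_overlaps(img, &mem_avoid[i]))
                        return true;
        }
        return false;
}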

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 16/19] x86, kaslr: Randomize physical and virtual address of kernel separately
  2016-03-08  5:34     ` Baoquan He
@ 2016-03-08 18:24       ` Kees Cook
  2016-03-09 13:40         ` Baoquan He
  0 siblings, 1 reply; 51+ messages in thread
From: Kees Cook @ 2016-03-08 18:24 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On Mon, Mar 7, 2016 at 9:34 PM, Baoquan He <bhe@redhat.com> wrote:
> On 03/07/16 at 03:51pm, Kees Cook wrote:
>> On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
>> > On x86_64, in the old kaslr implementation only the physical address
>> > of the kernel is randomized. Then the delta between the physical
>> > address where vmlinux was linked to load and where it is finally
>> > loaded is calculated. If the delta is not 0, namely there's a new
>> > physical address where the kernel is actually decompressed, relocation
>> > handling needs to be done: the delta is added to the offset of kernel
>> > symbol relocation, which moves the kernel text mapping address by delta.
>> >
>> > Here the behavior is changed: randomize both the physical address
>> > where the kernel is decompressed and the virtual address where the
>> > kernel text is mapped. And relocation handling only depends on the
>> > virtual address randomization. This means we add the delta to the
>> > offset of the kernel relocs if and only if the virtual address is
>> > randomized to a different value.
>> >
>> > Note that up to now both the virtual offset and the physical address
>> > randomization can't exceed CONFIG_RANDOMIZE_BASE_MAX_OFFSET, namely 1G.
>>
>> It seems like CONFIG_RANDOMIZE_BASE_MAX_OFFSET should have been
>> eliminated when the on-demand page table code was added. Once that was
>> added, there's no physical max any more. And virtual randomization
>> should have no max at all.
> For physical randomization, yes, CONFIG_RANDOMIZE_BASE_MAX_OFFSET is not
> needed anymore. But for virtual randomization,
> CONFIG_RANDOMIZE_BASE_MAX_OFFSET is still mandatory, since the kernel text
> mapping and the kernel module mapping share the 2G virtual address space
> shown below. Though kaslr increases the kernel text mapping from 512M to
> 1G, it is still limited and can't exceed CONFIG_RANDOMIZE_BASE_MAX_OFFSET.
>
> [0xffffffff80000000, 0xffffffffffffffff]
>
> But now, as you suggested, I would like to change
> CONFIG_RANDOMIZE_BASE_MAX_OFFSET to another name, because it is only
> relevant to virtual randomization. A more specific name would be better.

Yes, right, the virtual has a 1G max, but I meant that it doesn't need
to be a CONFIG item any more. Physical can use physical memory max as
its max, and virtual max can now be calculated from the existing text
mapping size.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security
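
For reference, the constraint being described, sketched from the x86_64
memory map documentation (background only, not part of the patch):

/*
 * ffffffff80000000 .. ffffffffa0000000 : kernel text mapping, 512M
 *                                        (1G, up to ffffffffc0000000,
 *                                        with this series)
 * the rest of the 2G up to the top     : module mapping space
 *
 * Text and modules share the 2G, so the virtual randomization ceiling
 * can follow the text mapping size (KERNEL_IMAGE_SIZE) instead of a
 * config knob.
 */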

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 16/19] x86, kaslr: Randomize physical and virtual address of kernel separately
  2016-03-08 18:24       ` Kees Cook
@ 2016-03-09 13:40         ` Baoquan He
  2016-03-09 18:07           ` Kees Cook
  0 siblings, 1 reply; 51+ messages in thread
From: Baoquan He @ 2016-03-09 13:40 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On 03/08/16 at 10:24am, Kees Cook wrote:
> On Mon, Mar 7, 2016 at 9:34 PM, Baoquan He <bhe@redhat.com> wrote:
> >> It seems like CONFIG_RANDOMIZE_BASE_MAX_OFFSET should have been
> >> eliminated when the on-demand page table code was added. Once that was
> >> added, there's no physical max any more. And virtual randomization
> >> should have no max at all.
> > For physical randomization, yes, CONFIG_RANDOMIZE_BASE_MAX_OFFSET is not
> > needed anymore. But for virtual randomization,
> > CONFIG_RANDOMIZE_BASE_MAX_OFFSET is still mandatory, since the kernel text
> > mapping and the kernel module mapping share the 2G virtual address space
> > shown below. Though kaslr increases the kernel text mapping from 512M to
> > 1G, it is still limited and can't exceed CONFIG_RANDOMIZE_BASE_MAX_OFFSET.
> >
> > [0xffffffff80000000, 0xffffffffffffffff]
> >
> > But now, as you suggested, I would like to change
> > CONFIG_RANDOMIZE_BASE_MAX_OFFSET to another name, because it is only
> > relevant to virtual randomization. A more specific name would be better.
> 
> Yes, right, the virtual has a 1G max, but I meant that it doesn't need
> to be a CONFIG item any more. Physical can use physical memory max as
> its max, and virtual max can now be calculated from the existing text
> mapping size.

Got it, it should be KERNEL_IMAGE_SIZE instead, right?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 08/19] x86, kaslr: Consolidate mem_avoid array filling
  2016-03-08 18:17       ` Kees Cook
@ 2016-03-09 13:42         ` Baoquan He
  0 siblings, 0 replies; 51+ messages in thread
From: Baoquan He @ 2016-03-09 13:42 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On 03/08/16 at 10:17am, Kees Cook wrote:
> On Mon, Mar 7, 2016 at 9:21 PM, Baoquan He <bhe@redhat.com> wrote:
> > On 03/07/16 at 03:28pm, Kees Cook wrote:
> >> On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> >> > From: Yinghai Lu <yinghai@kernel.org>
> >> >
> >> > We are going to support kaslr with 64bit above 4G, and the new random
> >> > output could be anywhere. The mem_avoid array is used by kaslr to
> >> > search for a new output address. Current code only tracks the range
> >> > after output+output_size, so we need to track all ranges instead.
> >> >
> >> > Current code's first entry is the extra bytes before input+input_size,
> >> > sized according to output_size. The other entries are for the initrd,
> >> > the cmdline, and the heap/stack for ZO running.
> >> >
> >> > At first, let's check the first entry that should be in the mem_avoid
> >> > array. Now that ZO always sits at the end of the buffer, we can find
> >> > out where the text and data/bss etc. of ZO are.
> >> >
> >> > Since init_size >= run_size, and input+input_len >= output+output_len,
> >> > here we make several assumptions for a better graphical presentation:
> >> >  - init_size > run_size
> >> >  - input+input_len > output+output_len
> >> >  - run_size > output_len
> >>
> >> I would like to see each of these assumptions justified. Why is
> >> init_size > run_size, etc?
> >> choose_kernel_location's "output_size" is calculated as max(run_size,
> >> output_len), so run_size may not be > output_len...
> >
> > Sure. I will add this case in next post. Thanks a lot.
> >>
> >> >
> >> > 0   output                       input             input+input_len          output+init_size
> >> > |     |                            |                       |                       |
> >> > |-----|-------------------|--------|------------------|----|------------|----------|
> >> >                           |                           |                 |
> >> >              output+init_size-ZO_INIT_SIZE    output+output_len    output+run_size
> >> >
> >> > [output, output+init_size) is the decompression buffer for the
> >> > compressed kernel.
> >> >
> >> > [output, output+run_size) is for VO run size.
> >> > [output, output+output_len) is (VO (vmlinux after objcopy) plus relocs)
> >> >
> >> > [output+init_size-ZO_INIT_SIZE, output+init_size) is the copied ZO.
> >> > [input, input+input_len) is the copied compressed (VO (vmlinux after
> >> > objcopy) plus relocs), not the ZO.
> >> >
> >> > [input+input_len, output+init_size) is [_text, _end) for ZO. That could
> >> > be the first range in mem_avoid. The new first entry already includes
> >> > the heap and stack for ZO running, so there is no need to put them
> >> > separately into the mem_avoid array.
> >> >
> >> > Also, [input, input+input_size) needs to be put in the mem_avoid array.
> >> > It is adjacent to the new first entry, so merge them.
> >>
> >> I wonder if this diagram and description should live in a comment with the code.
> >
> > I think it would be very helpful for people interested in this process.
> > Do you think it's ok to put it where init_size is calculated in
> > boot/header.S?  Or other suitable places?
> 
> Let's put it in aslr.c since it's the biggest user of these
> calculations? I would tend to avoid putting large comments in a .S
> file, but that's just personal preference.

Yeah, agree. Will put it in aslr.c.

Thanks a lot for your good suggestions.

> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 07/19] x86, kaslr: Get correct max_addr for relocs pointer
  2016-03-08 18:16       ` Kees Cook
@ 2016-03-09 13:46         ` Baoquan He
  0 siblings, 0 replies; 51+ messages in thread
From: Baoquan He @ 2016-03-09 13:46 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On 03/08/16 at 10:16am, Kees Cook wrote:
> On Mon, Mar 7, 2016 at 9:13 PM, Baoquan He <bhe@redhat.com> wrote:
> > On 03/07/16 at 03:16pm, Kees Cook wrote:
> >> On Fri, Mar 4, 2016 at 8:25 AM, Baoquan He <bhe@redhat.com> wrote:
> >> > From: Yinghai Lu <yinghai@kernel.org>
> >> >
> >> > There is boundary checking for pointer in kaslr relocation handling.
> >> >
> >> > Current code is using output_len, and that is VO (vmlinux after objcopy)
> >> > file size plus vmlinux.relocs file size.
> >> >
> >> > That is not right, as we should use loaded address for running.
> >>
> >> I think this needs clarification. If I'm understanding correctly, the
> >> max_addr check should stop at the end of VO, and not include bss, brk,
> >> etc. which follows. In which case, this change is correct: the bounds
> >> checking needed to be tightened.
> >
> > Yes, exactly as you mean.
> >
> >>
> >>
> >> >
> >> > By that time parse_elf has already moved the sections according to the ELF headers.
> >> >
> >> > The valid range should be VO [_text, __bss_start) loaded physical addresses.
> >
> > I think it is expressed here. Maybe we can stress it in this patch log.
> > After all it's the main idea of this patch.
> >
> > How about rewriting the patch log like this:
> >
> > ********************
> > There is boundary checking for pointer in kaslr relocation handling.
> > However current code is using output_len that is VO (vmlinux after objcopy)
> > file size plus vmlinux.relocs file size to make the valid boundary.
> >
> > That is not right since the max_addr check should stop at the end of VO
> > and excludes bss, brk etc, which follows. Means the valid range should
> > be VO [_text, __bss_start] in loaded physical address space.
> >
> > In this patch, add export for __bss_start to voffset.h and use it to get
> > max_addr.
> > ********************
> 
> I would say:
> 
> Relocation handling performs bounds checking on the relocated
> addresses. The existing code uses output_len (VO size plus relocs
> size) as the max address. This is not right since the max_addr check
> should stop at the end of VO and exclude bss, brk, etc, which follows.
> The valid range should be VO [_text, __bss_start] in the loaded
> physical address space.
> 
> In this patch, add export for __bss_start to voffset.h and use it to
> get the correct max_addr.

Yeah, it's much better. Will use it as the patch log.

Thanks!

> 
> 
> Thanks!
> 
> -Kees
> 
> >
> >> >
> >> > In the patch, add export for __bss_start to voffset.h and use it to get
> >> > max_addr.
> >> >
> >> > Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> >> > ---
> >> > v2->v3:
> >> >     Tune the patch log.
> >> >
> >> >  arch/x86/boot/compressed/Makefile | 2 +-
> >> >  arch/x86/boot/compressed/misc.c   | 2 +-
> >> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> >> > index fef80fa..2e7c0ce 100644
> >> > --- a/arch/x86/boot/compressed/Makefile
> >> > +++ b/arch/x86/boot/compressed/Makefile
> >> > @@ -41,7 +41,7 @@ LDFLAGS_vmlinux := -T
> >> >  hostprogs-y    := mkpiggy
> >> >  HOST_EXTRACFLAGS += -I$(srctree)/tools/include
> >> >
> >> > -sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
> >> > +sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
> >> >
> >> >  quiet_cmd_voffset = VOFFSET $@
> >> >        cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
> >> > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> >> > index 069120e..dd7ed8a 100644
> >> > --- a/arch/x86/boot/compressed/misc.c
> >> > +++ b/arch/x86/boot/compressed/misc.c
> >> > @@ -259,7 +259,7 @@ static void handle_relocations(void *output, unsigned long output_len)
> >> >         int *reloc;
> >> >         unsigned long delta, map, ptr;
> >> >         unsigned long min_addr = (unsigned long)output;
> >> > -       unsigned long max_addr = min_addr + output_len;
> >> > +       unsigned long max_addr = min_addr + (VO___bss_start - VO__text);
> >> >
> >> >         /*
> >> >          * Calculate the delta between where vmlinux was linked to load
> >> > --
> >> > 2.5.0
> >> >
> >>
> >>
> >>
> >> --
> >> Kees Cook
> >> Chrome OS & Brillo Security
> 
> 
> 
> -- 
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 16/19] x86, kaslr: Randomize physical and virtual address of kernel separately
  2016-03-09 13:40         ` Baoquan He
@ 2016-03-09 18:07           ` Kees Cook
  2016-03-10 15:15             ` Baoquan He
  0 siblings, 1 reply; 51+ messages in thread
From: Kees Cook @ 2016-03-09 18:07 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On Wed, Mar 9, 2016 at 5:40 AM, Baoquan He <bhe@redhat.com> wrote:
> On 03/08/16 at 10:24am, Kees Cook wrote:
>> On Mon, Mar 7, 2016 at 9:34 PM, Baoquan He <bhe@redhat.com> wrote:
>> >> It seems like CONFIG_RANDOMIZE_BASE_MAX_OFFSET should have been
>> >> eliminated when the on-demand page table code was added. Once that was
>> >> added, there's no physical max any more. And virtual randomization
>> >> should have no max at all.
>> > For physical randomization, yes, CONFIG_RANDOMIZE_BASE_MAX_OFFSET is not
>> > needed anymore. But for virtual randomization,
>> > CONFIG_RANDOMIZE_BASE_MAX_OFFSET is still mandatory, since the kernel text
>> > mapping and the kernel module mapping share the 2G virtual address space
>> > shown below. Though kaslr increases the kernel text mapping from 512M to
>> > 1G, it is still limited and can't exceed CONFIG_RANDOMIZE_BASE_MAX_OFFSET.
>> >
>> > [0xffffffff80000000, 0xffffffffffffffff]
>> >
>> > But now, as you suggested, I would like to change
>> > CONFIG_RANDOMIZE_BASE_MAX_OFFSET to another name, because it is only
>> > relevant to virtual randomization. A more specific name would be better.
>>
>> Yes, right, the virtual has a 1G max, but I meant that it doesn't need
>> to be a CONFIG item any more. Physical can use physical memory max as
>> its max, and virtual max can now be calculated from the existing text
>> mapping size.
>
> Got it, it should be KERNEL_IMAGE_SIZE instead, right?

Yup, that should be right. And removing
CONFIG_RANDOMIZE_BASE_MAX_OFFSET will be a nice clean up, actually. It
can be its own patch separate from the rest of the changes. The config
was likely never needed (the Kconfigs already default to max values),
but I had it there out of a desire to make everything configurable
early on when it was a new feature.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security
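
Concretely, the cleanup agreed on here would swap the config knob for the
text mapping constant. A sketch of the direction, reusing the assumed slot
arithmetic from the earlier sketch (only the constants come from this
discussion):

-       slots = (CONFIG_RANDOMIZE_BASE_MAX_OFFSET - minimum - image_size) /
-               CONFIG_PHYSICAL_ALIGN + 1;
+       /* The virtual maximum now follows the text mapping size. */
+       slots = (KERNEL_IMAGE_SIZE - minimum - image_size) /
+               CONFIG_PHYSICAL_ALIGN + 1;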

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 16/19] x86, kaslr: Randomize physical and virtual address of kernel separately
  2016-03-09 18:07           ` Kees Cook
@ 2016-03-10 15:15             ` Baoquan He
  0 siblings, 0 replies; 51+ messages in thread
From: Baoquan He @ 2016-03-10 15:15 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On 03/09/16 at 10:07am, Kees Cook wrote:
> On Wed, Mar 9, 2016 at 5:40 AM, Baoquan He <bhe@redhat.com> wrote:
> > On 03/08/16 at 10:24am, Kees Cook wrote:
> >> On Mon, Mar 7, 2016 at 9:34 PM, Baoquan He <bhe@redhat.com> wrote:
> >> >> It seems like CONFIG_RANDOMIZE_BASE_MAX_OFFSET should have been
> >> >> eliminated when the on-demand page table code was added. Once that was
> >> >> added, there's no physical max any more. And virtual randomization
> >> >> should have no max at all.
> >> > For physical randomization, yes, CONFIG_RANDOMIZE_BASE_MAX_OFFSET is not
> >> > needed anymore. But for virtual randomization,
> >> > CONFIG_RANDOMIZE_BASE_MAX_OFFSET is still mandatory, since the kernel text
> >> > mapping and the kernel module mapping share the 2G virtual address space
> >> > shown below. Though kaslr increases the kernel text mapping from 512M to
> >> > 1G, it is still limited and can't exceed CONFIG_RANDOMIZE_BASE_MAX_OFFSET.
> >> >
> >> > [0xffffffff80000000, 0xffffffffffffffff]
> >> >
> >> > But now, as you suggested, I would like to change
> >> > CONFIG_RANDOMIZE_BASE_MAX_OFFSET to another name, because it is only
> >> > relevant to virtual randomization. A more specific name would be better.
> >>
> >> Yes, right, the virtual has a 1G max, but I meant that it doesn't need
> >> to be a CONFIG item any more. Physical can use physical memory max as
> >> its max, and virtual max can now be calculated from the existing text
> >> mapping size.
> >
> > Got it, it should be KERNEL_IMAGE_SIZE instead, right?
> 
> Yup, that should be right. And removing
> CONFIG_RANDOMIZE_BASE_MAX_OFFSET will be a nice clean up, actually. It
> can be its own patch separate from the rest of the changes. The config
> was likely never needed (the Kconfigs already default to max values),
> but I had it there out of a desire to make everything configurable
> early on when it was a new feature.

OK, will check code and make a separate patch to fix this as you
suggested. Thanks.

Baoquan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 10/19] x86, 64bit: Set ident_mapping for kaslr
  2016-03-07 23:34   ` Kees Cook
  2016-03-08  5:25     ` Baoquan He
@ 2016-03-21  7:50     ` Baoquan He
  2016-03-21 19:48       ` Kees Cook
  1 sibling, 1 reply; 51+ messages in thread
From: Baoquan He @ 2016-03-21  7:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young, Jiri Kosina, Borislav Petkov, Matt Fleming

On 03/07/16 at 03:34pm, Kees Cook wrote:
> >  #ifdef CONFIG_EARLY_PRINTK
> >  /* early_serial_console.c */
> >  extern int early_serial_base;
> > diff --git a/arch/x86/boot/compressed/misc_pgt.c b/arch/x86/boot/compressed/misc_pgt.c
> > new file mode 100644
> > index 0000000..954811e
> > --- /dev/null
> > +++ b/arch/x86/boot/compressed/misc_pgt.c
> > @@ -0,0 +1,91 @@
> > +#define __pa(x)  ((unsigned long)(x))
> > +#define __va(x)  ((void *)((unsigned long)(x)))
> > +
> > +#include "misc.h"
> > +
> > +#include <asm/init.h>
> > +#include <asm/pgtable.h>
> > +
> > +#include "../../mm/ident_map.c"
> > +#include "../string.h"
> > +
> > +struct alloc_pgt_data {
> > +       unsigned char *pgt_buf;
> > +       unsigned long pgt_buf_size;
> > +       unsigned long pgt_buf_offset;
> > +};
> > +
> > +static void *alloc_pgt_page(void *context)
> > +{
> > +       struct alloc_pgt_data *d = (struct alloc_pgt_data *)context;
> > +       unsigned char *p = (unsigned char *)d->pgt_buf;
> > +
> > +       if (d->pgt_buf_offset >= d->pgt_buf_size) {
> > +               debug_putstr("out of pgt_buf in misc.c\n");
> 
> This line probably isn't needed any more. Or, if it is, maybe it could
> include more details from the context?

I am preparing to repost and found I missed this place.

Is it OK to add more information to the printout, as below?
if (d->pgt_buf_offset >= d->pgt_buf_size) {
                debug_putstr("out of pgt_buf in misc.c\n");
                debug_putaddr(d->pgt_buf_offset);
                debug_putaddr(d->pgt_buf_size);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 10/19] x86, 64bit: Set ident_mapping for kaslr
  2016-03-21  7:50     ` Baoquan He
@ 2016-03-21 19:48       ` Kees Cook
  0 siblings, 0 replies; 51+ messages in thread
From: Kees Cook @ 2016-03-21 19:48 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Vivek Goyal, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young, Jiri Kosina, Borislav Petkov, Matt Fleming

On Mon, Mar 21, 2016 at 12:50 AM, Baoquan He <bhe@redhat.com> wrote:
> On 03/07/16 at 03:34pm, Kees Cook wrote:
>> >  #ifdef CONFIG_EARLY_PRINTK
>> >  /* early_serial_console.c */
>> >  extern int early_serial_base;
>> > diff --git a/arch/x86/boot/compressed/misc_pgt.c b/arch/x86/boot/compressed/misc_pgt.c
>> > new file mode 100644
>> > index 0000000..954811e
>> > --- /dev/null
>> > +++ b/arch/x86/boot/compressed/misc_pgt.c
>> > @@ -0,0 +1,91 @@
>> > +#define __pa(x)  ((unsigned long)(x))
>> > +#define __va(x)  ((void *)((unsigned long)(x)))
>> > +
>> > +#include "misc.h"
>> > +
>> > +#include <asm/init.h>
>> > +#include <asm/pgtable.h>
>> > +
>> > +#include "../../mm/ident_map.c"
>> > +#include "../string.h"
>> > +
>> > +struct alloc_pgt_data {
>> > +       unsigned char *pgt_buf;
>> > +       unsigned long pgt_buf_size;
>> > +       unsigned long pgt_buf_offset;
>> > +};
>> > +
>> > +static void *alloc_pgt_page(void *context)
>> > +{
>> > +       struct alloc_pgt_data *d = (struct alloc_pgt_data *)context;
>> > +       unsigned char *p = (unsigned char *)d->pgt_buf;
>> > +
>> > +       if (d->pgt_buf_offset >= d->pgt_buf_size) {
>> > +               debug_putstr("out of pgt_buf in misc.c\n");
>>
>> This line probably isn't needed any more. Or, if it is, maybe it could
>> include more details from the context?
>
> I am preparing to repost and found I missed this place.
>
> Is it OK to add more information to the printout, as below?
> if (d->pgt_buf_offset >= d->pgt_buf_size) {
>                 debug_putstr("out of pgt_buf in misc.c\n");
>                 debug_putaddr(d->pgt_buf_offset);
>                 debug_putaddr(d->pgt_buf_size);

Yeah, absolutely, that sounds perfect.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security
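
Putting the agreed diagnostics together, alloc_pgt_page() would end up
roughly like this (a sketch; the NULL failure return is assumed from the
usual allocator pattern, not quoted from the thread):

static void *alloc_pgt_page(void *context)
{
        struct alloc_pgt_data *d = (struct alloc_pgt_data *)context;
        unsigned char *p = (unsigned char *)d->pgt_buf;

        /* Validate there is space left for another page table page. */
        if (d->pgt_buf_offset >= d->pgt_buf_size) {
                debug_putstr("out of pgt_buf in misc.c\n");
                debug_putaddr(d->pgt_buf_offset);
                debug_putaddr(d->pgt_buf_size);
                return NULL;            /* assumed failure path */
        }

        /* Hand out the next page of the static buffer. */
        p += d->pgt_buf_offset;
        d->pgt_buf_offset += PAGE_SIZE;

        return p;
}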

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2016-03-21 19:48 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-04 16:24 [PATCH v3 00/19] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
2016-03-04 16:24 ` [PATCH v3 01/19] x86, kaslr: Remove not needed parameter for choose_kernel_location Baoquan He
2016-03-07 22:28   ` Kees Cook
2016-03-04 16:25 ` [PATCH v3 02/19] x86, boot: Move compressed kernel to end of buffer before decompressing Baoquan He
2016-03-07 22:35   ` Kees Cook
2016-03-08  4:50     ` Baoquan He
2016-03-04 16:25 ` [PATCH v3 03/19] x86, boot: Move z_extract_offset calculation to header.S Baoquan He
2016-03-04 16:25 ` [PATCH v3 04/19] x86, kaskr: Update the description for decompressor worst case Baoquan He
2016-03-04 16:25 ` [PATCH v3 05/19] x86, boot: Fix run_size calculation Baoquan He
2016-03-07 23:10   ` Kees Cook
2016-03-08  4:57     ` Baoquan He
2016-03-08 18:05       ` Kees Cook
2016-03-04 16:25 ` [PATCH v3 06/19] x86, kaslr: Clean up useless code related to run_size Baoquan He
2016-03-07 23:12   ` Kees Cook
2016-03-08  5:00     ` Baoquan He
2016-03-04 16:25 ` [PATCH v3 07/19] x86, kaslr: Get correct max_addr for relocs pointer Baoquan He
2016-03-07 23:16   ` Kees Cook
2016-03-08  5:13     ` Baoquan He
2016-03-08 18:16       ` Kees Cook
2016-03-09 13:46         ` Baoquan He
2016-03-04 16:25 ` [PATCH v3 08/19] x86, kaslr: Consolidate mem_avoid array filling Baoquan He
2016-03-07 23:28   ` Kees Cook
2016-03-08  5:21     ` Baoquan He
2016-03-08 18:17       ` Kees Cook
2016-03-09 13:42         ` Baoquan He
2016-03-04 16:25 ` [PATCH v3 09/19] x86, boot: Split kernel_ident_mapping_init to another file Baoquan He
2016-03-04 16:25 ` [PATCH v3 10/19] x86, 64bit: Set ident_mapping for kaslr Baoquan He
2016-03-07 23:34   ` Kees Cook
2016-03-08  5:25     ` Baoquan He
2016-03-21  7:50     ` Baoquan He
2016-03-21 19:48       ` Kees Cook
2016-03-04 16:25 ` [PATCH v3 11/19] x86, boot: Add checking for memcpy Baoquan He
2016-03-07 23:36   ` Kees Cook
2016-03-08  5:30     ` Baoquan He
2016-03-04 16:25 ` [PATCH v3 12/19] x86, kaslr: Fix a bug that relocation can not be handled when kernel is loaded above 2G Baoquan He
2016-03-07 23:30   ` Kees Cook
2016-03-08  5:22     ` Baoquan He
2016-03-04 16:25 ` [PATCH v3 13/19] x86, kaslr: Introduce struct slot_area to manage randomization slot info Baoquan He
2016-03-04 16:25 ` [PATCH v3 14/19] x86, kaslr: Add two functions which will be used later Baoquan He
2016-03-04 16:25 ` [PATCH v3 15/19] x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel text mapping address Baoquan He
2016-03-04 16:25 ` [PATCH v3 16/19] x86, kaslr: Randomize physical and virtual address of kernel separately Baoquan He
2016-03-07 23:51   ` Kees Cook
2016-03-08  5:34     ` Baoquan He
2016-03-08 18:24       ` Kees Cook
2016-03-09 13:40         ` Baoquan He
2016-03-09 18:07           ` Kees Cook
2016-03-10 15:15             ` Baoquan He
2016-03-04 16:25 ` [PATCH v3 17/19] x86, kaslr: Add support of kernel physical address randomization above 4G Baoquan He
2016-03-04 16:25 ` [PATCH v3 18/19] x86, kaslr: Remove useless codes Baoquan He
2016-03-04 16:25 ` [PATCH v3 19/19] x86, kaslr: Allow random address to be below loaded address Baoquan He
2016-03-05 11:35 ` [PATCH v3 00/19] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).