* [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support
@ 2016-03-22  7:31 Baoquan He
  2016-03-22  7:31 ` [PATCH v4 01/20] x86, kaslr: Remove not needed parameter for choose_kernel_location Baoquan He
                   ` (21 more replies)
  0 siblings, 22 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung, Baoquan He

***Background:
Previously a bug was reported that kdump didn't work when kaslr was enabled. While
discussing that bug fix, we found that the current kaslr implementation has a
limitation: it can only randomize within a 1GB region.

This is because in the current kaslr implementation only the physical load address
of the kernel is randomized. The delta between the physical address vmlinux was
linked to load at and the address it is finally loaded at is then calculated. If
the delta is not 0, i.e. the kernel is actually decompressed at a new physical
address, relocation handling is needed: the delta is added to the offset used for
kernel symbol relocation, which moves the kernel text mapping by the same amount.
So although in principle the kernel can be randomized to any physical address, the
kernel text mapping address space is limited to only 1G, namely the following
region on x86_64:
	[0xffffffff80000000, 0xffffffffc0000000)

In short, the physical address and virtual address randomization of the kernel
text are coupled. This causes the limitation.

hpa and Vivek then suggested we change this: decouple the physical address and
virtual address randomization of the kernel text and let them work separately.
Then the kernel text physical address can be randomized in the region [16M, 64T),
and the kernel text virtual address can be randomized in the region
[0xffffffff80000000, 0xffffffffc0000000).
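
To illustrate the idea, conceptually the decoupled flow looks like this (a sketch
only; the helper names here are hypothetical, the real functions are introduced
later in this series):

	output    = find_random_phys_addr(...);   /* physical: anywhere in [16M, 64T)   */
	virt_addr = find_random_virt_addr(...);   /* virtual: inside the 1G text region */

	/* relocations are then adjusted against virt_addr, not against output */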

***Problems to be solved:
  - When the kernel boots from startup_32, only a 0~4G identity mapping is built.
    If the kernel can be placed randomly anywhere from 16M up to 64T, the cost of
    building the identity mapping for the whole region is too high. We need to
    build the identity mapping on demand, not covering the entire physical
    address space.

  - Decouple the physical address and virtual address randomization of the kernel
    text and let them work separately.

***Parts:
   - The 1st part is Yinghai's patches for building the identity mapping on demand.
     This is used to solve the first problem mentioned above.
     (Patch 09-10/19)
   - The 2nd part is the patches that decouple the physical address and virtual
     address randomization of the kernel text and let them work separately, based
     on Yinghai's ident mapping patches.
     (Patch 12-19/19)
   - The 3rd part is some cleanup patches which came out of Yinghai's review of
     my patches and the related code.
     (Patch 01-08/19)

***Patch status:
This patchset went through several rounds of review.

    v1:
    - The first round can be found here:
	https://lwn.net/Articles/637115/

    v1->v2:
    - In the 2nd round Yinghai posted a big patchset including this kaslr fix and
      another setup_data related fix. The link is here:
       http://lists-archives.com/linux-kernel/28346903-x86-updated-patches-for-kaslr-and-setup_data-etc-for-v4-3.html
      You can get the code from Yinghai's git branch:
      git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-v4.3-next

    v2->v3:
    - This round only takes care of the kaslr related patches.
      For reviewers it's better to discuss only one issue in one thread.
        * I dropped the following patch from Yinghai's set because I think it's unnecessary:
           - Patch 05/19 x86, kaslr: rename output_size to output_run_size
             output_size is enough to represent the value:
         	output_len > run_size ? output_len : run_size
       
        * I added Patch 04/19, a comment update patch. For the other patches, I just
          adjusted the patch logs and made changes in several places compared with
          the 2nd round. Please check the change log under the patch log of each
          patch for details.

        * Adjusted the sequence of several patches to make review easier. It doesn't
          affect the code.

    v3->v4:
    - Made changes according to Kees's comments.
      Added patch 20/20, as Kees suggested, to use KERNEL_IMAGE_SIZE as the maximum
      offset for kernel virtual address randomization, and cleaned up the now
      useless CONFIG_RANDOM_OFFSET_MAX:

        x86, kaslr: Use KERNEL_IMAGE_SIZE as the offset max for kernel virtual randomization

You can also get this patchset from my github:
   https://github.com/baoquan-he/linux.git kaslr-above-4G

Any comments about the code changes, code comments or patch logs are welcome and
appreciated.

Baoquan He (9):
  x86, kaslr: Fix a bug that relocation can not be handled when kernel
    is loaded above 2G
  x86, kaslr: Update the description for decompressor worst case
  x86, kaslr: Introduce struct slot_area to manage randomization slot
    info
  x86, kaslr: Add two functions which will be used later
  x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel
    text mapping address
  x86, kaslr: Randomize physical and virtual address of kernel
    separately
  x86, kaslr: Add support of kernel physical address randomization above
    4G
  x86, kaslr: Remove useless codes
  x86, kaslr: Use KERNEL_IMAGE_SIZE as the offset max for kernel virtual
    randomization

Yinghai Lu (11):
  x86, kaslr: Remove not needed parameter for choose_kernel_location
  x86, boot: Move compressed kernel to end of buffer before
    decompressing
  x86, boot: Move z_extract_offset calculation to header.S
  x86, boot: Fix run_size calculation
  x86, kaslr: Clean up useless code related to run_size.
  x86, kaslr: Get correct max_addr for relocs pointer
  x86, kaslr: Consolidate mem_avoid array filling
  x86, boot: Split kernel_ident_mapping_init to another file
  x86, 64bit: Set ident_mapping for kaslr
  x86, boot: Add checking for memcpy
  x86, kaslr: Allow random address to be below loaded address

 arch/x86/Kconfig                       |  57 +++----
 arch/x86/boot/Makefile                 |  13 +-
 arch/x86/boot/compressed/Makefile      |  19 ++-
 arch/x86/boot/compressed/aslr.c        | 300 +++++++++++++++++++++++++--------
 arch/x86/boot/compressed/head_32.S     |  14 +-
 arch/x86/boot/compressed/head_64.S     |  15 +-
 arch/x86/boot/compressed/misc.c        |  89 +++++-----
 arch/x86/boot/compressed/misc.h        |  34 ++--
 arch/x86/boot/compressed/misc_pgt.c    |  93 ++++++++++
 arch/x86/boot/compressed/mkpiggy.c     |  28 +--
 arch/x86/boot/compressed/string.c      |  29 +++-
 arch/x86/boot/compressed/vmlinux.lds.S |   1 +
 arch/x86/boot/header.S                 |  22 ++-
 arch/x86/include/asm/boot.h            |  19 +++
 arch/x86/include/asm/page.h            |   5 +
 arch/x86/include/asm/page_64_types.h   |   5 +-
 arch/x86/kernel/asm-offsets.c          |   1 +
 arch/x86/kernel/vmlinux.lds.S          |   1 +
 arch/x86/mm/ident_map.c                |  74 ++++++++
 arch/x86/mm/init_32.c                  |   3 -
 arch/x86/mm/init_64.c                  |  74 +-------
 arch/x86/tools/calc_run_size.sh        |  42 -----
 22 files changed, 605 insertions(+), 333 deletions(-)
 create mode 100644 arch/x86/boot/compressed/misc_pgt.c
 create mode 100644 arch/x86/mm/ident_map.c
 delete mode 100644 arch/x86/tools/calc_run_size.sh

-- 
2.5.0

* [PATCH v4 01/20] x86, kaslr: Remove not needed parameter for choose_kernel_location
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
@ 2016-03-22  7:31 ` Baoquan He
  2016-03-22  7:31 ` [PATCH v4 02/20] x86, kaslr: Fix a bug that relocation can not be handled when kernel is loaded above 2G Baoquan He
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung

From: Yinghai Lu <yinghai@kernel.org>

real_mode is a global variable, so we do not need to pass it around.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/aslr.c | 5 ++---
 arch/x86/boot/compressed/misc.c | 2 +-
 arch/x86/boot/compressed/misc.h | 6 ++----
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 6a9b96b..622aa88 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -295,8 +295,7 @@ static unsigned long find_random_addr(unsigned long minimum,
 	return slots_fetch_random();
 }
 
-unsigned char *choose_kernel_location(struct boot_params *boot_params,
-				      unsigned char *input,
+unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
 				      unsigned long output_size)
@@ -316,7 +315,7 @@ unsigned char *choose_kernel_location(struct boot_params *boot_params,
 	}
 #endif
 
-	boot_params->hdr.loadflags |= KASLR_FLAG;
+	real_mode->hdr.loadflags |= KASLR_FLAG;
 
 	/* Record the various known unsafe memory ranges. */
 	mem_avoid_init((unsigned long)input, input_size,
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 79dac17..f35ad9e 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -428,7 +428,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	 * the entire decompressed kernel plus relocation table, or the
 	 * entire decompressed kernel plus .bss and .brk sections.
 	 */
-	output = choose_kernel_location(real_mode, input_data, input_len, output,
+	output = choose_kernel_location(input_data, input_len, output,
 					output_len > run_size ? output_len
 							      : run_size);
 
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 3783dc3..dcf01c2 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -67,8 +67,7 @@ int cmdline_find_option_bool(const char *option);
 
 #if CONFIG_RANDOMIZE_BASE
 /* aslr.c */
-unsigned char *choose_kernel_location(struct boot_params *boot_params,
-				      unsigned char *input,
+unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
 				      unsigned long output_size);
@@ -76,8 +75,7 @@ unsigned char *choose_kernel_location(struct boot_params *boot_params,
 bool has_cpuflag(int flag);
 #else
 static inline
-unsigned char *choose_kernel_location(struct boot_params *boot_params,
-				      unsigned char *input,
+unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
 				      unsigned long output_size)
-- 
2.5.0

* [PATCH v4 02/20] x86, kaslr: Fix a bug that relocation can not be handled when kernel is loaded above 2G
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
  2016-03-22  7:31 ` [PATCH v4 01/20] x86, kaslr: Remove not needed parameter for choose_kernel_location Baoquan He
@ 2016-03-22  7:31 ` Baoquan He
  2016-03-22  7:32 ` [PATCH v4 03/20] x86, boot: Move compressed kernel to end of buffer before decompressing Baoquan He
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung, Baoquan He

When processing the 32-bit relocation tables, a local variable 'extended'
is used to calculate the physical address of each relocs entry. However its
type is 'int', which is sufficient for i386 but not for x86_64. That's why
relocation can only be handled when the kernel is loaded below 2G; otherwise
an overflow happens and the system hangs.

Change it to 'long', as the 32-bit inverse relocation processing already
does; this change is safe for i386 relocation handling.
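
A minimal userspace illustration (not kernel code) of why 'int' breaks once
addresses go above 2G:

	#include <stdio.h>

	int main(void)
	{
		unsigned long map = 0x40000000;	/* pretend the kernel moved up by 1G */
		int  ext32 = 0x7ff00000;	/* a target address just below 2G */
		long ext64 = 0x7ff00000L;

		ext32 += map;		/* 0xbff00000 does not fit in a signed int */
		ext64 += map;

		/* the negative int sign-extends into a bogus 64-bit address */
		printf("int:  %#lx\n", (unsigned long)ext32);	/* 0xffffffffbff00000 */
		printf("long: %#lx\n", (unsigned long)ext64);	/* 0xbff00000 */
		return 0;
	}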

Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/misc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index f35ad9e..c4477d5 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -295,7 +295,7 @@ static void handle_relocations(void *output, unsigned long output_len)
 	 * So we work backwards from the end of the decompressed image.
 	 */
 	for (reloc = output + output_len - sizeof(*reloc); *reloc; reloc--) {
-		int extended = *reloc;
+		long extended = *reloc;
 		extended += map;
 
 		ptr = (unsigned long)extended;
-- 
2.5.0

* [PATCH v4 03/20] x86, boot: Move compressed kernel to end of buffer before decompressing
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
  2016-03-22  7:31 ` [PATCH v4 01/20] x86, kaslr: Remove not needed parameter for choose_kernel_location Baoquan He
  2016-03-22  7:31 ` [PATCH v4 02/20] x86, kaslr: Fix a bug that relocation can not be handled when kernel is loaded above 2G Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-03-22  7:32 ` [PATCH v4 04/20] x86, boot: Move z_extract_offset calculation to header.S Baoquan He
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung

From: Yinghai Lu <yinghai@kernel.org>

For better understanding, first clarify what VO and ZO are. They were introduced
in the commits below by hpa:
77d1a49 x86, boot: make symbols from the main vmlinux available
37ba7ab x86, boot: make kernel_alignment adjustable; new bzImage fields

VO:
- uncompressed kernel
- size: VO__end - VO__text

ZO:
- boot/compressed/vmlinux
- head text + compressed (VO+relocs) + decompressor code
- size: ZO__end - ZO_startup_32

Also copy the INIT_SIZE definition here for reference. From the macros below we
can see that INIT_SIZE is either bigger than or equal to VO_INIT_SIZE.

 #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
 #define VO_INIT_SIZE    (VO__end - VO__text)
 #if ZO_INIT_SIZE > VO_INIT_SIZE
 #define INIT_SIZE ZO_INIT_SIZE
 #else
 #define INIT_SIZE VO_INIT_SIZE
 #endif

The current code uses extract_offset to decide the position of the copied ZO: it
puts ZO starting at extract_offset. When INIT_SIZE is bigger than VO_INIT_SIZE,
the copied ZO occupies the space from extract_offset to the end of the
decompression buffer, as shown below.

0                       extract_offset                       init_size
|-----------|---------------|------------------------|---------|
            |               |                        |         |
          VO__text      startup_32 of ZO         VO__end     ZO__end

However, when INIT_SIZE is equal to VO_INIT_SIZE, there is still space left between
the end of ZO and the end of the decompression buffer, as shown below.

0                       extract_offset                       init_size
|-----------|---------------|------------------------|---------|
            |               |                        |         |
          VO__text      startup_32 of ZO         ZO__end     VO__end

It's better to always put ZO right at the end of the decompression buffer.

So in this patch, add BP_init_size to asm-offsets.c to make it visible to asm
modules, and when moving ZO, derive the starting position of the copied ZO from
init_size and the ZO run size, namely (ZO__end - startup_32).
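
In other words, the copy destination is chosen so that the end of ZO lands exactly
at the end of the buffer. As a sketch of the arithmetic (using the symbols above):

	ZO size   = ZO__end - ZO_startup_32
	ZO target = output + init_size - ZO size

which is what the updated head_32.S/head_64.S code computes from BP_init_size(%esi)
and $_end.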

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
v2->v3:
    Adjust the patch log.

 arch/x86/boot/compressed/head_32.S     | 11 +++++++++--
 arch/x86/boot/compressed/head_64.S     |  8 ++++++--
 arch/x86/boot/compressed/mkpiggy.c     |  3 ---
 arch/x86/boot/compressed/vmlinux.lds.S |  1 +
 arch/x86/kernel/asm-offsets.c          |  1 +
 arch/x86/kernel/vmlinux.lds.S          |  1 +
 6 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 8ef964d..0c140f9 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -148,7 +148,9 @@ preferred_addr:
 1:
 
 	/* Target address to relocate to for decompression */
-	addl	$z_extract_offset, %ebx
+	movl    BP_init_size(%esi), %eax
+	subl    $_end, %eax
+	addl    %eax, %ebx
 
 	/* Set up the stack */
 	leal	boot_stack_end(%ebx), %esp
@@ -210,8 +212,13 @@ relocated:
 				/* push arguments for decompress_kernel: */
 	pushl	$z_run_size	/* size of kernel with .bss and .brk */
 	pushl	$z_output_len	/* decompressed length, end of relocs */
-	leal	z_extract_offset_negative(%ebx), %ebp
+
+	movl    BP_init_size(%esi), %eax
+	subl    $_end, %eax
+	movl    %ebx, %ebp
+	subl    %eax, %ebp
 	pushl	%ebp		/* output address */
+
 	pushl	$z_input_len	/* input_len */
 	leal	input_data(%ebx), %eax
 	pushl	%eax		/* input_data */
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index b0c0d16..67dd8d3 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -102,7 +102,9 @@ ENTRY(startup_32)
 1:
 
 	/* Target address to relocate to for decompression */
-	addl	$z_extract_offset, %ebx
+	movl	BP_init_size(%esi), %eax
+	subl	$_end, %eax
+	addl	%eax, %ebx
 
 /*
  * Prepare for entering 64 bit mode
@@ -330,7 +332,9 @@ preferred_addr:
 1:
 
 	/* Target address to relocate to for decompression */
-	leaq	z_extract_offset(%rbp), %rbx
+	movl	BP_init_size(%rsi), %ebx
+	subl	$_end, %ebx
+	addq	%rbp, %rbx
 
 	/* Set up the stack */
 	leaq	boot_stack_end(%rbx), %rsp
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index d8222f2..b980046 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -85,9 +85,6 @@ int main(int argc, char *argv[])
 	printf("z_output_len = %lu\n", (unsigned long)olen);
 	printf(".globl z_extract_offset\n");
 	printf("z_extract_offset = 0x%lx\n", offs);
-	/* z_extract_offset_negative allows simplification of head_32.S */
-	printf(".globl z_extract_offset_negative\n");
-	printf("z_extract_offset_negative = -0x%lx\n", offs);
 	printf(".globl z_run_size\n");
 	printf("z_run_size = %lu\n", run_size);
 
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 34d047c..e24e0a0 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -70,5 +70,6 @@ SECTIONS
 		_epgtable = . ;
 	}
 #endif
+	. = ALIGN(PAGE_SIZE);	/* keep ZO size page aligned */
 	_end = .;
 }
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 84a7524..506de14 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -81,6 +81,7 @@ void common(void) {
 	OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch);
 	OFFSET(BP_version, boot_params, hdr.version);
 	OFFSET(BP_kernel_alignment, boot_params, hdr.kernel_alignment);
+	OFFSET(BP_init_size, boot_params, hdr.init_size);
 	OFFSET(BP_pref_address, boot_params, hdr.pref_address);
 	OFFSET(BP_code32_start, boot_params, hdr.code32_start);
 
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index fe133b7..b8081e6 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -322,6 +322,7 @@ SECTIONS
 		__brk_limit = .;
 	}
 
+	. = ALIGN(PAGE_SIZE);		/* keep VO_INIT_SIZE page aligned */
 	_end = .;
 
         STABS_DEBUG
-- 
2.5.0

* [PATCH v4 04/20] x86, boot: Move z_extract_offset calculation to header.S
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (2 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 03/20] x86, boot: Move compressed kernel to end of buffer before decompressing Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-03-22  7:32 ` [PATCH v4 05/20] x86, kaslr: Update the description for decompressor worst case Baoquan He
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung, Baoquan He

From: Yinghai Lu <yinghai@kernel.org>

Currently z_extract_offset is calculated in boot/compressed/mkpiggy.c. The
problem is that in mkpiggy.c we don't know the details of the decompressor, so
we only get a rough z_extract_offset based on extra_bytes. As we know,
extra_bytes can only promise a safety margin for decompression. In fact this
brings some risks:

 - output+output_len could be much bigger than input+input_len. In this
   case the decompressed kernel plus relocs could overwrite the decompressor
   code while it is still running.

 - The head section of ZO (up to _ehead) could be bigger than z_extract_offset.
   In this case an overwrite could happen while the head code is running to
   move ZO to the end of the buffer. Though currently the size of the head code
   is very small, it's still a potential risk: since there is no rule limiting
   the size of ZO's head code, it could become very big.

Now move the z_extract_offset calculation to header.S, and adjust
z_extract_offset to make sure the above two cases never happen.

Besides, since ZO is now always placed at the end of the decompression buffer,
z_extract_offset is only used here to calculate an appropriate INIT_SIZE; no
other place needs it any more, so there is no need to put it in voffset.h.
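
A worked example with made-up sizes, just for illustration: with
ZO_z_output_len = 16 MiB and ZO_z_input_len = 7 MiB, the new header.S macros give

	ZO_z_extra_bytes    = (16 MiB >> 12) + 65536 + 128 = 69760 bytes
	ZO_z_extract_offset = 16 MiB + 69760 bytes - 7 MiB = 9506944 bytes (~9.1 MiB)

and, assuming the ZO head section ([startup_32, _ehead)) is smaller than that,
ZO_z_min_extract_offset is simply this value rounded up to a 4 KiB boundary.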

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
---
v2->v3:
    Tune the patch log.
    Remove code comment above init_size.
    I still think extra_bytes should use the formula below to cover
    the worst case of xz:
#define ZO_z_extra_bytes ((ZO_z_output_len >> 12) + 65536 + 128)

 arch/x86/boot/Makefile             |  2 +-
 arch/x86/boot/compressed/misc.c    |  5 +----
 arch/x86/boot/compressed/mkpiggy.c | 15 +--------------
 arch/x86/boot/header.S             | 22 +++++++++++++++++++++-
 4 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index bbe1a62..bd021e5 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -87,7 +87,7 @@ targets += voffset.h
 $(obj)/voffset.h: vmlinux FORCE
 	$(call if_changed,voffset)
 
-sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|z_.*\)$$/\#define ZO_\2 0x\1/p'
+sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'
 
 quiet_cmd_zoffset = ZOFFSET $@
       cmd_zoffset = $(NM) $< | sed -n $(sed-zoffset) > $@
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index c4477d5..e2a998f 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -83,13 +83,10 @@
  * To avoid problems with the compressed data's meta information an extra 18
  * bytes are needed.  Leading to the formula:
  *
- * extra_bytes = (uncompressed_size >> 12) + 32768 + 18 + decompressor_size.
+ * extra_bytes = (uncompressed_size >> 12) + 32768 + 18.
  *
  * Adding 8 bytes per 32K is a bit excessive but much easier to calculate.
  * Adding 32768 instead of 32767 just makes for round numbers.
- * Adding the decompressor_size is necessary as it musht live after all
- * of the data as well.  Last I measured the decompressor is about 14K.
- * 10K of actual data and 4K of bss.
  *
  */
 
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index b980046..a613c84 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -21,8 +21,7 @@
  * ----------------------------------------------------------------------- */
 
 /*
- * Compute the desired load offset from a compressed program; outputs
- * a small assembly wrapper with the appropriate symbols defined.
+ * outputs a small assembly wrapper with the appropriate symbols defined.
  */
 
 #include <stdlib.h>
@@ -35,7 +34,6 @@ int main(int argc, char *argv[])
 {
 	uint32_t olen;
 	long ilen;
-	unsigned long offs;
 	unsigned long run_size;
 	FILE *f = NULL;
 	int retval = 1;
@@ -67,15 +65,6 @@ int main(int argc, char *argv[])
 	ilen = ftell(f);
 	olen = get_unaligned_le32(&olen);
 
-	/*
-	 * Now we have the input (compressed) and output (uncompressed)
-	 * sizes, compute the necessary decompression offset...
-	 */
-
-	offs = (olen > ilen) ? olen - ilen : 0;
-	offs += olen >> 12;	/* Add 8 bytes for each 32K block */
-	offs += 64*1024 + 128;	/* Add 64K + 128 bytes slack */
-	offs = (offs+4095) & ~4095; /* Round to a 4K boundary */
 	run_size = atoi(argv[2]);
 
 	printf(".section \".rodata..compressed\",\"a\",@progbits\n");
@@ -83,8 +72,6 @@ int main(int argc, char *argv[])
 	printf("z_input_len = %lu\n", ilen);
 	printf(".globl z_output_len\n");
 	printf("z_output_len = %lu\n", (unsigned long)olen);
-	printf(".globl z_extract_offset\n");
-	printf("z_extract_offset = 0x%lx\n", offs);
 	printf(".globl z_run_size\n");
 	printf("z_run_size = %lu\n", run_size);
 
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 6236b9e..1c057e3 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -440,7 +440,27 @@ setup_data:		.quad 0			# 64-bit physical pointer to
 
 pref_address:		.quad LOAD_PHYSICAL_ADDR	# preferred load addr
 
-#define ZO_INIT_SIZE	(ZO__end - ZO_startup_32 + ZO_z_extract_offset)
+/* Check arch/x86/boot/compressed/misc.c for the formula of extra_bytes*/
+#define ZO_z_extra_bytes ((ZO_z_output_len >> 12) + 65536 + 128)
+#if ZO_z_output_len > ZO_z_input_len
+#define ZO_z_extract_offset (ZO_z_output_len + ZO_z_extra_bytes - ZO_z_input_len)
+#else
+#define ZO_z_extract_offset ZO_z_extra_bytes
+#endif
+
+/*
+ * extract_offset has to be bigger than ZO head section. Otherwise
+ * during head code running to move ZO to end of buffer, it will
+ * overwrite head code itself.
+ */
+#if (ZO__ehead - ZO_startup_32) > ZO_z_extract_offset
+#define ZO_z_min_extract_offset ((ZO__ehead - ZO_startup_32 + 4095) & ~4095)
+#else
+#define ZO_z_min_extract_offset ((ZO_z_extract_offset + 4095) & ~4095)
+#endif
+
+#define ZO_INIT_SIZE	(ZO__end - ZO_startup_32 + ZO_z_min_extract_offset)
+
 #define VO_INIT_SIZE	(VO__end - VO__text)
 #if ZO_INIT_SIZE > VO_INIT_SIZE
 #define INIT_SIZE ZO_INIT_SIZE
-- 
2.5.0

* [PATCH v4 05/20] x86, kaslr: Update the description for decompressor worst case
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (3 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 04/20] x86, boot: Move z_extract_offset calculation to header.S Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-03-22  7:32 ` [PATCH v4 06/20] x86, boot: Fix run_size calculation Baoquan He
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung, Baoquan He

The current analysis is for the gzip decompressor only. In fact 6 different
decompressors are supported. Update the description to cover all of them, and
fix several typos along the way.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/misc.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index e2a998f..e7c6fdf 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -19,11 +19,13 @@
  */
 
 /*
- * Getting to provable safe in place decompression is hard.
- * Worst case behaviours need to be analyzed.
- * Background information:
+ * Getting to provable safe in place decompression is hard. Worst case
+ * behaviours need to be analyzed. Here let's take decompressing a gzip-compressed
+ * kernel as an example to illustrate it.
+ *
+ * The file layout of gzip compressed kernel is as follows. For more information,
+ * please refer to rfc 1951 and rfc 1952.
  *
- * The file layout is:
  *    magic[2]
  *    method[1]
  *    flags[1]
@@ -70,13 +72,13 @@
  * of 5 bytes per 32767 bytes.
  *
  * The worst case internal to a compressed block is very hard to figure.
- * The worst case can at least be boundined by having one bit that represents
+ * The worst case can at least be bounded by having one bit that represents
  * 32764 bytes and then all of the rest of the bytes representing the very
  * very last byte.
  *
  * All of which is enough to compute an amount of extra data that is required
  * to be safe.  To avoid problems at the block level allocating 5 extra bytes
- * per 32767 bytes of data is sufficient.  To avoind problems internal to a
+ * per 32767 bytes of data is sufficient.  To avoid problems internal to a
  * block adding an extra 32767 bytes (the worst case uncompressed block size)
  * is sufficient, to ensure that in the worst case the decompressed data for
  * block will stop the byte before the compressed data for a block begins.
@@ -88,11 +90,17 @@
  * Adding 8 bytes per 32K is a bit excessive but much easier to calculate.
  * Adding 32768 instead of 32767 just makes for round numbers.
  *
+ * The above analysis is for decompressing a gzip compressed kernel only. Up to
+ * now 6 different decompressors are supported altogether. Among them
+ * xz stores data in chunks and has a maximum chunk size of 64K. Hence the safety
+ * margin should be updated to cover all decompressors so that we don't
+ * need to deal with each of them separately. Please check
+ * the description in lib/decompressor_xxx.c for specific information.
+ *
+ * extra_bytes = (uncompressed_size >> 12) + 65536 + 128.
+ *
  */
 
-/*
- * gzip declarations
- */
 #define STATIC		static
 
 #undef memcpy
-- 
2.5.0

* [PATCH v4 06/20] x86, boot: Fix run_size calculation
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (4 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 05/20] x86, kaslr: Update the description for decompressor worst case Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-03-22 20:51   ` Kees Cook
  2016-03-22  7:32 ` [PATCH v4 07/20] x86, kaslr: Clean up useless code related to run_size Baoquan He
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung, Junjie Mao, Josh Triplett, Baoquan He

From: Yinghai Lu <yinghai@kernel.org>

Firstly, currently run_size is calculated via the shell script
arch/x86/tools/calc_run_size.sh. It gets the file offset and mem size of the
.bss and .brk sections in vmlinux, then adds them as follows:

run_size=$(( $offsetA + $sizeA + $sizeB ))

However this is completely wrong. The offset is just the starting position of a
section or segment within the ELF file. Below is a vmlinux I compiled:

[bhe@x1 linux]$ objdump -h vmlinux

vmlinux:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
 27 .bss          00170000  ffffffff81ec8000  0000000001ec8000  012c8000  2**12
                  ALLOC
 28 .brk          00027000  ffffffff82038000  0000000002038000  012c8000  2**0
                  ALLOC

Here we would get run_size = 0x145f000:
0x012c8000 + 0x00170000 + 0x00027000 = 0x145f000

[bhe@x1 linux]$ readelf -l vmlinux

Elf file type is EXEC (Executable file)
Entry point 0x1000000
There are 5 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000200000 0xffffffff81000000 0x0000000001000000
                 0x0000000000b5e000 0x0000000000b5e000  R E    200000
  LOAD           0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
                 0x0000000000145000 0x0000000000145000  RW     200000
  LOAD           0x0000000001000000 0x0000000000000000 0x0000000001d45000
                 0x0000000000018158 0x0000000000018158  RW     200000
  LOAD           0x000000000115e000 0xffffffff81d5e000 0x0000000001d5e000
                 0x000000000016a000 0x0000000000301000  RWE    200000
  NOTE           0x000000000099bcac 0xffffffff8179bcac 0x000000000179bcac
                 0x00000000000001bc 0x00000000000001bc         4

 Section to Segment mapping:
  Segment Sections...
   00     .text .notes __ex_table .rodata __bug_table .pci_fixup .tracedata __ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param __modver
   01     .data .vvar
   02     .data..percpu
   03     .init.text .init.data .x86_cpu_dev.init .parainstructions .altinstructions .altinstr_replacement .iommu_table .apicdrivers .exit.text .smp_locks .bss .brk
   04     .notes

Here we get the same value as the current run_size if we add p_offset and
p_memsz:
0x000000000115e000 + 0x0000000000301000 = 0x145f000

But is it right? Obviously not. We should calculate it using the last LOAD
program segment like this:
run_size = phdr->p_paddr + phdr->p_memsz - physical load addr of kernel
run_size = 0x0000000001d5e000 + 0x0000000000301000 - 0x0000000001000000 = 0x105f000

It's equal to VO__end - VO__text, and it's certainly simpler to compute:
_end: 0xffffffff8205f000
_text:0xffffffff81000000
run_size = 0xffffffff8205f000 - 0xffffffff81000000 = 0x105f000

Secondly, run_size is a simple constant; we don't need to pass it around, and we
already have voffset.h for that. We can share voffset.h between misc.c and
header.S instead of obtaining run_size some other way. Hence in this patch the
voffset.h creation code is moved to boot/compressed/Makefile.

The dependency was:
boot/header.S ==> boot/voffset.h ==> vmlinux
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c
Now it becomes:
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c ==> boot/voffset.h ==> vmlinux

Use the macro in misc.c to replace the passed-in run_size.

Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
---
v2->v3:
  Adjust the patch log.
v3->v4:
  Correct calculation errors in patch log that Kees pointed out.

 arch/x86/boot/Makefile            | 11 +----------
 arch/x86/boot/compressed/Makefile | 12 ++++++++++++
 arch/x86/boot/compressed/misc.c   |  3 +++
 3 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index bd021e5..40f0ae2 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -78,15 +78,6 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE
 
 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
 
-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 0x\1/p'
-
-quiet_cmd_voffset = VOFFSET $@
-      cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
-
-targets += voffset.h
-$(obj)/voffset.h: vmlinux FORCE
-	$(call if_changed,voffset)
-
 sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'
 
 quiet_cmd_zoffset = ZOFFSET $@
@@ -98,7 +89,7 @@ $(obj)/zoffset.h: $(obj)/compressed/vmlinux FORCE
 
 
 AFLAGS_header.o += -I$(obj)
-$(obj)/header.o: $(obj)/voffset.h $(obj)/zoffset.h
+$(obj)/header.o: $(obj)/zoffset.h
 
 LDFLAGS_setup.elf	:= -T
 $(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index f9ce75d..d9dedd9 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -41,6 +41,18 @@ LDFLAGS_vmlinux := -T
 hostprogs-y	:= mkpiggy
 HOST_EXTRACFLAGS += -I$(srctree)/tools/include
 
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+
+quiet_cmd_voffset = VOFFSET $@
+      cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
+
+targets += ../voffset.h
+
+$(obj)/../voffset.h: vmlinux FORCE
+	$(call if_changed,voffset)
+
+$(obj)/misc.o: $(obj)/../voffset.h
+
 vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
 	$(obj)/string.o $(obj)/cmdline.o \
 	$(obj)/piggy.o $(obj)/cpuflags.o
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index e7c6fdf..4d317b7 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -11,6 +11,7 @@
 
 #include "misc.h"
 #include "../string.h"
+#include "../voffset.h"
 
 /* WARNING!!
  * This code is compiled with -fPIC and it is relocated dynamically
@@ -415,6 +416,8 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	lines = real_mode->screen_info.orig_video_lines;
 	cols = real_mode->screen_info.orig_video_cols;
 
+	run_size = VO__end - VO__text;
+
 	console_init();
 	debug_putstr("early console in decompress_kernel\n");
 
-- 
2.5.0

* [PATCH v4 07/20] x86, kaslr: Clean up useless code related to run_size.
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (5 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 06/20] x86, boot: Fix run_size calculation Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-03-22 20:52   ` Kees Cook
  2016-03-22  7:32 ` [PATCH v4 08/20] x86, kaslr: Get correct max_addr for relocs pointer Baoquan He
                   ` (14 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung, Josh Triplett, Ard Biesheuvel, Junjie Mao,
	Baoquan He

From: Yinghai Lu <yinghai@kernel.org>

So now we use the formula below to get run_size.

	run_size = VO__end - VO__text

Let's remove the code for the old run_size calculation and clean up the places
where run_size needs to be passed in and out.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
---
v2->v3:
    Adjust the patch log.
v3->v4:
    Kees suggested changing run_size to a const in decompress_kernel().

 arch/x86/boot/compressed/Makefile  |  4 +---
 arch/x86/boot/compressed/head_32.S |  3 +--
 arch/x86/boot/compressed/head_64.S |  3 ---
 arch/x86/boot/compressed/misc.c    |  6 ++----
 arch/x86/boot/compressed/mkpiggy.c | 10 ++-------
 arch/x86/tools/calc_run_size.sh    | 42 --------------------------------------
 6 files changed, 6 insertions(+), 62 deletions(-)
 delete mode 100644 arch/x86/tools/calc_run_size.sh

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index d9dedd9..fef80fa 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -105,10 +105,8 @@ suffix-$(CONFIG_KERNEL_XZ)	:= xz
 suffix-$(CONFIG_KERNEL_LZO) 	:= lzo
 suffix-$(CONFIG_KERNEL_LZ4) 	:= lz4
 
-RUN_SIZE = $(shell $(OBJDUMP) -h vmlinux | \
-	     $(CONFIG_SHELL) $(srctree)/arch/x86/tools/calc_run_size.sh)
 quiet_cmd_mkpiggy = MKPIGGY $@
-      cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )
+      cmd_mkpiggy = $(obj)/mkpiggy $< > $@ || ( rm -f $@ ; false )
 
 targets += piggy.S
 $(obj)/piggy.S: $(obj)/vmlinux.bin.$(suffix-y) $(obj)/mkpiggy FORCE
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 0c140f9..122b32f 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -210,7 +210,6 @@ relocated:
  * Do the decompression, and jump to the new kernel..
  */
 				/* push arguments for decompress_kernel: */
-	pushl	$z_run_size	/* size of kernel with .bss and .brk */
 	pushl	$z_output_len	/* decompressed length, end of relocs */
 
 	movl    BP_init_size(%esi), %eax
@@ -226,7 +225,7 @@ relocated:
 	pushl	%eax		/* heap area */
 	pushl	%esi		/* real mode pointer */
 	call	decompress_kernel /* returns kernel location in %eax */
-	addl	$28, %esp
+	addl	$24, %esp
 
 /*
  * Jump to the decompressed kernel.
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 67dd8d3..3691451 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -407,8 +407,6 @@ relocated:
  * Do the decompression, and jump to the new kernel..
  */
 	pushq	%rsi			/* Save the real mode argument */
-	movq	$z_run_size, %r9	/* size of kernel with .bss and .brk */
-	pushq	%r9
 	movq	%rsi, %rdi		/* real mode address */
 	leaq	boot_heap(%rip), %rsi	/* malloc area for uncompression */
 	leaq	input_data(%rip), %rdx  /* input_data */
@@ -416,7 +414,6 @@ relocated:
 	movq	%rbp, %r8		/* output target address */
 	movq	$z_output_len, %r9	/* decompressed length, end of relocs */
 	call	decompress_kernel	/* returns kernel location in %rax */
-	popq	%r9
 	popq	%rsi
 
 /*
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 4d317b7..def6207 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -393,9 +393,9 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 				  unsigned char *input_data,
 				  unsigned long input_len,
 				  unsigned char *output,
-				  unsigned long output_len,
-				  unsigned long run_size)
+				  unsigned long output_len)
 {
+	const unsigned long run_size = VO__end - VO__text;
 	unsigned char *output_orig = output;
 
 	real_mode = rmode;
@@ -416,8 +416,6 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	lines = real_mode->screen_info.orig_video_lines;
 	cols = real_mode->screen_info.orig_video_cols;
 
-	run_size = VO__end - VO__text;
-
 	console_init();
 	debug_putstr("early console in decompress_kernel\n");
 
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index a613c84..c5148642 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -34,13 +34,11 @@ int main(int argc, char *argv[])
 {
 	uint32_t olen;
 	long ilen;
-	unsigned long run_size;
 	FILE *f = NULL;
 	int retval = 1;
 
-	if (argc < 3) {
-		fprintf(stderr, "Usage: %s compressed_file run_size\n",
-				argv[0]);
+	if (argc < 2) {
+		fprintf(stderr, "Usage: %s compressed_file\n", argv[0]);
 		goto bail;
 	}
 
@@ -65,15 +63,11 @@ int main(int argc, char *argv[])
 	ilen = ftell(f);
 	olen = get_unaligned_le32(&olen);
 
-	run_size = atoi(argv[2]);
-
 	printf(".section \".rodata..compressed\",\"a\",@progbits\n");
 	printf(".globl z_input_len\n");
 	printf("z_input_len = %lu\n", ilen);
 	printf(".globl z_output_len\n");
 	printf("z_output_len = %lu\n", (unsigned long)olen);
-	printf(".globl z_run_size\n");
-	printf("z_run_size = %lu\n", run_size);
 
 	printf(".globl input_data, input_data_end\n");
 	printf("input_data:\n");
diff --git a/arch/x86/tools/calc_run_size.sh b/arch/x86/tools/calc_run_size.sh
deleted file mode 100644
index 1a4c17b..0000000
--- a/arch/x86/tools/calc_run_size.sh
+++ /dev/null
@@ -1,42 +0,0 @@
-#!/bin/sh
-#
-# Calculate the amount of space needed to run the kernel, including room for
-# the .bss and .brk sections.
-#
-# Usage:
-# objdump -h a.out | sh calc_run_size.sh
-
-NUM='\([0-9a-fA-F]*[ \t]*\)'
-OUT=$(sed -n 's/^[ \t0-9]*.b[sr][sk][ \t]*'"$NUM$NUM$NUM$NUM"'.*/\1\4/p')
-if [ -z "$OUT" ] ; then
-	echo "Never found .bss or .brk file offset" >&2
-	exit 1
-fi
-
-OUT=$(echo ${OUT# })
-sizeA=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-offsetA=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-sizeB=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-offsetB=$(printf "%d" 0x${OUT%% *})
-
-run_size=$(( $offsetA + $sizeA + $sizeB ))
-
-# BFD linker shows the same file offset in ELF.
-if [ "$offsetA" -ne "$offsetB" ] ; then
-	# Gold linker shows them as consecutive.
-	endB=$(( $offsetB + $sizeB ))
-	if [ "$endB" != "$run_size" ] ; then
-		printf "sizeA: 0x%x\n" $sizeA >&2
-		printf "offsetA: 0x%x\n" $offsetA >&2
-		printf "sizeB: 0x%x\n" $sizeB >&2
-		printf "offsetB: 0x%x\n" $offsetB >&2
-		echo ".bss and .brk are non-contiguous" >&2
-		exit 1
-	fi
-fi
-
-printf "%d\n" $run_size
-exit 0
-- 
2.5.0

* [PATCH v4 08/20] x86, kaslr: Get correct max_addr for relocs pointer
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (6 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 07/20] x86, kaslr: Clean up useless code related to run_size Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-03-22 20:52   ` Kees Cook
  2016-03-22  7:32 ` [PATCH v4 09/20] x86, kaslr: Consolidate mem_avoid array filling Baoquan He
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung, Baoquan He

From: Yinghai Lu <yinghai@kernel.org>

Relocation handling performs bounds checking on the relocated
addresses. The existing code uses output_len (VO size plus relocs
size) as the max address. This is not right since the max_addr check
should stop at the end of VO and exclude bss, brk, etc, which follows.
The valid range should be VO [_text, __bss_start] in the loaded
physical address space.

In this patch, also export __bss_start to voffset.h and use it to get the
correct max_addr.
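
In other words (mirroring the one-line change below), the bound becomes:

	max_addr = min_addr + (VO___bss_start - VO__text);

Relocation entries are only expected to point into the initialized part of VO, so
.bss and .brk, which start at __bss_start, are excluded from the valid range.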

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
---
v2->v3:
    Adjust patch log.
v3->v4:
    Kees helped rewrite the patch log.

 arch/x86/boot/compressed/Makefile | 2 +-
 arch/x86/boot/compressed/misc.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index fef80fa..2e7c0ce 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -41,7 +41,7 @@ LDFLAGS_vmlinux := -T
 hostprogs-y	:= mkpiggy
 HOST_EXTRACFLAGS += -I$(srctree)/tools/include
 
-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
 
 quiet_cmd_voffset = VOFFSET $@
       cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index def6207..029f42f 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -259,7 +259,7 @@ static void handle_relocations(void *output, unsigned long output_len)
 	int *reloc;
 	unsigned long delta, map, ptr;
 	unsigned long min_addr = (unsigned long)output;
-	unsigned long max_addr = min_addr + output_len;
+	unsigned long max_addr = min_addr + (VO___bss_start - VO__text);
 
 	/*
 	 * Calculate the delta between where vmlinux was linked to load
-- 
2.5.0

* [PATCH v4 09/20] x86, kaslr: Consolidate mem_avoid array filling
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (7 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 08/20] x86, kaslr: Get correct max_addr for relocs pointer Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-03-22  7:32 ` [PATCH v4 10/20] x86, boot: Split kernel_ident_mapping_init to another file Baoquan He
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung, Baoquan He

From: Yinghai Lu <yinghai@kernel.org>

We are going to support 64-bit kaslr above 4G, where the new random output could
be anywhere. The mem_avoid array is used by kaslr when searching for a new output
address. The current code's first entry covers the extra bytes before
input+input_size, sized according to output_size. But we need to track all the
ranges that must be protected, not just the area after output+output_size. The
other entries are for the initrd, the cmdline, and the heap/stack that ZO runs
with.

In fact the compressed vmlinux plus relocs and the whole run space of ZO need to
be protected from being overridden by the decompression output. Details about how
to get the first entry of mem_avoid can be found in the code comment above the
mem_avoid_init() definition. From there we can see that the new first entry
already includes the heap and stack for ZO, so there is no need to put them into
mem_avoid[] separately. Also boot_params needs to be put into mem_avoid[], since a
64-bit bootloader could put it anywhere.

So in this patch, the first entry is adjusted and comments are added for it, the
heap and stack of ZO are removed from mem_avoid[], and finally boot_params is
added to mem_avoid[].
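
After the change, mem_avoid[] conceptually ends up as follows (a sketch mirroring
the code below, not literal source):

	mem_avoid[0] = { input,        output + init_size - input };	/* ZO text/data, heap,
									   stack and compressed payload */
	mem_avoid[1] = { initrd_start, initrd_size };
	mem_avoid[2] = { cmd_line,     cmd_line_size };
	mem_avoid[3] = { real_mode,    sizeof(*real_mode) };		/* boot_params */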

Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
---
v2->v3:
    Adjust the patch log.
v3->v4:
    Add missing explanation for the "init_size > run_size" part.
    Put the description and diagram about why those regions need to be avoided
    in aslr.c, as Kees suggested.

 arch/x86/boot/compressed/aslr.c | 72 ++++++++++++++++++++++++++++++++---------
 1 file changed, 56 insertions(+), 16 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 622aa88..e323630 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -109,7 +109,7 @@ struct mem_vector {
 	unsigned long size;
 };
 
-#define MEM_AVOID_MAX 5
+#define MEM_AVOID_MAX 4
 static struct mem_vector mem_avoid[MEM_AVOID_MAX];
 
 static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
@@ -134,22 +134,66 @@ static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
 	return true;
 }
 
+/*
+ * In theory, kaslr can put the kernel anywhere in the area [16M, 64T). The array
+ * mem_avoid is used to store the ranges which need to be avoided when kaslr
+ * searches for a new output address. We need to avoid the region that is unsafe
+ * to overlap during decompression, plus the initrd, cmdline and boot_params.
+ *
+ * How to get the region that is unsafe to overlap during decompression needs
+ * a detailed discussion. The principle is that the compressed vmlinux plus relocs
+ * and the run space of ZO can't be overridden by the decompression output.
+ *
+ * Since ZO now always sits at the end of the buffer, we can find out where the
+ * text and data/bss etc. of ZO are.
+ *
+ * Based on the following facts:
+ *  - init_size >= run_size,
+ *  - input+input_len >= output+output_len,
+ *  - run_size could be >= or < output_len
+ *
+ * we can make several assumptions for better presentation by diagram:
+ *  - init_size > run_size
+ *  - input+input_len > output+output_len
+ *  - run_size > output_len
+ *
+ * 0   output                       input             input+input_len          output+init_size
+ * |     |                            |                       |                       |
+ * |-----|-------------------|--------|------------------|----|------------|----------|
+ *                           |                           |                 |
+ *              output+init_size-ZO_INIT_SIZE    output+output_len    output+run_size
+ *
+ * [output, output+init_size) is the decompression buffer for the compressed
+ * kernel.
+ *
+ * [output, output+run_size) is for VO run size.
+ * [output, output+output_len) is (VO (vmlinux after objcopy) plus relocs)
+ *
+ * [output+init_size-ZO_INIT_SIZE, output+init_size) is copied ZO.
+ * [input, input+input_len) is copied compressed (VO (vmlinux after objcopy)
+ * plus relocs), not the ZO.
+ *
+ * [input+input_len, output+init_size) is [_text, _end) for ZO. That could be the
+ * first range in mem_avoid. This first range already includes the heap and
+ * stack for ZO running. Also [input, input+input_size) needs to be put in the
+ * mem_avoid array. It is adjacent to the first entry, so merge them. This is how
+ * we get the first entry in mem_avoid[].
+ *
+ */
 static void mem_avoid_init(unsigned long input, unsigned long input_size,
-			   unsigned long output, unsigned long output_size)
+			   unsigned long output)
 {
+	unsigned long init_size = real_mode->hdr.init_size;
 	u64 initrd_start, initrd_size;
 	u64 cmd_line, cmd_line_size;
-	unsigned long unsafe, unsafe_len;
 	char *ptr;
 
 	/*
 	 * Avoid the region that is unsafe to overlap during
-	 * decompression (see calculations at top of misc.c).
+	 * decompression.
 	 */
-	unsafe_len = (output_size >> 12) + 32768 + 18;
-	unsafe = (unsigned long)input + input_size - unsafe_len;
-	mem_avoid[0].start = unsafe;
-	mem_avoid[0].size = unsafe_len;
+	mem_avoid[0].start = input;
+	mem_avoid[0].size = (output + init_size) - input;
 
 	/* Avoid initrd. */
 	initrd_start  = (u64)real_mode->ext_ramdisk_image << 32;
@@ -169,13 +213,9 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	mem_avoid[2].start = cmd_line;
 	mem_avoid[2].size = cmd_line_size;
 
-	/* Avoid heap memory. */
-	mem_avoid[3].start = (unsigned long)free_mem_ptr;
-	mem_avoid[3].size = BOOT_HEAP_SIZE;
-
-	/* Avoid stack memory. */
-	mem_avoid[4].start = (unsigned long)free_mem_end_ptr;
-	mem_avoid[4].size = BOOT_STACK_SIZE;
+	/* Avoid params */
+	mem_avoid[3].start = (unsigned long)real_mode;
+	mem_avoid[3].size = sizeof(*real_mode);
 }
 
 /* Does this memory vector overlap a known avoided area? */
@@ -319,7 +359,7 @@ unsigned char *choose_kernel_location(unsigned char *input,
 
 	/* Record the various known unsafe memory ranges. */
 	mem_avoid_init((unsigned long)input, input_size,
-		       (unsigned long)output, output_size);
+		       (unsigned long)output);
 
 	/* Walk e820 and find a random address. */
 	random = find_random_addr(choice, output_size);
-- 
2.5.0

* [PATCH v4 10/20] x86, boot: Split kernel_ident_mapping_init to another file
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (8 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 09/20] x86, kaslr: Consolidate mem_avoid array filling Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-03-22  7:32 ` [PATCH v4 11/20] x86, 64bit: Set ident_mapping for kaslr Baoquan He
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung

From: Yinghai Lu <yinghai@kernel.org>

We need to include this code in the boot::decompress_kernel stage to set up the
new identity mapping.

Also add #ifndef guards around the __pa/__va macro definitions, as we need to
override them in the boot::decompress_kernel stage.
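
The idea is that the decompressor can provide identity-mapped definitions before
including this file; roughly (a sketch of the intent, not code from this patch):

	/* in boot/compressed, addresses are already identity mapped */
	#define __pa(x)  ((unsigned long)(x))
	#define __va(x)  ((void *)((unsigned long)(x)))

	#include "../../mm/ident_map.c"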

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/page.h |  5 +++
 arch/x86/mm/ident_map.c     | 74 +++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/init_64.c       | 74 +--------------------------------------------
 3 files changed, 80 insertions(+), 73 deletions(-)
 create mode 100644 arch/x86/mm/ident_map.c

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 802dde3..cf8f619 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -37,7 +37,10 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
+#ifndef __pa
 #define __pa(x)		__phys_addr((unsigned long)(x))
+#endif
+
 #define __pa_nodebug(x)	__phys_addr_nodebug((unsigned long)(x))
 /* __pa_symbol should be used for C visible symbols.
    This seems to be the official gcc blessed way to do such arithmetic. */
@@ -51,7 +54,9 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 #define __pa_symbol(x) \
 	__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
 
+#ifndef __va
 #define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
+#endif
 
 #define __boot_va(x)		__va(x)
 #define __boot_pa(x)		__pa(x)
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
new file mode 100644
index 0000000..751ca92
--- /dev/null
+++ b/arch/x86/mm/ident_map.c
@@ -0,0 +1,74 @@
+
+static void ident_pmd_init(unsigned long pmd_flag, pmd_t *pmd_page,
+			   unsigned long addr, unsigned long end)
+{
+	addr &= PMD_MASK;
+	for (; addr < end; addr += PMD_SIZE) {
+		pmd_t *pmd = pmd_page + pmd_index(addr);
+
+		if (!pmd_present(*pmd))
+			set_pmd(pmd, __pmd(addr | pmd_flag));
+	}
+}
+static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
+			  unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+
+	for (; addr < end; addr = next) {
+		pud_t *pud = pud_page + pud_index(addr);
+		pmd_t *pmd;
+
+		next = (addr & PUD_MASK) + PUD_SIZE;
+		if (next > end)
+			next = end;
+
+		if (pud_present(*pud)) {
+			pmd = pmd_offset(pud, 0);
+			ident_pmd_init(info->pmd_flag, pmd, addr, next);
+			continue;
+		}
+		pmd = (pmd_t *)info->alloc_pgt_page(info->context);
+		if (!pmd)
+			return -ENOMEM;
+		ident_pmd_init(info->pmd_flag, pmd, addr, next);
+		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+	}
+
+	return 0;
+}
+
+int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
+			      unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	int result;
+	int off = info->kernel_mapping ? pgd_index(__PAGE_OFFSET) : 0;
+
+	for (; addr < end; addr = next) {
+		pgd_t *pgd = pgd_page + pgd_index(addr) + off;
+		pud_t *pud;
+
+		next = (addr & PGDIR_MASK) + PGDIR_SIZE;
+		if (next > end)
+			next = end;
+
+		if (pgd_present(*pgd)) {
+			pud = pud_offset(pgd, 0);
+			result = ident_pud_init(info, pud, addr, next);
+			if (result)
+				return result;
+			continue;
+		}
+
+		pud = (pud_t *)info->alloc_pgt_page(info->context);
+		if (!pud)
+			return -ENOMEM;
+		result = ident_pud_init(info, pud, addr, next);
+		if (result)
+			return result;
+		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+	}
+
+	return 0;
+}
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a40b755..38f8396 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -57,79 +57,7 @@
 
 #include "mm_internal.h"
 
-static void ident_pmd_init(unsigned long pmd_flag, pmd_t *pmd_page,
-			   unsigned long addr, unsigned long end)
-{
-	addr &= PMD_MASK;
-	for (; addr < end; addr += PMD_SIZE) {
-		pmd_t *pmd = pmd_page + pmd_index(addr);
-
-		if (!pmd_present(*pmd))
-			set_pmd(pmd, __pmd(addr | pmd_flag));
-	}
-}
-static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
-			  unsigned long addr, unsigned long end)
-{
-	unsigned long next;
-
-	for (; addr < end; addr = next) {
-		pud_t *pud = pud_page + pud_index(addr);
-		pmd_t *pmd;
-
-		next = (addr & PUD_MASK) + PUD_SIZE;
-		if (next > end)
-			next = end;
-
-		if (pud_present(*pud)) {
-			pmd = pmd_offset(pud, 0);
-			ident_pmd_init(info->pmd_flag, pmd, addr, next);
-			continue;
-		}
-		pmd = (pmd_t *)info->alloc_pgt_page(info->context);
-		if (!pmd)
-			return -ENOMEM;
-		ident_pmd_init(info->pmd_flag, pmd, addr, next);
-		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
-	}
-
-	return 0;
-}
-
-int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
-			      unsigned long addr, unsigned long end)
-{
-	unsigned long next;
-	int result;
-	int off = info->kernel_mapping ? pgd_index(__PAGE_OFFSET) : 0;
-
-	for (; addr < end; addr = next) {
-		pgd_t *pgd = pgd_page + pgd_index(addr) + off;
-		pud_t *pud;
-
-		next = (addr & PGDIR_MASK) + PGDIR_SIZE;
-		if (next > end)
-			next = end;
-
-		if (pgd_present(*pgd)) {
-			pud = pud_offset(pgd, 0);
-			result = ident_pud_init(info, pud, addr, next);
-			if (result)
-				return result;
-			continue;
-		}
-
-		pud = (pud_t *)info->alloc_pgt_page(info->context);
-		if (!pud)
-			return -ENOMEM;
-		result = ident_pud_init(info, pud, addr, next);
-		if (result)
-			return result;
-		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
-	}
-
-	return 0;
-}
+#include "ident_map.c"
 
 /*
  * NOTE: pagetable_init alloc all the fixmap pagetables contiguous on the
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v4 11/20] x86, 64bit: Set ident_mapping for kaslr
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (9 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 10/20] x86, boot: Split kernel_ident_mapping_init to another file Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-04-13 10:05   ` Ingo Molnar
  2016-03-22  7:32 ` [PATCH v4 12/20] x86, boot: Add checking for memcpy Baoquan He
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung, Jiri Kosina, Borislav Petkov, Baoquan He

From: Yinghai Lu <yinghai@kernel.org>

Currently kaslr can only randomize within a small range, from 16M to 1G, the
new range still uses the old mapping, and a new range above 4G is not
supported at all.

We need an identity mapping for the new range before we can decompress the
kernel to the new output address and run it from there.

In this patch, we add identity mappings for all the ranges that need them.

First, to let kaslr put the randomized VO above 4G, we must set up an
identity mapping for the new range when we come in via the startup_32 path.

Second, when booting from a 64-bit bootloader, the bootloader sets up the
identity mapping and enters ZO (arch/x86/boot/compressed/vmlinux) via
startup_64. The pages used for that page table must be avoided when we select
the new random VO (vmlinux) base, otherwise the decompressor would overwrite
them while decompressing.
One way would be to walk the page table and record every page it uses for
each mem_avoid check, but that needs extra code and may require growing the
mem_avoid array to hold those pages.
The other way is to create a new identity mapping instead: its page-table
pages come from the _pgtable section of ZO, which is already covered by the
mem_avoid array, and we can reuse the existing identity-mapping code.

The _pgtable section is shared by the 32-bit and 64-bit paths to keep
init_size down, as the ZO range from _rodata to _end now contributes to
init_size.

We need to increase the page-table buffer size (see the breakdown below).
When booting via startup_64 we need to cover the old VO, params, cmdline and
the new VO; in the extreme case each of them crosses a 512G boundary, which
needs (2+2)*4 pages with 2M mappings, plus 2 pages for the first 2M (VGA ram)
and 1 page for the level4 table, 19 pages in total.
When booting via startup_32, kaslr may move the new VO above 4G, so we need
an extra identity mapping for it; its page-table buffer starts at offset 6
pages into _pgtable and should need at most (2+2) pages when the new VO
crosses a 512G boundary.
So 19 pages makes both paths happy.
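
For reference, a rough breakdown of that worst case (2M mappings, every range
crossing a 512G boundary):

	 1 page		level4 (PGD)
	 2 pages	first 2M for VGA ram (1 PUD + 1 PMD page)
	16 pages	old VO, params, cmdline, new VO: (2 PUD + 2 PMD
			pages) each when straddling a 512G boundary
	--------
	19 pages	BOOT_PGT_SIZE (17 when CONFIG_X86_VERBOSE_BOOTUP is
			off and the VGA ram mapping is skipped)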

Cc: Kees Cook <keescook@chromium.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
---
v3->v4:
    Use the .start and .size arguments instead of copy/paste when calling
    fill_pagetable() in mem_avoid_init():
       fill_pagetable(mem_avoid[0].start, mem_avoid[0].size);

    Add more details for error printing in alloc_pgt_page()

 arch/x86/boot/compressed/Makefile   |  3 ++
 arch/x86/boot/compressed/aslr.c     | 14 ++++++
 arch/x86/boot/compressed/head_64.S  |  4 +-
 arch/x86/boot/compressed/misc.h     | 11 +++++
 arch/x86/boot/compressed/misc_pgt.c | 93 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/boot.h         | 19 ++++++++
 6 files changed, 142 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/boot/compressed/misc_pgt.c

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 2e7c0ce..229604d 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -59,6 +59,9 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
 
 vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/aslr.o
+ifdef CONFIG_X86_64
+	vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/misc_pgt.o
+endif
 
 $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
 
diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index e323630..adb2362 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -194,6 +194,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	 */
 	mem_avoid[0].start = input;
 	mem_avoid[0].size = (output + init_size) - input;
+	fill_pagetable(mem_avoid[0].start, mem_avoid[0].size);
 
 	/* Avoid initrd. */
 	initrd_start  = (u64)real_mode->ext_ramdisk_image << 32;
@@ -202,6 +203,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	initrd_size |= real_mode->hdr.ramdisk_size;
 	mem_avoid[1].start = initrd_start;
 	mem_avoid[1].size = initrd_size;
+	/* No need to set mapping for initrd */
 
 	/* Avoid kernel command line. */
 	cmd_line  = (u64)real_mode->ext_cmd_line_ptr << 32;
@@ -212,10 +214,19 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 		;
 	mem_avoid[2].start = cmd_line;
 	mem_avoid[2].size = cmd_line_size;
+	fill_pagetable(mem_avoid[2].start, mem_avoid[2].size);
 
 	/* Avoid params */
 	mem_avoid[3].start = (unsigned long)real_mode;
 	mem_avoid[3].size = sizeof(*real_mode);
+	fill_pagetable(mem_avoid[3].start, mem_avoid[3].size);
+
+	/* don't need to set mapping for setup_data */
+
+#ifdef CONFIG_X86_VERBOSE_BOOTUP
+	/* for video ram */
+	fill_pagetable(0, PMD_SIZE);
+#endif
 }
 
 /* Does this memory vector overlap a known avoided area? */
@@ -373,6 +384,9 @@ unsigned char *choose_kernel_location(unsigned char *input,
 		goto out;
 
 	choice = random;
+
+	fill_pagetable(choice, output_size);
+	switch_pagetable();
 out:
 	return (unsigned char *)choice;
 }
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 3691451..075bb15 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -126,7 +126,7 @@ ENTRY(startup_32)
 	/* Initialize Page tables to 0 */
 	leal	pgtable(%ebx), %edi
 	xorl	%eax, %eax
-	movl	$((4096*6)/4), %ecx
+	movl	$(BOOT_INIT_PGT_SIZE/4), %ecx
 	rep	stosl
 
 	/* Build Level 4 */
@@ -478,4 +478,4 @@ boot_stack_end:
 	.section ".pgtable","a",@nobits
 	.balign 4096
 pgtable:
-	.fill 6*4096, 1, 0
+	.fill BOOT_PGT_SIZE, 1, 0
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index dcf01c2..11736a6 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -84,6 +84,17 @@ unsigned char *choose_kernel_location(unsigned char *input,
 }
 #endif
 
+#ifdef CONFIG_X86_64
+void fill_pagetable(unsigned long start, unsigned long size);
+void switch_pagetable(void);
+extern unsigned char _pgtable[];
+#else
+static inline void fill_pagetable(unsigned long start, unsigned long size)
+{ }
+static inline void switch_pagetable(void)
+{ }
+#endif
+
 #ifdef CONFIG_EARLY_PRINTK
 /* early_serial_console.c */
 extern int early_serial_base;
diff --git a/arch/x86/boot/compressed/misc_pgt.c b/arch/x86/boot/compressed/misc_pgt.c
new file mode 100644
index 0000000..816551d
--- /dev/null
+++ b/arch/x86/boot/compressed/misc_pgt.c
@@ -0,0 +1,93 @@
+#define __pa(x)  ((unsigned long)(x))
+#define __va(x)  ((void *)((unsigned long)(x)))
+
+#include "misc.h"
+
+#include <asm/init.h>
+#include <asm/pgtable.h>
+
+#include "../../mm/ident_map.c"
+#include "../string.h"
+
+struct alloc_pgt_data {
+	unsigned char *pgt_buf;
+	unsigned long pgt_buf_size;
+	unsigned long pgt_buf_offset;
+};
+
+static void *alloc_pgt_page(void *context)
+{
+	struct alloc_pgt_data *d = (struct alloc_pgt_data *)context;
+	unsigned char *p = (unsigned char *)d->pgt_buf;
+
+	if (d->pgt_buf_offset >= d->pgt_buf_size) {
+		debug_putstr("out of pgt_buf in misc.c\n");
+		debug_putaddr(d->pgt_buf_offset);
+		debug_putaddr(d->pgt_buf_size);
+		return NULL;
+	}
+
+	p += d->pgt_buf_offset;
+	d->pgt_buf_offset += PAGE_SIZE;
+
+	return p;
+}
+
+/*
+ * Use a normal definition of memset() from string.c. There are already
+ * included header files which expect a definition of memset() and by
+ * the time we define memset macro, it is too late.
+ */
+#undef memset
+
+unsigned long __force_order;
+static struct alloc_pgt_data pgt_data;
+static struct x86_mapping_info mapping_info;
+static pgd_t *level4p;
+
+void fill_pagetable(unsigned long start, unsigned long size)
+{
+	unsigned long end = start + size;
+
+	if (!level4p) {
+		pgt_data.pgt_buf_offset = 0;
+		mapping_info.alloc_pgt_page = alloc_pgt_page;
+		mapping_info.context = &pgt_data;
+		mapping_info.pmd_flag = __PAGE_KERNEL_LARGE_EXEC;
+
+		/*
+		 * come from startup_32 ?
+		 * then cr3 is _pgtable, we can reuse it.
+		 */
+		level4p = (pgd_t *)read_cr3();
+		if ((unsigned long)level4p == (unsigned long)_pgtable) {
+			pgt_data.pgt_buf = (unsigned char *)_pgtable +
+						 BOOT_INIT_PGT_SIZE;
+			pgt_data.pgt_buf_size = BOOT_PGT_SIZE -
+						 BOOT_INIT_PGT_SIZE;
+			memset((unsigned char *)pgt_data.pgt_buf, 0,
+				pgt_data.pgt_buf_size);
+			debug_putstr("boot via startup_32\n");
+		} else {
+			pgt_data.pgt_buf = (unsigned char *)_pgtable;
+			pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
+			memset((unsigned char *)pgt_data.pgt_buf, 0,
+				pgt_data.pgt_buf_size);
+			debug_putstr("boot via startup_64\n");
+			level4p = (pgd_t *)alloc_pgt_page(&pgt_data);
+		}
+	}
+
+	/* align boundary to 2M */
+	start = round_down(start, PMD_SIZE);
+	end = round_up(end, PMD_SIZE);
+	if (start >= end)
+		return;
+
+	kernel_ident_mapping_init(&mapping_info, level4p, start, end);
+}
+
+void switch_pagetable(void)
+{
+	write_cr3((unsigned long)level4p);
+}
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 6b8d6e8..52a9cbc 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -32,7 +32,26 @@
 #endif /* !CONFIG_KERNEL_BZIP2 */
 
 #ifdef CONFIG_X86_64
+
 #define BOOT_STACK_SIZE	0x4000
+
+#define BOOT_INIT_PGT_SIZE (6*4096)
+#ifdef CONFIG_RANDOMIZE_BASE
+/*
+ * 1 page for level4, 2 pages for first 2M.
+ * (2+2)*4 pages for kernel, param, cmd_line, random kernel
+ * if all cross 512G boundary.
+ * So total will be 19 pages.
+ */
+#ifdef CONFIG_X86_VERBOSE_BOOTUP
+#define BOOT_PGT_SIZE (19*4096)
+#else
+#define BOOT_PGT_SIZE (17*4096)
+#endif
+#else
+#define BOOT_PGT_SIZE BOOT_INIT_PGT_SIZE
+#endif
+
 #else
 #define BOOT_STACK_SIZE	0x1000
 #endif
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v4 12/20] x86, boot: Add checking for memcpy
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (10 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 11/20] x86, 64bit: Set ident_mapping for kaslr Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-03-22  7:32 ` [PATCH v4 13/20] x86, kaslr: Introduce struct slot_area to manage randomization slot info Baoquan He
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung, Baoquan He

From: Yinghai Lu <yinghai@kernel.org>

parse_elf() uses the local memcpy() to move each section to its running
position. That memcpy() only handles copies that do not overlap, or that
overlap with dest < src.

Add a check to memcpy() that catches the unsupported case, for future use; if
it ever triggers we will need a backward memcpy() for it (a sketch follows
below).
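
A minimal sketch of what such a backward copy could look like (hypothetical
helper, not part of this patch; it only illustrates the case the new check
guards against):

	/*
	 * Copy from the end so an overlapping destination above the
	 * source is not clobbered before it has been read.
	 */
	static void *__memcpy_backward(void *dest, const void *src, size_t n)
	{
		unsigned char *d = dest;
		const unsigned char *s = src;

		while (n--)
			d[n] = s[n];

		return dest;
	}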

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
---
v2->v3:
    Add a declaration for error(), since its declaration lives in misc.h,
    which is not included in compressed/string.c.
v3->v4:
    Remove the meaningless code comment in error() as Kees suggested.

 arch/x86/boot/compressed/misc.c   |  9 ++-------
 arch/x86/boot/compressed/misc.h   |  2 ++
 arch/x86/boot/compressed/string.c | 29 +++++++++++++++++++++++++++--
 3 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 029f42f..c5f0ecf 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -114,9 +114,6 @@
 #undef memset
 #define memzero(s, n)	memset((s), 0, (n))
 
-
-static void error(char *m);
-
 /*
  * This is set up by the setup-routine at boot-time
  */
@@ -243,7 +240,7 @@ void __puthex(unsigned long value)
 	}
 }
 
-static void error(char *x)
+void error(char *x)
 {
 	error_putstr("\n\n");
 	error_putstr(x);
@@ -378,9 +375,7 @@ static void parse_elf(void *output)
 #else
 			dest = (void *)(phdr->p_paddr);
 #endif
-			memcpy(dest,
-			       output + phdr->p_offset,
-			       phdr->p_filesz);
+			memcpy(dest, output + phdr->p_offset, phdr->p_filesz);
 			break;
 		default: /* Ignore other PT_* */ break;
 		}
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 11736a6..39d0e9a 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -38,6 +38,8 @@ void __puthex(unsigned long value);
 #define error_putstr(__x)  __putstr(__x)
 #define error_puthex(__x)  __puthex(__x)
 
+void error(char *x);
+
 #ifdef CONFIG_X86_VERBOSE_BOOTUP
 
 #define debug_putstr(__x)  __putstr(__x)
diff --git a/arch/x86/boot/compressed/string.c b/arch/x86/boot/compressed/string.c
index 00e788b..3a935d0 100644
--- a/arch/x86/boot/compressed/string.c
+++ b/arch/x86/boot/compressed/string.c
@@ -1,7 +1,7 @@
 #include "../string.c"
 
 #ifdef CONFIG_X86_32
-void *memcpy(void *dest, const void *src, size_t n)
+void *__memcpy(void *dest, const void *src, size_t n)
 {
 	int d0, d1, d2;
 	asm volatile(
@@ -15,7 +15,7 @@ void *memcpy(void *dest, const void *src, size_t n)
 	return dest;
 }
 #else
-void *memcpy(void *dest, const void *src, size_t n)
+void *__memcpy(void *dest, const void *src, size_t n)
 {
 	long d0, d1, d2;
 	asm volatile(
@@ -30,6 +30,31 @@ void *memcpy(void *dest, const void *src, size_t n)
 }
 #endif
 
+extern void error(char *x);
+void *memcpy(void *dest, const void *src, size_t n)
+{
+	unsigned long start_dest, end_dest;
+	unsigned long start_src, end_src;
+	unsigned long max_start, min_end;
+
+	if (dest < src)
+		return __memcpy(dest, src, n);
+
+	start_dest = (unsigned long)dest;
+	end_dest = (unsigned long)dest + n;
+	start_src = (unsigned long)src;
+	end_src = (unsigned long)src + n;
+	max_start = (start_dest > start_src) ?  start_dest : start_src;
+	min_end = (end_dest < end_src) ? end_dest : end_src;
+
+	if (max_start >= min_end)
+		return __memcpy(dest, src, n);
+
+	error("memcpy does not support overlapping with dest > src!\n");
+
+	return dest;
+}
+
 void *memset(void *s, int c, size_t n)
 {
 	int i;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v4 13/20] x86, kaslr: Introduce struct slot_area to manage randomization slot info
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (11 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 12/20] x86, boot: Add checking for memcpy Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-03-22  7:32 ` [PATCH v4 14/20] x86, kaslr: Add two functions which will be used later Baoquan He
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung, Baoquan He

The kernel is expected to be randomly relocated anywhere in the whole
physical memory area, which can reach nearly 64T. In that case there could be
about 4*1024*1024 randomization slots if CONFIG_PHYSICAL_ALIGN is 0x1000000.
Currently the starting address of every candidate position is stored into the
slots[] array, one entry at a time. With this old scheme the slot array would
cost too much memory, and finding and saving the slot information one by one
is also very inefficient.

Introduce struct slot_area to manage the randomization slot info of one
contiguous memory area, excluding the avoided regions. Each slot_area holds
the starting address and the number of available slots in that area. The
slot_areas[] array stores all the slot area info.

Since setup_data is a linked list that may chain many entries together,
excluding them can split RAM into many smaller areas, so only the first 100
slot areas are kept if there are more (see the illustration below).
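
As a rough illustration (numbers are for the sketch only): with
CONFIG_PHYSICAL_ALIGN = 16M, extending the old one-entry-per-slot slots[]
array to cover 64T would need about 4*1024*1024 unsigned longs (~32MB),
whereas a whole free area such as [1G, 2G) collapses into a single entry of
the new representation:

	struct slot_area {
		unsigned long addr;	/* 1G			*/
		int num;		/* 64 slots of 16M each	*/
	};

(assuming the image fits within one 16M slot), and slot_areas[] itself is
capped at MAX_SLOT_AREA (100) entries.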

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index adb2362..0431c19 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -259,8 +259,20 @@ static bool mem_avoid_overlap(struct mem_vector *img)
 
 static unsigned long slots[CONFIG_RANDOMIZE_BASE_MAX_OFFSET /
 			   CONFIG_PHYSICAL_ALIGN];
+
+struct slot_area {
+	unsigned long addr;
+	int num;
+};
+
+#define MAX_SLOT_AREA 100
+
+static struct slot_area slot_areas[MAX_SLOT_AREA];
+
 static unsigned long slot_max;
 
+static unsigned long slot_area_index;
+
 static void slots_append(unsigned long addr)
 {
 	/* Overflowing the slots list should be impossible. */
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v4 14/20] x86, kaslr: Add two functions which will be used later
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (12 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 13/20] x86, kaslr: Introduce struct slot_area to manage randomization slot info Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-03-22  7:32 ` [PATCH v4 15/20] x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel text mapping address Baoquan He
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung, Baoquan He

store_slot_info() calculates the slot info of a passed-in memory region and
stores it into the slot_areas[] array.

mem_min_overlap() iterates over all avoid regions to find the one that
overlaps the passed-in region at the lowest address. E.g. for a memory region
[1024M, 2048M), if the lowest-starting overlapping avoid region turns out to
be [1536M, 1664M), then [1024M, 1536M) can be split out as an available slot
area and saved into slot_areas[] (a worked example follows below).
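
Continuing that example, a small sketch of what store_slot_info() records for
the split-out region, assuming CONFIG_PHYSICAL_ALIGN = 16M and an image_size
of 32M (illustrative values only):

	region.start = 1024M;  region.size = 512M;
	/* image_size (32M) > CONFIG_PHYSICAL_ALIGN (16M), so: */
	slot_area.addr = region.start            = 1024M;
	slot_area.num  = (512M - 32M) / 16M + 1  = 31;

i.e. the image may start at 1024M, 1040M, ..., 1504M and still fit entirely
within [1024M, 1536M).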

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 51 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 0431c19..44c6768 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -257,6 +257,40 @@ static bool mem_avoid_overlap(struct mem_vector *img)
 	return false;
 }
 
+static unsigned long
+mem_min_overlap(struct mem_vector *img, struct mem_vector *out)
+{
+	int i;
+	struct setup_data *ptr;
+	unsigned long min = img->start + img->size;
+
+	for (i = 0; i < MEM_AVOID_MAX; i++) {
+		if (mem_overlaps(img, &mem_avoid[i]) &&
+			(mem_avoid[i].start < min)) {
+			*out = mem_avoid[i];
+			min = mem_avoid[i].start;
+		}
+	}
+
+	/* Check all entries in the setup_data linked list. */
+	ptr = (struct setup_data *)(unsigned long)real_mode->hdr.setup_data;
+	while (ptr) {
+		struct mem_vector avoid;
+
+		avoid.start = (unsigned long)ptr;
+		avoid.size = sizeof(*ptr) + ptr->len;
+
+		if (mem_overlaps(img, &avoid) && (avoid.start < min)) {
+			*out = avoid;
+			min = avoid.start;
+		}
+
+		ptr = (struct setup_data *)(unsigned long)ptr->next;
+	}
+
+	return min;
+}
+
 static unsigned long slots[CONFIG_RANDOMIZE_BASE_MAX_OFFSET /
 			   CONFIG_PHYSICAL_ALIGN];
 
@@ -273,6 +307,23 @@ static unsigned long slot_max;
 
 static unsigned long slot_area_index;
 
+static void store_slot_info(struct mem_vector *region, unsigned long image_size)
+{
+	struct slot_area slot_area;
+
+	slot_area.addr = region->start;
+	if (image_size <= CONFIG_PHYSICAL_ALIGN)
+		slot_area.num = region->size / CONFIG_PHYSICAL_ALIGN;
+	else
+		slot_area.num = (region->size - image_size) /
+				CONFIG_PHYSICAL_ALIGN + 1;
+
+	if (slot_area.num > 0) {
+		slot_areas[slot_area_index++] = slot_area;
+		slot_max += slot_area.num;
+	}
+}
+
 static void slots_append(unsigned long addr)
 {
 	/* Overflowing the slots list should be impossible. */
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v4 15/20] x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel text mapping address
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (13 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 14/20] x86, kaslr: Add two functions which will be used later Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-03-22  7:32 ` [PATCH v4 16/20] x86, kaslr: Randomize physical and virtual address of kernel separately Baoquan He
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung, Baoquan He

Kaslr extended the kernel text mapping region size from 512M to 1G, namely
CONFIG_RANDOMIZE_BASE_MAX_OFFSET. This means the kernel text can be mapped
anywhere in the region:

[__START_KERNEL_map + LOAD_PHYSICAL_ADDR, __START_KERNEL_map + 1G]

Introduce a function find_random_virt_offset() to get a random value between
LOAD_PHYSICAL_ADDR and CONFIG_RANDOMIZE_BASE_MAX_OFFSET. This random value is
added to __START_KERNEL_map to get the starting address the kernel text is
mapped from. Since the slot can be anywhere in this one contiguous region, it
behaves like a single independent slot_area, so translating the random value
into a slot is very simple (see the example below).
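
A worked example of the slot arithmetic, assuming CONFIG_PHYSICAL_ALIGN = 2M,
LOAD_PHYSICAL_ADDR = 16M and an image_size of 30M (all three numbers are only
for the sketch):

	minimum  = ALIGN(16M, 2M) = 16M;
	/* image_size (30M) > CONFIG_PHYSICAL_ALIGN (2M), so: */
	slot_num = (1G - 16M - 30M) / 2M + 1 = 490;
	random   = get_random_long() % slot_num;
	offset   = random * 2M + 16M;

and the randomized kernel text virtual base becomes
__START_KERNEL_map + offset.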

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 44c6768..3083f40 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -409,6 +409,27 @@ static unsigned long find_random_addr(unsigned long minimum,
 	return slots_fetch_random();
 }
 
+static unsigned long find_random_virt_offset(unsigned long minimum,
+				  unsigned long image_size)
+{
+	unsigned long slot_num, random;
+
+	/* Make sure minimum is aligned. */
+	minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);
+
+	if (image_size <= CONFIG_PHYSICAL_ALIGN)
+		slot_num = (CONFIG_RANDOMIZE_BASE_MAX_OFFSET - minimum) /
+				CONFIG_PHYSICAL_ALIGN;
+	else
+		slot_num = (CONFIG_RANDOMIZE_BASE_MAX_OFFSET -
+				minimum - image_size) /
+				CONFIG_PHYSICAL_ALIGN + 1;
+
+	random = get_random_long() % slot_num;
+
+	return random * CONFIG_PHYSICAL_ALIGN + minimum;
+}
+
 unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v4 16/20] x86, kaslr: Randomize physical and virtual address of kernel separately
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (14 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 15/20] x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel text mapping address Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-03-22  7:32 ` [PATCH v4 17/20] x86, kaslr: Add support of kernel physical address randomization above 4G Baoquan He
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung, Baoquan He

On x86_64, the old kaslr implementation only randomizes the physical address
at which the kernel is loaded. It then calculates the delta between the
physical address vmlinux was linked to load at and the address it is finally
loaded at. If the delta is not zero, i.e. the kernel is actually decompressed
at a new physical address, relocation handling is needed: the delta is added
to the offset used for kernel symbol relocation, which moves the kernel text
mapping address by the same delta.

Here the behavior is changed: randomize both the physical address where the
kernel is decompressed and the virtual address where the kernel text is
mapped, and make relocation handling depend only on the virtual address
randomization. That is, if and only if the virtual address is randomized to a
different value do we add the delta to the offset of the kernel relocs (see
the sketch below).

Note that for now neither the virtual offset nor the physical address
randomization can exceed CONFIG_RANDOMIZE_BASE_MAX_OFFSET, namely 1G.
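
A small illustration of the new 64-bit delta (the numbers are hypothetical,
LOAD_PHYSICAL_ADDR is assumed to be the usual 16M):

	delta = virt_offset - LOAD_PHYSICAL_ADDR
	      = 48M - 16M = 32M	/* if find_random_virt_offset() picked 48M */

so every reloc entry is adjusted by 32M regardless of the physical address
chosen; had the virtual offset come out as exactly LOAD_PHYSICAL_ADDR, no
relocations would be applied even if the kernel was physically moved.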

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 46 +++++++++++++++++++++--------------------
 arch/x86/boot/compressed/misc.c | 40 +++++++++++++++++++++--------------
 arch/x86/boot/compressed/misc.h | 19 +++++++++--------
 3 files changed, 59 insertions(+), 46 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 3083f40..c389454 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -392,7 +392,7 @@ static void process_e820_entry(struct e820entry *entry,
 	}
 }
 
-static unsigned long find_random_addr(unsigned long minimum,
+static unsigned long find_random_phy_addr(unsigned long minimum,
 				      unsigned long size)
 {
 	int i;
@@ -430,23 +430,24 @@ static unsigned long find_random_virt_offset(unsigned long minimum,
 	return random * CONFIG_PHYSICAL_ALIGN + minimum;
 }
 
-unsigned char *choose_kernel_location(unsigned char *input,
-				      unsigned long input_size,
-				      unsigned char *output,
-				      unsigned long output_size)
+void choose_kernel_location(unsigned char *input,
+				unsigned long input_size,
+				unsigned char **output,
+				unsigned long output_size,
+				unsigned char **virt_offset)
 {
-	unsigned long choice = (unsigned long)output;
 	unsigned long random;
+	*virt_offset = (unsigned char *)LOAD_PHYSICAL_ADDR;
 
 #ifdef CONFIG_HIBERNATION
 	if (!cmdline_find_option_bool("kaslr")) {
 		debug_putstr("KASLR disabled by default...\n");
-		goto out;
+		return;
 	}
 #else
 	if (cmdline_find_option_bool("nokaslr")) {
 		debug_putstr("KASLR disabled by cmdline...\n");
-		goto out;
+		return;
 	}
 #endif
 
@@ -454,23 +455,24 @@ unsigned char *choose_kernel_location(unsigned char *input,
 
 	/* Record the various known unsafe memory ranges. */
 	mem_avoid_init((unsigned long)input, input_size,
-		       (unsigned long)output);
+		       (unsigned long)*output);
 
 	/* Walk e820 and find a random address. */
-	random = find_random_addr(choice, output_size);
-	if (!random) {
+	random = find_random_phy_addr((unsigned long)*output, output_size);
+	if (!random)
 		debug_putstr("KASLR could not find suitable E820 region...\n");
-		goto out;
+	else {
+		if ((unsigned long)*output != random) {
+			fill_pagetable(random, output_size);
+			switch_pagetable();
+			*output = (unsigned char *)random;
+		}
 	}
 
-	/* Always enforce the minimum. */
-	if (random < choice)
-		goto out;
-
-	choice = random;
-
-	fill_pagetable(choice, output_size);
-	switch_pagetable();
-out:
-	return (unsigned char *)choice;
+	/*
+	 * Get a random address between LOAD_PHYSICAL_ADDR and
+	 * CONFIG_RANDOMIZE_BASE_MAX_OFFSET
+	 */
+	random = find_random_virt_offset(LOAD_PHYSICAL_ADDR, output_size);
+	*virt_offset = (unsigned char *)random;
 }
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index c5f0ecf..062e65f 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -251,7 +251,8 @@ void error(char *x)
 }
 
 #if CONFIG_X86_NEED_RELOCS
-static void handle_relocations(void *output, unsigned long output_len)
+static void handle_relocations(void *output, unsigned long output_len,
+			       void *virt_offset)
 {
 	int *reloc;
 	unsigned long delta, map, ptr;
@@ -263,11 +264,6 @@ static void handle_relocations(void *output, unsigned long output_len)
 	 * and where it was actually loaded.
 	 */
 	delta = min_addr - LOAD_PHYSICAL_ADDR;
-	if (!delta) {
-		debug_putstr("No relocation needed... ");
-		return;
-	}
-	debug_putstr("Performing relocations... ");
 
 	/*
 	 * The kernel contains a table of relocation addresses. Those
@@ -278,6 +274,22 @@ static void handle_relocations(void *output, unsigned long output_len)
 	 */
 	map = delta - __START_KERNEL_map;
 
+
+
+	/*
+	 * 32-bit always performs relocations. 64-bit relocations are only
+	 * needed if kASLR has chosen a different starting address offset
+	 * from __START_KERNEL_map.
+	 */
+	if (IS_ENABLED(CONFIG_X86_64))
+		delta = (unsigned long)virt_offset - LOAD_PHYSICAL_ADDR;
+
+	if (!delta) {
+		debug_putstr("No relocation needed... ");
+		return;
+	}
+	debug_putstr("Performing relocations... ");
+
 	/*
 	 * Process relocations: 32 bit relocations first then 64 bit after.
 	 * Three sets of binary relocations are added to the end of the kernel
@@ -331,7 +343,8 @@ static void handle_relocations(void *output, unsigned long output_len)
 #endif
 }
 #else
-static inline void handle_relocations(void *output, unsigned long output_len)
+static inline void handle_relocations(void *output, unsigned long output_len,
+				      void *virt_offset)
 { }
 #endif
 
@@ -392,6 +405,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 {
 	const unsigned long run_size = VO__end - VO__text;
 	unsigned char *output_orig = output;
+	unsigned char *virt_offset;
 
 	real_mode = rmode;
 
@@ -429,9 +443,10 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	 * the entire decompressed kernel plus relocation table, or the
 	 * entire decompressed kernel plus .bss and .brk sections.
 	 */
-	output = choose_kernel_location(input_data, input_len, output,
+	choose_kernel_location(input_data, input_len, &output,
 					output_len > run_size ? output_len
-							      : run_size);
+							      : run_size,
+					&virt_offset);
 
 	/* Validate memory location choices. */
 	if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
@@ -452,12 +467,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	__decompress(input_data, input_len, NULL, NULL, output, output_len,
 			NULL, error);
 	parse_elf(output);
-	/*
-	 * 32-bit always performs relocations. 64-bit relocations are only
-	 * needed if kASLR has chosen a different load address.
-	 */
-	if (!IS_ENABLED(CONFIG_X86_64) || output != output_orig)
-		handle_relocations(output, output_len);
+	handle_relocations(output, output_len, virt_offset);
 	debug_putstr("done.\nBooting the kernel.\n");
 	return output;
 }
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 39d0e9a..cde6aec 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -69,20 +69,21 @@ int cmdline_find_option_bool(const char *option);
 
 #if CONFIG_RANDOMIZE_BASE
 /* aslr.c */
-unsigned char *choose_kernel_location(unsigned char *input,
-				      unsigned long input_size,
-				      unsigned char *output,
-				      unsigned long output_size);
+void choose_kernel_location(unsigned char *input,
+				unsigned long input_size,
+				unsigned char **output,
+				unsigned long output_size,
+				unsigned char **virt_offset);
 /* cpuflags.c */
 bool has_cpuflag(int flag);
 #else
 static inline
-unsigned char *choose_kernel_location(unsigned char *input,
-				      unsigned long input_size,
-				      unsigned char *output,
-				      unsigned long output_size)
+void choose_kernel_location(unsigned char *input,
+				unsigned long input_size,
+				unsigned char **output,
+				unsigned long output_size,
+				unsigned char **virt_offset)
 {
-	return output;
 }
 #endif
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v4 17/20] x86, kaslr: Add support of kernel physical address randomization above 4G
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (15 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 16/20] x86, kaslr: Randomize physical and virtual address of kernel separately Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-03-22  7:32 ` [PATCH v4 18/20] x86, kaslr: Remove useless codes Baoquan He
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung, Baoquan He

process_e820_entry() and slots_fetch_random() are the two key functions of
the kaslr mechanism.

process_e820_entry() parses a passed-in memory region and stores the
available slot area information into the slot_areas[] array.
slots_fetch_random() gets a random value and translates it into the starting
address of the corresponding slot.

To add support for kernel physical address randomization above 4G, this patch
reworks both functions around the new slot_area data structure (a short
walkthrough follows below).

Now the kernel can be relocated and decompressed anywhere in the whole
physical memory, up to nearly 64T.
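
A short walkthrough of the reworked process_e820_entry() on illustrative
numbers (continuing the example from patch 14): take an E820 RAM entry
[1024M, 2048M), a single avoid region [1536M, 1664M) and image_size = 32M:

	1st pass: lowest overlap starts at 1536M, so [1024M, 1536M) is
	          stored via store_slot_info() and the region is trimmed
	          to [1664M, 2048M)
	2nd pass: nothing overlaps any more, so [1664M, 2048M) is stored
	          as a second slot area

slots_fetch_random() then draws one random index over the total slot count
and walks slot_areas[], subtracting each area's num until the index lands in
an area, returning addr + index * CONFIG_PHYSICAL_ALIGN.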

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 68 ++++++++++++++++++++++++++++++-----------
 1 file changed, 51 insertions(+), 17 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index c389454..21dad0a 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -336,27 +336,40 @@ static void slots_append(unsigned long addr)
 
 static unsigned long slots_fetch_random(void)
 {
+	unsigned long random;
+	int i;
+
 	/* Handle case of no slots stored. */
 	if (slot_max == 0)
 		return 0;
 
-	return slots[get_random_long() % slot_max];
+	random = get_random_long() % slot_max;
+
+	for (i = 0; i < slot_area_index; i++) {
+		if (random >= slot_areas[i].num) {
+			random -= slot_areas[i].num;
+			continue;
+		}
+		return slot_areas[i].addr + random * CONFIG_PHYSICAL_ALIGN;
+	}
+
+	if (i == slot_area_index)
+		debug_putstr("Something wrong happened in slots_fetch_random()...\n");
+	return 0;
 }
 
 static void process_e820_entry(struct e820entry *entry,
 			       unsigned long minimum,
 			       unsigned long image_size)
 {
-	struct mem_vector region, img;
+	struct mem_vector region, out;
+	struct slot_area slot_area;
+	unsigned long min, start_orig;
 
 	/* Skip non-RAM entries. */
 	if (entry->type != E820_RAM)
 		return;
 
-	/* Ignore entries entirely above our maximum. */
-	if (entry->addr >= CONFIG_RANDOMIZE_BASE_MAX_OFFSET)
-		return;
-
 	/* Ignore entries entirely below our minimum. */
 	if (entry->addr + entry->size < minimum)
 		return;
@@ -364,10 +377,17 @@ static void process_e820_entry(struct e820entry *entry,
 	region.start = entry->addr;
 	region.size = entry->size;
 
+repeat:
+	start_orig = region.start;
+
 	/* Potentially raise address to minimum location. */
 	if (region.start < minimum)
 		region.start = minimum;
 
+	/* Return if slot area array is full */
+	if (slot_area_index == MAX_SLOT_AREA)
+		return;
+
 	/* Potentially raise address to meet alignment requirements. */
 	region.start = ALIGN(region.start, CONFIG_PHYSICAL_ALIGN);
 
@@ -376,20 +396,30 @@ static void process_e820_entry(struct e820entry *entry,
 		return;
 
 	/* Reduce size by any delta from the original address. */
-	region.size -= region.start - entry->addr;
+	region.size -= region.start - start_orig;
 
-	/* Reduce maximum size to fit end of image within maximum limit. */
-	if (region.start + region.size > CONFIG_RANDOMIZE_BASE_MAX_OFFSET)
-		region.size = CONFIG_RANDOMIZE_BASE_MAX_OFFSET - region.start;
+	/* Return if region can't contain decompressed kernel */
+	if (region.size < image_size)
+		return;
 
-	/* Walk each aligned slot and check for avoided areas. */
-	for (img.start = region.start, img.size = image_size ;
-	     mem_contains(&region, &img) ;
-	     img.start += CONFIG_PHYSICAL_ALIGN) {
-		if (mem_avoid_overlap(&img))
-			continue;
-		slots_append(img.start);
+	if (!mem_avoid_overlap(&region)) {
+		store_slot_info(&region, image_size);
+		return;
 	}
+
+	min = mem_min_overlap(&region, &out);
+
+	if (min > region.start + image_size) {
+		struct mem_vector tmp;
+
+		tmp.start = region.start;
+		tmp.size = min - region.start;
+		store_slot_info(&tmp, image_size);
+	}
+
+	region.size -= out.start - region.start + out.size;
+	region.start = out.start + out.size;
+	goto repeat;
 }
 
 static unsigned long find_random_phy_addr(unsigned long minimum,
@@ -404,6 +434,10 @@ static unsigned long find_random_phy_addr(unsigned long minimum,
 	/* Verify potential e820 positions, appending to slots list. */
 	for (i = 0; i < real_mode->e820_entries; i++) {
 		process_e820_entry(&real_mode->e820_map[i], minimum, size);
+		if (slot_area_index == MAX_SLOT_AREA) {
+			debug_putstr("Stop processing e820 since slot_areas is full...\n");
+			break;
+		}
 	}
 
 	return slots_fetch_random();
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v4 18/20] x86, kaslr: Remove useless codes
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (16 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 17/20] x86, kaslr: Add support of kernel physical address randomization above 4G Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-03-22  7:32 ` [PATCH v4 19/20] x86, kaslr: Allow random address to be below loaded address Baoquan He
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung, Baoquan He

Several auxiliary functions and the slots[] array are not needed any more,
since struct slot_area and the new algorithm are used instead. Remove them in
this patch.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/boot/compressed/aslr.c | 24 ------------------------
 1 file changed, 24 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 21dad0a..ddfc3d0 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -112,17 +112,6 @@ struct mem_vector {
 #define MEM_AVOID_MAX 4
 static struct mem_vector mem_avoid[MEM_AVOID_MAX];
 
-static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
-{
-	/* Item at least partially before region. */
-	if (item->start < region->start)
-		return false;
-	/* Item at least partially after region. */
-	if (item->start + item->size > region->start + region->size)
-		return false;
-	return true;
-}
-
 static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
 {
 	/* Item one is entirely before item two. */
@@ -291,9 +280,6 @@ mem_min_overlap(struct mem_vector *img, struct mem_vector *out)
 	return min;
 }
 
-static unsigned long slots[CONFIG_RANDOMIZE_BASE_MAX_OFFSET /
-			   CONFIG_PHYSICAL_ALIGN];
-
 struct slot_area {
 	unsigned long addr;
 	int num;
@@ -324,16 +310,6 @@ static void store_slot_info(struct mem_vector *region, unsigned long image_size)
 	}
 }
 
-static void slots_append(unsigned long addr)
-{
-	/* Overflowing the slots list should be impossible. */
-	if (slot_max >= CONFIG_RANDOMIZE_BASE_MAX_OFFSET /
-			CONFIG_PHYSICAL_ALIGN)
-		return;
-
-	slots[slot_max++] = addr;
-}
-
 static unsigned long slots_fetch_random(void)
 {
 	unsigned long random;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v4 19/20] x86, kaslr: Allow random address to be below loaded address
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (17 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 18/20] x86, kaslr: Remove useless codes Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-03-22 19:54   ` Kees Cook
  2016-03-23  8:59   ` [PATCH v5 " Baoquan He
  2016-03-22  7:32 ` [PATCH v4 20/20] x86, kaslr: Use KERNEL_IMAGE_SIZE as the offset max for kernel virtual randomization Baoquan He
                   ` (2 subsequent siblings)
  21 siblings, 2 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung

From: Yinghai Lu <yinghai@kernel.org>

Currently the randomized output can only be chosen from regions above the
loaded address. So for bootloaders like kexec, which always load the kernel
near the end of ram, no randomization happens at all; and if the kernel is
loaded at a very high starting address, the whole area below it is given up,
which is not reasonable.

With correct tracking in mem_avoid we can allow the random output to be below
the loaded address. With this change even kexec can get random output below
the loaded address of its kernel.

For now just pick 512M as min_addr: if the kernel's loaded address is bigger
than 512M, e.g. 8G, then [512M, 8G) is also added to the random output
candidate area (see the example below).
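
A minimal sketch of the effect, using an illustrative loaded address of 8G
(only the numbers are hypothetical, the logic mirrors the hunk below):

	min_addr = (unsigned long)*output;	/* 8G			*/
	if (min_addr > (512UL << 20))
		min_addr = 512UL << 20;		/* lowered to 512M	*/

	random = find_random_phy_addr(min_addr, output_size);

so the e820 walk now also considers [512M, 8G); a kernel loaded below 512M
keeps its own loaded address as the minimum.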

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/aslr.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index ddfc3d0..d072ca7 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -446,7 +446,8 @@ void choose_kernel_location(unsigned char *input,
 				unsigned long output_size,
 				unsigned char **virt_offset)
 {
-	unsigned long random;
+	unsigned long random, min_addr;
+
 	*virt_offset = (unsigned char *)LOAD_PHYSICAL_ADDR;
 
 #ifdef CONFIG_HIBERNATION
@@ -467,8 +468,13 @@ void choose_kernel_location(unsigned char *input,
 	mem_avoid_init((unsigned long)input, input_size,
 		       (unsigned long)*output);
 
+	/* start from 512M */
+	min_addr = (unsigned long)*output;
+	if (min_addr > (512UL<<20))
+		min_addr = 512UL<<20;
+
 	/* Walk e820 and find a random address. */
-	random = find_random_phy_addr((unsigned long)*output, output_size);
+	random = find_random_phy_addr(min_addr, output_size);
 	if (!random)
 		debug_putstr("KASLR could not find suitable E820 region...\n");
 	else {
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v4 20/20] x86, kaslr: Use KERNEL_IMAGE_SIZE as the offset max for kernel virtual randomization
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (18 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 19/20] x86, kaslr: Allow random address to be below loaded address Baoquan He
@ 2016-03-22  7:32 ` Baoquan He
  2016-03-22 20:46   ` Kees Cook
  2016-03-22 20:25 ` [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Kees Cook
  2016-04-05  1:56 ` Baoquan He
  21 siblings, 1 reply; 52+ messages in thread
From: Baoquan He @ 2016-03-22  7:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung, Baoquan He

The old code uses CONFIG_RANDOMIZE_BASE_MAX_OFFSET as the maximum offset for
kernel virtual address randomization. It is a configurable value in the range
[512M, 1G] on x86_64, but it always defaults to 1G and there is no obvious
benefit in keeping it configurable. So Kees suggested setting
KERNEL_IMAGE_SIZE to 1G when RANDOMIZE_BASE is on and using KERNEL_IMAGE_SIZE
as the offset maximum.

This patch does exactly that. With this change CONFIG_RANDOMIZE_BASE_MAX_OFFSET
is not needed any more, so clean it up as well.
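
A quick arithmetic check of what this preserves (assuming CONFIG_PHYSICAL_ALIGN
at its 2MiB minimum):

	1GiB / 2MiB = 512 possible virtual slots, i.e. ~9 bits of entropy

which matches the "9 bits of entropy" figure kept in the RANDOMIZE_BASE help
text.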

Signed-off-by: Baoquan He <bhe@redhat.com>
---
v3->v4:
    Added in v4.

 arch/x86/Kconfig                     | 57 +++++++++++++-----------------------
 arch/x86/boot/compressed/aslr.c      |  7 ++---
 arch/x86/include/asm/page_64_types.h |  5 ++--
 arch/x86/mm/init_32.c                |  3 --
 4 files changed, 26 insertions(+), 46 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b105105..fbe0bb0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1908,51 +1908,36 @@ config RANDOMIZE_BASE
 	depends on RELOCATABLE
 	default n
 	---help---
-	   Randomizes the physical and virtual address at which the
-	   kernel image is decompressed, as a security feature that
-	   deters exploit attempts relying on knowledge of the location
-	   of kernel internals.
+	   Randomizes the physical address at which the kernel image
+	   is decompressed and the virtual address where the kernel
+	   image is mapped, as a security feature that deters exploit
+	   attempts relying on knowledge of the location of kernel
+	   internals.
+
+	   The kernel physical address can be randomized from 16M to
+	   64T at most. The kernel virtual address will be offset
+	   by up to KERNEL_IMAGE_SIZE. On 32-bit KERNEL_IMAGE_SIZE is
+	   512MiB, while on 64-bit this is limited by how the kernel
+	   fixmap page table is positioned, so this cannot be larger
+	   than 1GiB currently. Without RANDOMIZE_BASE there is a 512MiB
+	   to 1.5GiB split between kernel and modules. When RANDOMIZE_BASE
+	   is enabled, the modules area will shrink to compensate, up
+	   to a 1GiB to 1GiB split, KERNEL_IMAGE_SIZE changes from 512MiB
+	   to 1GiB.
 
 	   Entropy is generated using the RDRAND instruction if it is
 	   supported. If RDTSC is supported, it is used as well. If
 	   neither RDRAND nor RDTSC are supported, then randomness is
 	   read from the i8254 timer.
 
-	   The kernel will be offset by up to RANDOMIZE_BASE_MAX_OFFSET,
-	   and aligned according to PHYSICAL_ALIGN. Since the kernel is
-	   built using 2GiB addressing, and PHYSICAL_ALGIN must be at a
-	   minimum of 2MiB, only 10 bits of entropy is theoretically
-	   possible. At best, due to page table layouts, 64-bit can use
-	   9 bits of entropy and 32-bit uses 8 bits.
+	   Since the kernel is built using 2GiB addressing, and
+	   PHYSICAL_ALIGN must be at a minimum of 2MiB, only 10 bits of
+	   entropy is theoretically possible. At best, due to page table
+	   layouts, 64-bit can use 9 bits of entropy and 32-bit uses 8
+	   bits.
 
 	   If unsure, say N.
 
-config RANDOMIZE_BASE_MAX_OFFSET
-	hex "Maximum kASLR offset allowed" if EXPERT
-	depends on RANDOMIZE_BASE
-	range 0x0 0x20000000 if X86_32
-	default "0x20000000" if X86_32
-	range 0x0 0x40000000 if X86_64
-	default "0x40000000" if X86_64
-	---help---
-	  The lesser of RANDOMIZE_BASE_MAX_OFFSET and available physical
-	  memory is used to determine the maximal offset in bytes that will
-	  be applied to the kernel when kernel Address Space Layout
-	  Randomization (kASLR) is active. This must be a multiple of
-	  PHYSICAL_ALIGN.
-
-	  On 32-bit this is limited to 512MiB by page table layouts. The
-	  default is 512MiB.
-
-	  On 64-bit this is limited by how the kernel fixmap page table is
-	  positioned, so this cannot be larger than 1GiB currently. Without
-	  RANDOMIZE_BASE, there is a 512MiB to 1.5GiB split between kernel
-	  and modules. When RANDOMIZE_BASE_MAX_OFFSET is above 512MiB, the
-	  modules area will shrink to compensate, up to the current maximum
-	  1GiB to 1GiB split. The default is 1GiB.
-
-	  If unsure, leave at the default value.
-
 # Relocation on x86 needs some additional build support
 config X86_NEED_RELOCS
 	def_bool y
diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index d072ca7..737643c 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -428,11 +428,10 @@ static unsigned long find_random_virt_offset(unsigned long minimum,
 	minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);
 
 	if (image_size <= CONFIG_PHYSICAL_ALIGN)
-		slot_num = (CONFIG_RANDOMIZE_BASE_MAX_OFFSET - minimum) /
+		slot_num = (KERNEL_IMAGE_SIZE - minimum) /
 				CONFIG_PHYSICAL_ALIGN;
 	else
-		slot_num = (CONFIG_RANDOMIZE_BASE_MAX_OFFSET -
-				minimum - image_size) /
+		slot_num = (KERNEL_IMAGE_SIZE - minimum - image_size) /
 				CONFIG_PHYSICAL_ALIGN + 1;
 
 	random = get_random_long() % slot_num;
@@ -487,7 +486,7 @@ void choose_kernel_location(unsigned char *input,
 
 	/*
 	 * Get a random address between LOAD_PHYSICAL_ADDR and
-	 * CONFIG_RANDOMIZE_BASE_MAX_OFFSET
+	 * KERNEL_IMAGE_SIZE
 	 */
 	random = find_random_virt_offset(LOAD_PHYSICAL_ADDR, output_size);
 	*virt_offset = (unsigned char *)random;
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 4928cf0..8775bec 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -48,9 +48,8 @@
  * kernel page table mapping, reducing the size of the modules area.
  */
 #define KERNEL_IMAGE_SIZE_DEFAULT      (512 * 1024 * 1024)
-#if defined(CONFIG_RANDOMIZE_BASE) && \
-	CONFIG_RANDOMIZE_BASE_MAX_OFFSET > KERNEL_IMAGE_SIZE_DEFAULT
-#define KERNEL_IMAGE_SIZE   CONFIG_RANDOMIZE_BASE_MAX_OFFSET
+#if defined(CONFIG_RANDOMIZE_BASE)
+#define KERNEL_IMAGE_SIZE   (1024 * 1024 * 1024)
 #else
 #define KERNEL_IMAGE_SIZE      KERNEL_IMAGE_SIZE_DEFAULT
 #endif
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 2ebfbaf..c5ae958 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -807,9 +807,6 @@ void __init mem_init(void)
 	BUILD_BUG_ON(VMALLOC_START			>= VMALLOC_END);
 #undef high_memory
 #undef __FIXADDR_TOP
-#ifdef CONFIG_RANDOMIZE_BASE
-	BUILD_BUG_ON(CONFIG_RANDOMIZE_BASE_MAX_OFFSET > KERNEL_IMAGE_SIZE);
-#endif
 
 #ifdef CONFIG_HIGHMEM
 	BUG_ON(PKMAP_BASE + LAST_PKMAP*PAGE_SIZE	> FIXADDR_START);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v4 19/20] x86, kaslr: Allow random address to be below loaded address
  2016-03-22  7:32 ` [PATCH v4 19/20] x86, kaslr: Allow random address to be below loaded address Baoquan He
@ 2016-03-22 19:54   ` Kees Cook
  2016-03-23  1:41     ` Baoquan He
  2016-03-23  8:59   ` [PATCH v5 " Baoquan He
  1 sibling, 1 reply; 52+ messages in thread
From: Kees Cook @ 2016-03-22 19:54 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Vivek Goyal, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On Tue, Mar 22, 2016 at 12:32 AM, Baoquan He <bhe@redhat.com> wrote:
> From: Yinghai Lu <yinghai@kernel.org>
>
> Currently the randomized output can only be chosen from regions above the
> loaded address. So for bootloaders like kexec, which always load the kernel
> near the end of ram, no randomization happens at all; and if the kernel is
> loaded at a very high starting address, the whole area below it is given
> up, which is not reasonable.
>
> With correct tracking in mem_avoid we can allow the random output to be
> below the loaded address. With this change even kexec can get random
> output below the loaded address of its kernel.
>
> For now just pick 512M as min_addr: if the kernel's loaded address is
> bigger than 512M, e.g. 8G, then [512M, 8G) is also added to the random
> output candidate area.
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/boot/compressed/aslr.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
> index ddfc3d0..d072ca7 100644
> --- a/arch/x86/boot/compressed/aslr.c
> +++ b/arch/x86/boot/compressed/aslr.c
> @@ -446,7 +446,8 @@ void choose_kernel_location(unsigned char *input,
>                                 unsigned long output_size,
>                                 unsigned char **virt_offset)
>  {
> -       unsigned long random;
> +       unsigned long random, min_addr;
> +
>         *virt_offset = (unsigned char *)LOAD_PHYSICAL_ADDR;
>
>  #ifdef CONFIG_HIBERNATION
> @@ -467,8 +468,13 @@ void choose_kernel_location(unsigned char *input,
>         mem_avoid_init((unsigned long)input, input_size,
>                        (unsigned long)*output);
>
> +       /* start from 512M */
> +       min_addr = (unsigned long)*output;
> +       if (min_addr > (512UL<<20))
> +               min_addr = 512UL<<20;

The goal is to find a minimum address? I'm not sure this comment makes
sense. Shouldn't this be:

    /* Lower minimum to 512M. */
   min_addr = min_t(unsigned long, *output, 512UL << 20);

Or something like that?

> +
>         /* Walk e820 and find a random address. */
> -       random = find_random_phy_addr((unsigned long)*output, output_size);
> +       random = find_random_phy_addr(min_addr, output_size);
>         if (!random)
>                 debug_putstr("KASLR could not find suitable E820 region...\n");
>         else {
> --
> 2.5.0
>

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 52+ messages in thread
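
A rough sketch of the form Kees suggests above, inside
choose_kernel_location() (this assumes min_t() is usable in the
decompressor at this point; the v5 repost further down keeps an explicit
if-statement instead, with only the comment changed):

    	/* Lower minimum to 512M so regions below the loaded kernel can
    	 * also be considered as random output candidates. */
    	min_addr = min_t(unsigned long, (unsigned long)*output, 512UL << 20);

    	/* Walk e820 and find a random address. */
    	random = find_random_phy_addr(min_addr, output_size);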

* Re: [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (19 preceding siblings ...)
  2016-03-22  7:32 ` [PATCH v4 20/20] x86, kaslr: Use KERNEL_IMAGE_SIZE as the offset max for kernel virtual randomization Baoquan He
@ 2016-03-22 20:25 ` Kees Cook
  2016-03-23 22:40   ` Kees Cook
  2016-04-05  1:56 ` Baoquan He
  21 siblings, 1 reply; 52+ messages in thread
From: Kees Cook @ 2016-03-22 20:25 UTC (permalink / raw)
  To: Baoquan He, Ingo Molnar
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young

On Tue, Mar 22, 2016 at 12:31 AM, Baoquan He <bhe@redhat.com> wrote:
> ***Background:
> Previously a bug is reported that kdump didn't work when kaslr is enabled. During
> discussing that bug fix, we found current kaslr has a limilation that it can
> only randomize in 1GB region.
>
> This is because in curent kaslr implementaion only physical address of kernel
> loading is randomized. Then calculate the delta of physical address where
> vmlinux was linked to load and where it is finally loaded. If delta is not
> equal to 0, namely there's a new physical address where kernel is actually
> decompressed, relocation handling need be done. Then delta is added to offset
> of kernel symbol relocation, this makes the address of kernel text mapping move
> delta long. Though in principle kernel can be randomized to any physical address,
> kernel text mapping address space is limited and only 1G, namely as follows on
> x86_64:
>         [0xffffffff80000000, 0xffffffffc0000000)
>
> In one word, kernel text physical address and virtual address randomization is
> coupled. This causes the limitation.
>
> Then hpa and Vivek suggested we should change this. To decouple the physical
> address and virtual address randomization of kernel text and let them work
> separately. Then kernel text physical address can be randomized in region
> [16M, 64T), and kernel text virtual address can be randomized in region
> [0xffffffff80000000, 0xffffffffc0000000).
>
> ***Problems we need solved:
>   - For kernel boot from startup_32 case, only 0~4G identity mapping is built.
>     If kernel will be randomly put anywhere from 16M to 64T at most, the price
>     to build all region of identity mapping is too high. We need build the
>     identity mapping on demand, not covering all physical address space.
>
>   - Decouple the physical address and virtual address randomization of kernel
>     text and let them work separately.
>
> ***Parts:
>    - The 1st part is Yinghai's identity mapping building on demand patches.
>      This is used to solve the first problem mentioned above.
>      (Patch 09-10/19)
>    - The 2nd part is decoupling the physical address and virtual address
>      randomization of kernel text and letting them work separately patches
>      based on Yinghai's ident mapping patches.
>      (Patch 12-19/19)
>    - The 3rd part is some clean up patches which Yinghai found when he reviewed
>      my patches and the related code around.
>      (Patch 01-08/19)
>
> ***Patch status:
> This patchset went through several rounds of review.
>
>     v1:
>     - The first round can be found here:
>         https://lwn.net/Articles/637115/
>
>     v1->v2:
>     - In 2nd round Yinghai made a big patchset including this kaslr fix and another
>       setup_data related fix. The link is here:
>        http://lists-archives.com/linux-kernel/28346903-x86-updated-patches-for-kaslr-and-setup_data-etc-for-v4-3.html
>       You can get the code from Yinghai's git branch:
>       git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-v4.3-next
>
>     v2->v3:
>     - It only takes care of the kaslr related patches.
>       For reviewers it's better to discuss only one issue in one thread.
>         * I take off one patch as follows from Yinghai's because I think it's unnecessay.
>            - Patch 05/19 x86, kaslr: rename output_size to output_run_size
>              output_size is enough to represen the value:
>                 output_len > run_size ? output_len : run_size
>
>         * I add Patch 04/19, it's a comment update patch. For other patches, I just
>           adjust patch log and do several places of change comparing with 2nd round.
>           Please check the change log under patch log of each patch for details.
>
>         * Adjust sequence of several patches to make review easier. It doesn't
>           affect codes.
>
>     v3->v4:
>     - Made changes according to Kees's comments.
>       Add one patch 20/20 as Kees suggested to use KERNEL_IMAGE_SIZE as offset
>       max of virtual random, meanwhile clean up useless CONFIG_RANDOM_OFFSET_MAX
>
>         x86, kaslr: Use KERNEL_IMAGE_SIZE as the offset max for kernel virtual randomization

This series is looking good to me. I'm running tests under qemu now,
and things appear to work as advertised. :) I'll report back once I've
booted a few hundred times.

Ingo, what do you think of getting this into the x86 tree for some
testing in -next? For stuff I haven't already Acked, consider the
whole series as:

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

>
> You can also get this patchset from my github:
>    https://github.com/baoquan-he/linux.git kaslr-above-4G
>
> Any comments about code changes, code comments, patch logs are welcome and
> appreciated.
>
> Baoquan He (9):
>   x86, kaslr: Fix a bug that relocation can not be handled when kernel
>     is loaded above 2G
>   x86, kaskr: Update the description for decompressor worst case
>   x86, kaslr: Introduce struct slot_area to manage randomization slot
>     info
>   x86, kaslr: Add two functions which will be used later
>   x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel
>     text mapping address
>   x86, kaslr: Randomize physical and virtual address of kernel
>     separately
>   x86, kaslr: Add support of kernel physical address randomization above
>     4G
>   x86, kaslr: Remove useless codes
>   x86, kaslr: Use KERNEL_IMAGE_SIZE as the offset max for kernel virtual
>     randomization
>
> Yinghai Lu (11):
>   x86, kaslr: Remove not needed parameter for choose_kernel_location
>   x86, boot: Move compressed kernel to end of buffer before
>     decompressing
>   x86, boot: Move z_extract_offset calculation to header.S
>   x86, boot: Fix run_size calculation
>   x86, kaslr: Clean up useless code related to run_size.
>   x86, kaslr: Get correct max_addr for relocs pointer
>   x86, kaslr: Consolidate mem_avoid array filling
>   x86, boot: Split kernel_ident_mapping_init to another file
>   x86, 64bit: Set ident_mapping for kaslr
>   x86, boot: Add checking for memcpy
>   x86, kaslr: Allow random address to be below loaded address
>
>  arch/x86/Kconfig                       |  57 +++----
>  arch/x86/boot/Makefile                 |  13 +-
>  arch/x86/boot/compressed/Makefile      |  19 ++-
>  arch/x86/boot/compressed/aslr.c        | 300 +++++++++++++++++++++++++--------
>  arch/x86/boot/compressed/head_32.S     |  14 +-
>  arch/x86/boot/compressed/head_64.S     |  15 +-
>  arch/x86/boot/compressed/misc.c        |  89 +++++-----
>  arch/x86/boot/compressed/misc.h        |  34 ++--
>  arch/x86/boot/compressed/misc_pgt.c    |  93 ++++++++++
>  arch/x86/boot/compressed/mkpiggy.c     |  28 +--
>  arch/x86/boot/compressed/string.c      |  29 +++-
>  arch/x86/boot/compressed/vmlinux.lds.S |   1 +
>  arch/x86/boot/header.S                 |  22 ++-
>  arch/x86/include/asm/boot.h            |  19 +++
>  arch/x86/include/asm/page.h            |   5 +
>  arch/x86/include/asm/page_64_types.h   |   5 +-
>  arch/x86/kernel/asm-offsets.c          |   1 +
>  arch/x86/kernel/vmlinux.lds.S          |   1 +
>  arch/x86/mm/ident_map.c                |  74 ++++++++
>  arch/x86/mm/init_32.c                  |   3 -
>  arch/x86/mm/init_64.c                  |  74 +-------
>  arch/x86/tools/calc_run_size.sh        |  42 -----
>  22 files changed, 605 insertions(+), 333 deletions(-)
>  create mode 100644 arch/x86/boot/compressed/misc_pgt.c
>  create mode 100644 arch/x86/mm/ident_map.c
>  delete mode 100644 arch/x86/tools/calc_run_size.sh
>
> --
> 2.5.0
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v4 20/20] x86, kaslr: Use KERNEL_IMAGE_SIZE as the offset max for kernel virtual randomization
  2016-03-22  7:32 ` [PATCH v4 20/20] x86, kaslr: Use KERNEL_IMAGE_SIZE as the offset max for kernel virtual randomization Baoquan He
@ 2016-03-22 20:46   ` Kees Cook
  0 siblings, 0 replies; 52+ messages in thread
From: Kees Cook @ 2016-03-22 20:46 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Vivek Goyal, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On Tue, Mar 22, 2016 at 12:32 AM, Baoquan He <bhe@redhat.com> wrote:
> The old code uses CONFIG_RANDOM_OFFSET_MAX to get the offset max for kernel
> virtual randomization, and CONFIG_RANDOM_OFFSET_MAX is a configurable value
> within the scope of [512M, 1G] on x86_64. Currently CONFIG_RANDOM_OFFSET_MAX
> always defaults to 1G, and seems no obvious benefit to make it configurable.
> So Kees suggested we should set KERNEL_IMAGE_SIZE 1G if RANDOMIZE_BASE is
> on, and use KERNEL_IMAGE_SIZE as offset max.
>
> In this patch just do as Kees suggested. And with this change
> CONFIG_RANDOM_OFFSET_MAX is not needed any more, so clean it up now.
>
> Signed-off-by: Baoquan He <bhe@redhat.com>

Acked-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
> v3->v4:
>     Added in v4.
>
>  arch/x86/Kconfig                     | 57 +++++++++++++-----------------------
>  arch/x86/boot/compressed/aslr.c      |  7 ++---
>  arch/x86/include/asm/page_64_types.h |  5 ++--
>  arch/x86/mm/init_32.c                |  3 --
>  4 files changed, 26 insertions(+), 46 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b105105..fbe0bb0 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1908,51 +1908,36 @@ config RANDOMIZE_BASE
>         depends on RELOCATABLE
>         default n
>         ---help---
> -          Randomizes the physical and virtual address at which the
> -          kernel image is decompressed, as a security feature that
> -          deters exploit attempts relying on knowledge of the location
> -          of kernel internals.
> +          Randomizes the physical address at which the kernel image
> +          is decompressed and the virtual address where the kernel
> +          image is mapped, as a security feature that deters exploit
> +          attempts relying on knowledge of the location of kernel
> +          internals.
> +
> +          The kernel physical address can be randomized from 16M to
> +          64T at most. The kernel virtual address will be offset
> +          by up to KERNEL_IMAGE_SIZE. On 32-bit KERNEL_IMAGE_SIZE is
> +          512MiB, while on 64-bit this is limited by how the kernel
> +          fixmap page table is positioned, so this cannot be larger
> +          than 1GiB currently. Without RANDOMIZE_BASE there is a 512MiB
> +          to 1.5GiB split between kernel and modules. When RANDOMIZE_BASE
> +          is enabled, the modules area will shrink to compensate, up
> +          to a 1GiB to 1GiB split, KERNEL_IMAGE_SIZE changes from 512MiB
> +          to 1GiB.
>
>            Entropy is generated using the RDRAND instruction if it is
>            supported. If RDTSC is supported, it is used as well. If
>            neither RDRAND nor RDTSC are supported, then randomness is
>            read from the i8254 timer.
>
> -          The kernel will be offset by up to RANDOMIZE_BASE_MAX_OFFSET,
> -          and aligned according to PHYSICAL_ALIGN. Since the kernel is
> -          built using 2GiB addressing, and PHYSICAL_ALGIN must be at a
> -          minimum of 2MiB, only 10 bits of entropy is theoretically
> -          possible. At best, due to page table layouts, 64-bit can use
> -          9 bits of entropy and 32-bit uses 8 bits.
> +          Since the kernel is built using 2GiB addressing, and
> +          PHYSICAL_ALGIN must be at a minimum of 2MiB, only 10 bits of
> +          entropy is theoretically possible. At best, due to page table
> +          layouts, 64-bit can use 9 bits of entropy and 32-bit uses 8
> +          bits.
>
>            If unsure, say N.
>
> -config RANDOMIZE_BASE_MAX_OFFSET
> -       hex "Maximum kASLR offset allowed" if EXPERT
> -       depends on RANDOMIZE_BASE
> -       range 0x0 0x20000000 if X86_32
> -       default "0x20000000" if X86_32
> -       range 0x0 0x40000000 if X86_64
> -       default "0x40000000" if X86_64
> -       ---help---
> -         The lesser of RANDOMIZE_BASE_MAX_OFFSET and available physical
> -         memory is used to determine the maximal offset in bytes that will
> -         be applied to the kernel when kernel Address Space Layout
> -         Randomization (kASLR) is active. This must be a multiple of
> -         PHYSICAL_ALIGN.
> -
> -         On 32-bit this is limited to 512MiB by page table layouts. The
> -         default is 512MiB.
> -
> -         On 64-bit this is limited by how the kernel fixmap page table is
> -         positioned, so this cannot be larger than 1GiB currently. Without
> -         RANDOMIZE_BASE, there is a 512MiB to 1.5GiB split between kernel
> -         and modules. When RANDOMIZE_BASE_MAX_OFFSET is above 512MiB, the
> -         modules area will shrink to compensate, up to the current maximum
> -         1GiB to 1GiB split. The default is 1GiB.
> -
> -         If unsure, leave at the default value.
> -
>  # Relocation on x86 needs some additional build support
>  config X86_NEED_RELOCS
>         def_bool y
> diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
> index d072ca7..737643c 100644
> --- a/arch/x86/boot/compressed/aslr.c
> +++ b/arch/x86/boot/compressed/aslr.c
> @@ -428,11 +428,10 @@ static unsigned long find_random_virt_offset(unsigned long minimum,
>         minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);
>
>         if (image_size <= CONFIG_PHYSICAL_ALIGN)
> -               slot_num = (CONFIG_RANDOMIZE_BASE_MAX_OFFSET - minimum) /
> +               slot_num = (KERNEL_IMAGE_SIZE - minimum) /
>                                 CONFIG_PHYSICAL_ALIGN;
>         else
> -               slot_num = (CONFIG_RANDOMIZE_BASE_MAX_OFFSET -
> -                               minimum - image_size) /
> +               slot_num = (KERNEL_IMAGE_SIZE - minimum - image_size) /
>                                 CONFIG_PHYSICAL_ALIGN + 1;
>
>         random = get_random_long() % slot_num;
> @@ -487,7 +486,7 @@ void choose_kernel_location(unsigned char *input,
>
>         /*
>          * Get a random address between LOAD_PHYSICAL_ADDR and
> -        * CONFIG_RANDOMIZE_BASE_MAX_OFFSET
> +        * KERNEL_IMAGE_SIZE
>          */
>         random = find_random_virt_offset(LOAD_PHYSICAL_ADDR, output_size);
>         *virt_offset = (unsigned char *)random;
> diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
> index 4928cf0..8775bec 100644
> --- a/arch/x86/include/asm/page_64_types.h
> +++ b/arch/x86/include/asm/page_64_types.h
> @@ -48,9 +48,8 @@
>   * kernel page table mapping, reducing the size of the modules area.
>   */
>  #define KERNEL_IMAGE_SIZE_DEFAULT      (512 * 1024 * 1024)
> -#if defined(CONFIG_RANDOMIZE_BASE) && \
> -       CONFIG_RANDOMIZE_BASE_MAX_OFFSET > KERNEL_IMAGE_SIZE_DEFAULT
> -#define KERNEL_IMAGE_SIZE   CONFIG_RANDOMIZE_BASE_MAX_OFFSET
> +#if defined(CONFIG_RANDOMIZE_BASE)
> +#define KERNEL_IMAGE_SIZE   (1024 * 1024 * 1024)
>  #else
>  #define KERNEL_IMAGE_SIZE      KERNEL_IMAGE_SIZE_DEFAULT
>  #endif
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index 2ebfbaf..c5ae958 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -807,9 +807,6 @@ void __init mem_init(void)
>         BUILD_BUG_ON(VMALLOC_START                      >= VMALLOC_END);
>  #undef high_memory
>  #undef __FIXADDR_TOP
> -#ifdef CONFIG_RANDOMIZE_BASE
> -       BUILD_BUG_ON(CONFIG_RANDOMIZE_BASE_MAX_OFFSET > KERNEL_IMAGE_SIZE);
> -#endif
>
>  #ifdef CONFIG_HIGHMEM
>         BUG_ON(PKMAP_BASE + LAST_PKMAP*PAGE_SIZE        > FIXADDR_START);
> --
> 2.5.0
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 52+ messages in thread
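
As a back-of-the-envelope check of the entropy figures in the help text
above (a sketch assuming the usual defaults of LOAD_PHYSICAL_ADDR = 16 MiB
and CONFIG_PHYSICAL_ALIGN = 2 MiB):

    	/* Virtual randomization budget with KERNEL_IMAGE_SIZE = 1 GiB */
    	unsigned long virt_slots = ((1UL << 30) - (16UL << 20)) / (2UL << 20);
    	/* = 504 slots, i.e. roughly 9 bits of entropy, matching the
    	 * "64-bit can use 9 bits of entropy" statement in the help text. */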

* Re: [PATCH v4 06/20] x86, boot: Fix run_size calculation
  2016-03-22  7:32 ` [PATCH v4 06/20] x86, boot: Fix run_size calculation Baoquan He
@ 2016-03-22 20:51   ` Kees Cook
  0 siblings, 0 replies; 52+ messages in thread
From: Kees Cook @ 2016-03-22 20:51 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Vivek Goyal, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young, Junjie Mao, Josh Triplett

On Tue, Mar 22, 2016 at 12:32 AM, Baoquan He <bhe@redhat.com> wrote:
> From: Yinghai Lu <yinghai@kernel.org>
>
> Firstly, current run_size is calculated via shell script
> arch/x86/tools/calc_run_size.sh. It gets file offset and mem size of section
> .bss and .brk in vmlinux, then add them as follows:
>
> run_size=$(( $offsetA + $sizeA + $sizeB ))
>
> However this is completely wrong. The offset is the starting address of
> section or segment in elf file. Below is a vmlinux I compiled:
>
> [bhe@x1 linux]$ objdump -h vmlinux
>
> vmlinux:     file format elf64-x86-64
>
> Sections:
> Idx Name          Size      VMA               LMA               File off  Algn
>  27 .bss          00170000  ffffffff81ec8000  0000000001ec8000  012c8000  2**12
>                   ALLOC
>  28 .brk          00027000  ffffffff82038000  0000000002038000  012c8000  2**0
>                   ALLOC
>
> Here we can get run_size is 0x145f000.
> 0x012c8000 + 0x00170000 + 0x00027000 = 0x145f000
>
> [bhe@x1 linux]$ readelf -l vmlinux
>
> Elf file type is EXEC (Executable file)
> Entry point 0x1000000
> There are 5 program headers, starting at offset 64
>
> Program Headers:
>   Type           Offset             VirtAddr           PhysAddr
>                  FileSiz            MemSiz              Flags  Align
>   LOAD           0x0000000000200000 0xffffffff81000000 0x0000000001000000
>                  0x0000000000b5e000 0x0000000000b5e000  R E    200000
>   LOAD           0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
>                  0x0000000000145000 0x0000000000145000  RW     200000
>   LOAD           0x0000000001000000 0x0000000000000000 0x0000000001d45000
>                  0x0000000000018158 0x0000000000018158  RW     200000
>   LOAD           0x000000000115e000 0xffffffff81d5e000 0x0000000001d5e000
>                  0x000000000016a000 0x0000000000301000  RWE    200000
>   NOTE           0x000000000099bcac 0xffffffff8179bcac 0x000000000179bcac
>                  0x00000000000001bc 0x00000000000001bc         4
>
>  Section to Segment mapping:
>   Segment Sections...
>    00     .text .notes __ex_table .rodata __bug_table .pci_fixup .tracedata __ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param __modver
>    01     .data .vvar
>    02     .data..percpu
>    03     .init.text .init.data .x86_cpu_dev.init .parainstructions .altinstructions .altinstr_replacement .iommu_table .apicdrivers .exit.text .smp_locks .bss .brk
>    04     .notes
>
> Here we can get the same value as current run_size if we add p_offset
> and p_memsz.
> 0x000000000115e000 + 0x0000000000301000 = 0x145f000
>
> But is it right? Obviously not. We should calculate it using the last LOAD
> program segment like this:
> run_size = phdr->p_paddr + phdr->p_memsz - physical load addr of kernel
> run_size = 0x0000000001d5e000 + 0x0000000000301000 - 0x0000000001000000 = 0x105f000
>
> It's equal to VO_end-VO_text and certainly it's simpler to do.
> _end: 0xffffffff8205f000
> _text:0xffffffff81000000
> run_size = 0xffffffff8205f000 - 0xffffffff81000000 = 0x105f000
>
> Secondly run_size is a simple constant, we don't need to pass it around
> and we already have voffset.h for that. We can share voffset.h between
> misc.c and header.S instead of getting run_size in other ways. Hence in
> this patch, we move voffset.h creation code to boot/compressed/Makefile.
>
> Dependence was:
> boot/header.S ==> boot/voffset.h ==> vmlinux
> boot/header.S ==> compressed/vmlinux ==> compressed/misc.c
> Now become:
> boot/header.S ==> compressed/vmlinux ==> compressed/misc.c ==> boot/voffset.h ==> vmlinux
>
> Use macro in misc.c to replace passed run_size.
>
> Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
> Cc: Junjie Mao <eternal.n08@gmail.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Signed-off-by: Baoquan He <bhe@redhat.com>

Thanks for spending the time to convince me on this. :)

Acked-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
> v2->v3:
>   Adjust the patch log.
> v3->v4:
>   Correct calculation errors in patch log that Kees pointed out.
>
>  arch/x86/boot/Makefile            | 11 +----------
>  arch/x86/boot/compressed/Makefile | 12 ++++++++++++
>  arch/x86/boot/compressed/misc.c   |  3 +++
>  3 files changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index bd021e5..40f0ae2 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -78,15 +78,6 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE
>
>  SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
>
> -sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 0x\1/p'
> -
> -quiet_cmd_voffset = VOFFSET $@
> -      cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
> -
> -targets += voffset.h
> -$(obj)/voffset.h: vmlinux FORCE
> -       $(call if_changed,voffset)
> -
>  sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'
>
>  quiet_cmd_zoffset = ZOFFSET $@
> @@ -98,7 +89,7 @@ $(obj)/zoffset.h: $(obj)/compressed/vmlinux FORCE
>
>
>  AFLAGS_header.o += -I$(obj)
> -$(obj)/header.o: $(obj)/voffset.h $(obj)/zoffset.h
> +$(obj)/header.o: $(obj)/zoffset.h
>
>  LDFLAGS_setup.elf      := -T
>  $(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index f9ce75d..d9dedd9 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -41,6 +41,18 @@ LDFLAGS_vmlinux := -T
>  hostprogs-y    := mkpiggy
>  HOST_EXTRACFLAGS += -I$(srctree)/tools/include
>
> +sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
> +
> +quiet_cmd_voffset = VOFFSET $@
> +      cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
> +
> +targets += ../voffset.h
> +
> +$(obj)/../voffset.h: vmlinux FORCE
> +       $(call if_changed,voffset)
> +
> +$(obj)/misc.o: $(obj)/../voffset.h
> +
>  vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
>         $(obj)/string.o $(obj)/cmdline.o \
>         $(obj)/piggy.o $(obj)/cpuflags.o
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index e7c6fdf..4d317b7 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -11,6 +11,7 @@
>
>  #include "misc.h"
>  #include "../string.h"
> +#include "../voffset.h"
>
>  /* WARNING!!
>   * This code is compiled with -fPIC and it is relocated dynamically
> @@ -415,6 +416,8 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
>         lines = real_mode->screen_info.orig_video_lines;
>         cols = real_mode->screen_info.orig_video_cols;
>
> +       run_size = VO__end - VO__text;
> +
>         console_init();
>         debug_putstr("early console in decompress_kernel\n");
>
> --
> 2.5.0
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 52+ messages in thread
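
For readers who want to double-check the run_size reasoning in the log
above, a small illustrative user-space sketch (not part of the patch) that
computes run_size from the last PT_LOAD program header, the way the log
describes:

    #include <elf.h>

    /* run_size = p_paddr + p_memsz of the last PT_LOAD segment minus the
     * physical load address of the kernel.  With the example vmlinux in
     * the log: 0x1d5e000 + 0x301000 - 0x1000000 = 0x105f000. */
    static unsigned long run_size_from_phdrs(const Elf64_Phdr *phdr, int nr,
    					     unsigned long load_addr)
    {
    	unsigned long end = 0;
    	int i;

    	for (i = 0; i < nr; i++)
    		if (phdr[i].p_type == PT_LOAD)
    			end = phdr[i].p_paddr + phdr[i].p_memsz;

    	return end - load_addr;
    }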

* Re: [PATCH v4 07/20] x86, kaslr: Clean up useless code related to run_size.
  2016-03-22  7:32 ` [PATCH v4 07/20] x86, kaslr: Clean up useless code related to run_size Baoquan He
@ 2016-03-22 20:52   ` Kees Cook
  0 siblings, 0 replies; 52+ messages in thread
From: Kees Cook @ 2016-03-22 20:52 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Vivek Goyal, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young, Josh Triplett, Ard Biesheuvel, Junjie Mao

On Tue, Mar 22, 2016 at 12:32 AM, Baoquan He <bhe@redhat.com> wrote:
> From: Yinghai Lu <yinghai@kernel.org>
>
> So now we use the formula below to get run_size.
>
>         run_size = VO__end - VO__text
>
> Let's remove code for old run_size calculation and clean up the places
> where run_size need be passed in and out.
>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Junjie Mao <eternal.n08@gmail.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Signed-off-by: Baoquan He <bhe@redhat.com>

Acked-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
> v2->v3:
>     Adjust the patch log.
> v3->v4:
>     Kees suggested that change run_size as const in decompress_kernel()
>
>  arch/x86/boot/compressed/Makefile  |  4 +---
>  arch/x86/boot/compressed/head_32.S |  3 +--
>  arch/x86/boot/compressed/head_64.S |  3 ---
>  arch/x86/boot/compressed/misc.c    |  6 ++----
>  arch/x86/boot/compressed/mkpiggy.c | 10 ++-------
>  arch/x86/tools/calc_run_size.sh    | 42 --------------------------------------
>  6 files changed, 6 insertions(+), 62 deletions(-)
>  delete mode 100644 arch/x86/tools/calc_run_size.sh
>
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index d9dedd9..fef80fa 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -105,10 +105,8 @@ suffix-$(CONFIG_KERNEL_XZ) := xz
>  suffix-$(CONFIG_KERNEL_LZO)    := lzo
>  suffix-$(CONFIG_KERNEL_LZ4)    := lz4
>
> -RUN_SIZE = $(shell $(OBJDUMP) -h vmlinux | \
> -            $(CONFIG_SHELL) $(srctree)/arch/x86/tools/calc_run_size.sh)
>  quiet_cmd_mkpiggy = MKPIGGY $@
> -      cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )
> +      cmd_mkpiggy = $(obj)/mkpiggy $< > $@ || ( rm -f $@ ; false )
>
>  targets += piggy.S
>  $(obj)/piggy.S: $(obj)/vmlinux.bin.$(suffix-y) $(obj)/mkpiggy FORCE
> diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
> index 0c140f9..122b32f 100644
> --- a/arch/x86/boot/compressed/head_32.S
> +++ b/arch/x86/boot/compressed/head_32.S
> @@ -210,7 +210,6 @@ relocated:
>   * Do the decompression, and jump to the new kernel..
>   */
>                                 /* push arguments for decompress_kernel: */
> -       pushl   $z_run_size     /* size of kernel with .bss and .brk */
>         pushl   $z_output_len   /* decompressed length, end of relocs */
>
>         movl    BP_init_size(%esi), %eax
> @@ -226,7 +225,7 @@ relocated:
>         pushl   %eax            /* heap area */
>         pushl   %esi            /* real mode pointer */
>         call    decompress_kernel /* returns kernel location in %eax */
> -       addl    $28, %esp
> +       addl    $24, %esp
>
>  /*
>   * Jump to the decompressed kernel.
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 67dd8d3..3691451 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -407,8 +407,6 @@ relocated:
>   * Do the decompression, and jump to the new kernel..
>   */
>         pushq   %rsi                    /* Save the real mode argument */
> -       movq    $z_run_size, %r9        /* size of kernel with .bss and .brk */
> -       pushq   %r9
>         movq    %rsi, %rdi              /* real mode address */
>         leaq    boot_heap(%rip), %rsi   /* malloc area for uncompression */
>         leaq    input_data(%rip), %rdx  /* input_data */
> @@ -416,7 +414,6 @@ relocated:
>         movq    %rbp, %r8               /* output target address */
>         movq    $z_output_len, %r9      /* decompressed length, end of relocs */
>         call    decompress_kernel       /* returns kernel location in %rax */
> -       popq    %r9
>         popq    %rsi
>
>  /*
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index 4d317b7..def6207 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -393,9 +393,9 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
>                                   unsigned char *input_data,
>                                   unsigned long input_len,
>                                   unsigned char *output,
> -                                 unsigned long output_len,
> -                                 unsigned long run_size)
> +                                 unsigned long output_len)
>  {
> +       const unsigned long run_size = VO__end - VO__text;
>         unsigned char *output_orig = output;
>
>         real_mode = rmode;
> @@ -416,8 +416,6 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
>         lines = real_mode->screen_info.orig_video_lines;
>         cols = real_mode->screen_info.orig_video_cols;
>
> -       run_size = VO__end - VO__text;
> -
>         console_init();
>         debug_putstr("early console in decompress_kernel\n");
>
> diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
> index a613c84..c5148642 100644
> --- a/arch/x86/boot/compressed/mkpiggy.c
> +++ b/arch/x86/boot/compressed/mkpiggy.c
> @@ -34,13 +34,11 @@ int main(int argc, char *argv[])
>  {
>         uint32_t olen;
>         long ilen;
> -       unsigned long run_size;
>         FILE *f = NULL;
>         int retval = 1;
>
> -       if (argc < 3) {
> -               fprintf(stderr, "Usage: %s compressed_file run_size\n",
> -                               argv[0]);
> +       if (argc < 2) {
> +               fprintf(stderr, "Usage: %s compressed_file\n", argv[0]);
>                 goto bail;
>         }
>
> @@ -65,15 +63,11 @@ int main(int argc, char *argv[])
>         ilen = ftell(f);
>         olen = get_unaligned_le32(&olen);
>
> -       run_size = atoi(argv[2]);
> -
>         printf(".section \".rodata..compressed\",\"a\",@progbits\n");
>         printf(".globl z_input_len\n");
>         printf("z_input_len = %lu\n", ilen);
>         printf(".globl z_output_len\n");
>         printf("z_output_len = %lu\n", (unsigned long)olen);
> -       printf(".globl z_run_size\n");
> -       printf("z_run_size = %lu\n", run_size);
>
>         printf(".globl input_data, input_data_end\n");
>         printf("input_data:\n");
> diff --git a/arch/x86/tools/calc_run_size.sh b/arch/x86/tools/calc_run_size.sh
> deleted file mode 100644
> index 1a4c17b..0000000
> --- a/arch/x86/tools/calc_run_size.sh
> +++ /dev/null
> @@ -1,42 +0,0 @@
> -#!/bin/sh
> -#
> -# Calculate the amount of space needed to run the kernel, including room for
> -# the .bss and .brk sections.
> -#
> -# Usage:
> -# objdump -h a.out | sh calc_run_size.sh
> -
> -NUM='\([0-9a-fA-F]*[ \t]*\)'
> -OUT=$(sed -n 's/^[ \t0-9]*.b[sr][sk][ \t]*'"$NUM$NUM$NUM$NUM"'.*/\1\4/p')
> -if [ -z "$OUT" ] ; then
> -       echo "Never found .bss or .brk file offset" >&2
> -       exit 1
> -fi
> -
> -OUT=$(echo ${OUT# })
> -sizeA=$(printf "%d" 0x${OUT%% *})
> -OUT=${OUT#* }
> -offsetA=$(printf "%d" 0x${OUT%% *})
> -OUT=${OUT#* }
> -sizeB=$(printf "%d" 0x${OUT%% *})
> -OUT=${OUT#* }
> -offsetB=$(printf "%d" 0x${OUT%% *})
> -
> -run_size=$(( $offsetA + $sizeA + $sizeB ))
> -
> -# BFD linker shows the same file offset in ELF.
> -if [ "$offsetA" -ne "$offsetB" ] ; then
> -       # Gold linker shows them as consecutive.
> -       endB=$(( $offsetB + $sizeB ))
> -       if [ "$endB" != "$run_size" ] ; then
> -               printf "sizeA: 0x%x\n" $sizeA >&2
> -               printf "offsetA: 0x%x\n" $offsetA >&2
> -               printf "sizeB: 0x%x\n" $sizeB >&2
> -               printf "offsetB: 0x%x\n" $offsetB >&2
> -               echo ".bss and .brk are non-contiguous" >&2
> -               exit 1
> -       fi
> -fi
> -
> -printf "%d\n" $run_size
> -exit 0
> --
> 2.5.0
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v4 08/20] x86, kaslr: Get correct max_addr for relocs pointer
  2016-03-22  7:32 ` [PATCH v4 08/20] x86, kaslr: Get correct max_addr for relocs pointer Baoquan He
@ 2016-03-22 20:52   ` Kees Cook
  0 siblings, 0 replies; 52+ messages in thread
From: Kees Cook @ 2016-03-22 20:52 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Vivek Goyal, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On Tue, Mar 22, 2016 at 12:32 AM, Baoquan He <bhe@redhat.com> wrote:
> From: Yinghai Lu <yinghai@kernel.org>
>
> Relocation handling performs bounds checking on the relocated
> addresses. The existing code uses output_len (VO size plus relocs
> size) as the max address. This is not right since the max_addr check
> should stop at the end of VO and exclude bss, brk, etc, which follows.
> The valid range should be VO [_text, __bss_start] in the loaded
> physical address space.
>
> In this patch, add export for __bss_start to voffset.h and use it to
> get the correct max_addr.
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Baoquan He <bhe@redhat.com>

Acked-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
> v2->v3:
>     Adjust patch log.
> v3->v4:
>     Kees help rewrite the patch log
>
>  arch/x86/boot/compressed/Makefile | 2 +-
>  arch/x86/boot/compressed/misc.c   | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index fef80fa..2e7c0ce 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -41,7 +41,7 @@ LDFLAGS_vmlinux := -T
>  hostprogs-y    := mkpiggy
>  HOST_EXTRACFLAGS += -I$(srctree)/tools/include
>
> -sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
> +sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
>
>  quiet_cmd_voffset = VOFFSET $@
>        cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index def6207..029f42f 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -259,7 +259,7 @@ static void handle_relocations(void *output, unsigned long output_len)
>         int *reloc;
>         unsigned long delta, map, ptr;
>         unsigned long min_addr = (unsigned long)output;
> -       unsigned long max_addr = min_addr + output_len;
> +       unsigned long max_addr = min_addr + (VO___bss_start - VO__text);
>
>         /*
>          * Calculate the delta between where vmlinux was linked to load
> --
> 2.5.0
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 52+ messages in thread
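
A minimal sketch of the bounds check this patch tightens, roughly as it
looks in handle_relocations() after the change (names taken from the hunk
above; not a verbatim copy of misc.c):

    	unsigned long min_addr = (unsigned long)output;
    	unsigned long max_addr = min_addr + (VO___bss_start - VO__text);

    	/* Each relocation target must fall inside VO [_text, __bss_start]
    	 * in the loaded physical address space. */
    	if (ptr < min_addr || ptr > max_addr)
    		error("32-bit relocation outside of kernel!\n");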

* Re: [PATCH v4 19/20] x86, kaslr: Allow random address to be below loaded address
  2016-03-22 19:54   ` Kees Cook
@ 2016-03-23  1:41     ` Baoquan He
  0 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-23  1:41 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Vivek Goyal, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young

On 03/22/16 at 12:54pm, Kees Cook wrote:
> On Tue, Mar 22, 2016 at 12:32 AM, Baoquan He <bhe@redhat.com> wrote:
> > From: Yinghai Lu <yinghai@kernel.org>
> >
> > Now new randomized output can only be chosen from regions above loaded
> > address. In this case, for bootloaders like kexec which always loads
> > kernel near the end of ram, it doesn't do randomization at all. Or kernel
> > is loaded in a very big starting address, we should not give up that area
> > is loaded in a very large address, then the area below the large loaded
> > address will be given up. This is not reasonable.
> >
> > With correct tracking in mem_avoid  we can allow random output below
> > loaded address. With this change, though kexec can get random ouput
> > below its loaded address of kernel.
> >
> > Now we just pick 512M as min_addr. If kernel loaded address is bigger than
> > 512M, E.g 8G. Then [512M, 8G) can be added into random output candidate area.
> >
> > Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> > ---
> >  arch/x86/boot/compressed/aslr.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
> > index ddfc3d0..d072ca7 100644
> > --- a/arch/x86/boot/compressed/aslr.c
> > +++ b/arch/x86/boot/compressed/aslr.c
> > @@ -446,7 +446,8 @@ void choose_kernel_location(unsigned char *input,
> >                                 unsigned long output_size,
> >                                 unsigned char **virt_offset)
> >  {
> > -       unsigned long random;
> > +       unsigned long random, min_addr;
> > +
> >         *virt_offset = (unsigned char *)LOAD_PHYSICAL_ADDR;
> >
> >  #ifdef CONFIG_HIBERNATION
> > @@ -467,8 +468,13 @@ void choose_kernel_location(unsigned char *input,
> >         mem_avoid_init((unsigned long)input, input_size,
> >                        (unsigned long)*output);
> >
> > +       /* start from 512M */
> > +       min_addr = (unsigned long)*output;
> > +       if (min_addr > (512UL<<20))
> > +               min_addr = 512UL<<20;
> 
> The goal is to find a minimum address? I'm not sure this comment makes
> sense. Shouldn't this be:
> 
>     /* Lower minimum to 512M. */
>    min_addr = min_t(unsigned long, *output, 512UL << 20);
> 
> Or something like that?

Yes, the goal is to lower the minimum to 512M. It's better to change the
comment to "Lower minimum to 512M" as you suggested. I will resend a new
version with this update in this thread.

Thanks a lot for your great suggestions on the patch logs, code changes and
patch rearranging, and I also appreciate your encouragement and patience.

Thanks
Baoquan

> 
> > +
> >         /* Walk e820 and find a random address. */
> > -       random = find_random_phy_addr((unsigned long)*output, output_size);
> > +       random = find_random_phy_addr(min_addr, output_size);
> >         if (!random)
> >                 debug_putstr("KASLR could not find suitable E820 region...\n");
> >         else {
> > --
> > 2.5.0
> >
> 
> -Kees
> 
> -- 
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v5 19/20] x86, kaslr: Allow random address to be below loaded address
  2016-03-22  7:32 ` [PATCH v4 19/20] x86, kaslr: Allow random address to be below loaded address Baoquan He
  2016-03-22 19:54   ` Kees Cook
@ 2016-03-23  8:59   ` Baoquan He
  1 sibling, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-03-23  8:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung

Currently the randomized output can only be chosen from regions above the
loaded address. In this case, for bootloaders like kexec which always load
the kernel near the end of ram, no randomization happens at all. And if the
kernel is loaded at a very large starting address, the whole area below
that loaded address is given up. This is not reasonable.

With correct tracking in mem_avoid we can allow random output below the
loaded address. With this change, kexec can get random output below the
address where it loaded the kernel.

We just pick 512M as min_addr. If the kernel loaded address is bigger than
512M, e.g. 8G, then [512M, 8G) can be added to the random output candidate
area.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
v4->v5:
    Kees suggested changing the code comment related to minimum address
    to make it more understandable.

 arch/x86/boot/compressed/aslr.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index ddfc3d0..bbd2d06 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -446,7 +446,8 @@ void choose_kernel_location(unsigned char *input,
 				unsigned long output_size,
 				unsigned char **virt_offset)
 {
-	unsigned long random;
+	unsigned long random, min_addr;
+
 	*virt_offset = (unsigned char *)LOAD_PHYSICAL_ADDR;
 
 #ifdef CONFIG_HIBERNATION
@@ -467,8 +468,13 @@ void choose_kernel_location(unsigned char *input,
 	mem_avoid_init((unsigned long)input, input_size,
 		       (unsigned long)*output);
 
+	/* Lower minimum to 512M. */
+	min_addr = (unsigned long)*output;
+	if (min_addr > (512UL<<20))
+		min_addr = 512UL<<20;
+
 	/* Walk e820 and find a random address. */
-	random = find_random_phy_addr((unsigned long)*output, output_size);
+	random = find_random_phy_addr(min_addr, output_size);
 	if (!random)
 		debug_putstr("KASLR could not find suitable E820 region...\n");
 	else {
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support
  2016-03-22 20:25 ` [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Kees Cook
@ 2016-03-23 22:40   ` Kees Cook
  0 siblings, 0 replies; 52+ messages in thread
From: Kees Cook @ 2016-03-23 22:40 UTC (permalink / raw)
  To: Baoquan He, Ingo Molnar
  Cc: LKML, Yinghai Lu, H. Peter Anvin, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young

On Tue, Mar 22, 2016 at 1:25 PM, Kees Cook <keescook@chromium.org> wrote:
> On Tue, Mar 22, 2016 at 12:31 AM, Baoquan He <bhe@redhat.com> wrote:
>> ***Background:
>> Previously a bug is reported that kdump didn't work when kaslr is enabled. During
>> discussing that bug fix, we found current kaslr has a limilation that it can
>> only randomize in 1GB region.
>>
>> This is because in curent kaslr implementaion only physical address of kernel
>> loading is randomized. Then calculate the delta of physical address where
>> vmlinux was linked to load and where it is finally loaded. If delta is not
>> equal to 0, namely there's a new physical address where kernel is actually
>> decompressed, relocation handling need be done. Then delta is added to offset
>> of kernel symbol relocation, this makes the address of kernel text mapping move
>> delta long. Though in principle kernel can be randomized to any physical address,
>> kernel text mapping address space is limited and only 1G, namely as follows on
>> x86_64:
>>         [0xffffffff80000000, 0xffffffffc0000000)
>>
>> In one word, kernel text physical address and virtual address randomization is
>> coupled. This causes the limitation.
>>
>> Then hpa and Vivek suggested we should change this. To decouple the physical
>> address and virtual address randomization of kernel text and let them work
>> separately. Then kernel text physical address can be randomized in region
>> [16M, 64T), and kernel text virtual address can be randomized in region
>> [0xffffffff80000000, 0xffffffffc0000000).
>>
>> ***Problems we need solved:
>>   - For kernel boot from startup_32 case, only 0~4G identity mapping is built.
>>     If kernel will be randomly put anywhere from 16M to 64T at most, the price
>>     to build all region of identity mapping is too high. We need build the
>>     identity mapping on demand, not covering all physical address space.
>>
>>   - Decouple the physical address and virtual address randomization of kernel
>>     text and let them work separately.
>>
>> ***Parts:
>>    - The 1st part is Yinghai's identity mapping building on demand patches.
>>      This is used to solve the first problem mentioned above.
>>      (Patch 09-10/19)
>>    - The 2nd part is decoupling the physical address and virtual address
>>      randomization of kernel text and letting them work separately patches
>>      based on Yinghai's ident mapping patches.
>>      (Patch 12-19/19)
>>    - The 3rd part is some clean up patches which Yinghai found when he reviewed
>>      my patches and the related code around.
>>      (Patch 01-08/19)
>>
>> ***Patch status:
>> This patchset went through several rounds of review.
>>
>>     v1:
>>     - The first round can be found here:
>>         https://lwn.net/Articles/637115/
>>
>>     v1->v2:
>>     - In 2nd round Yinghai made a big patchset including this kaslr fix and another
>>       setup_data related fix. The link is here:
>>        http://lists-archives.com/linux-kernel/28346903-x86-updated-patches-for-kaslr-and-setup_data-etc-for-v4-3.html
>>       You can get the code from Yinghai's git branch:
>>       git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-v4.3-next
>>
>>     v2->v3:
>>     - It only takes care of the kaslr related patches.
>>       For reviewers it's better to discuss only one issue in one thread.
>>         * I take off one patch as follows from Yinghai's because I think it's unnecessay.
>>            - Patch 05/19 x86, kaslr: rename output_size to output_run_size
>>              output_size is enough to represen the value:
>>                 output_len > run_size ? output_len : run_size
>>
>>         * I add Patch 04/19, it's a comment update patch. For other patches, I just
>>           adjust patch log and do several places of change comparing with 2nd round.
>>           Please check the change log under patch log of each patch for details.
>>
>>         * Adjust sequence of several patches to make review easier. It doesn't
>>           affect codes.
>>
>>     v3->v4:
>>     - Made changes according to Kees's comments.
>>       Add one patch 20/20 as Kees suggested to use KERNEL_IMAGE_SIZE as offset
>>       max of virtual random, meanwhile clean up useless CONFIG_RANDOM_OFFSET_MAX
>>
>>         x86, kaslr: Use KERNEL_IMAGE_SIZE as the offset max for kernel virtual randomization
>
> This series is looking good to me. I'm running tests under qemu now,
> and things appear to work as advertised. :) I'll report back once I've
> booted a few hundred times.

My qemu isn't implementing rdrand, so my entropy source is poor, but I
had 7322 successful boots, with 6356 unique physical memory positions
and 487 unique virtual memory positions.

With 48G of physical memory, I'd expect to see more physical positions
(i.e. for a 24M kernel (using 12 slots), avoiding the first 512M (256
slots), I'd expect to see closer to 24308 positions (48G / 2M - 12 -
256)). The 487 is pretty close to the max 512 slots. Still, my entropy
source wasn't great, so that's probably the issue. Getting this
retested with rdrand would be nice. Regardless, no hangs. :)

-Kees

>
> Ingo, what do you think of getting this into the x86 tree for some
> testing in -next? For stuff I haven't already Acked, consider the
> whole series as:
>
> Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 52+ messages in thread
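
The slot counts Kees expects above work out as follows (a quick sketch with
the numbers from his message: 48 GiB of RAM, 2 MiB slots, a 24 MiB kernel
occupying 12 slots, and the first 512 MiB, i.e. 256 slots, excluded):

    	unsigned long phys_slots = (48UL << 30) / (2UL << 20) - 12 - 256;
    	/* = 24576 - 268 = 24308 candidate physical positions */
    	unsigned long virt_slots = (1UL << 30) / (2UL << 20);
    	/* = 512 maximum virtual positions (KERNEL_IMAGE_SIZE / 2 MiB) */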

* Re: [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support
  2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
                   ` (20 preceding siblings ...)
  2016-03-22 20:25 ` [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Kees Cook
@ 2016-04-05  1:56 ` Baoquan He
  2016-04-05 20:00     ` [kernel-hardening] " Kees Cook
  21 siblings, 1 reply; 52+ messages in thread
From: Baoquan He @ 2016-04-05  1:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: yinghai, keescook, hpa, mingo, bp, vgoyal, luto, lasse.collin,
	akpm, dyoung

Hi Ingo,

Ping.

Do you have any suggestions or comments for this patchset? As you
suggested for Yinghai's earlier post, I re-checked the whole patchset,
tried to understand all of the patches, and re-wrote the patch logs with
my own understanding. I hope these patch logs can help people who are
interested in this issue understand what each patch is doing. Meanwhile
Kees has been very kind to help review the code changes and patch logs,
and to adjust or re-write some of them.

If there's still something you don't like, your suggestions, ideas and
comments are welcome and appreciated. I will continue making changes
until it's satisfactory.

Thanks
Baoquan

On 03/22/16 at 03:31pm, Baoquan He wrote:
> ***Background:
> Previously a bug is reported that kdump didn't work when kaslr is enabled. During
> discussing that bug fix, we found current kaslr has a limilation that it can
> only randomize in 1GB region.
> 
> This is because in curent kaslr implementaion only physical address of kernel
> loading is randomized. Then calculate the delta of physical address where
> vmlinux was linked to load and where it is finally loaded. If delta is not
> equal to 0, namely there's a new physical address where kernel is actually
> decompressed, relocation handling need be done. Then delta is added to offset
> of kernel symbol relocation, this makes the address of kernel text mapping move
> delta long. Though in principle kernel can be randomized to any physical address,
> kernel text mapping address space is limited and only 1G, namely as follows on
> x86_64:
> 	[0xffffffff80000000, 0xffffffffc0000000)
> 
> In one word, kernel text physical address and virtual address randomization is
> coupled. This causes the limitation.
> 
> Then hpa and Vivek suggested we should change this. To decouple the physical
> address and virtual address randomization of kernel text and let them work
> separately. Then kernel text physical address can be randomized in region
> [16M, 64T), and kernel text virtual address can be randomized in region
> [0xffffffff80000000, 0xffffffffc0000000).
> 
> ***Problems we need solved:
>   - For kernel boot from startup_32 case, only 0~4G identity mapping is built.
>     If kernel will be randomly put anywhere from 16M to 64T at most, the price
>     to build all region of identity mapping is too high. We need build the
>     identity mapping on demand, not covering all physical address space.
> 
>   - Decouple the physical address and virtual address randomization of kernel
>     text and let them work separately.
> 
> ***Parts:
>    - The 1st part is Yinghai's identity mapping building on demand patches.
>      This is used to solve the first problem mentioned above.
>      (Patch 09-10/19)
>    - The 2nd part is decoupling the physical address and virtual address
>      randomization of kernel text and letting them work separately patches
>      based on Yinghai's ident mapping patches.
>      (Patch 12-19/19)
>    - The 3rd part is some clean up patches which Yinghai found when he reviewed
>      my patches and the related code around.
>      (Patch 01-08/19)
> 
> ***Patch status:
> This patchset went through several rounds of review.
> 
>     v1:
>     - The first round can be found here:
> 	https://lwn.net/Articles/637115/
> 
>     v1->v2:
>     - In 2nd round Yinghai made a big patchset including this kaslr fix and another
>       setup_data related fix. The link is here:
>        http://lists-archives.com/linux-kernel/28346903-x86-updated-patches-for-kaslr-and-setup_data-etc-for-v4-3.html
>       You can get the code from Yinghai's git branch:
>       git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-v4.3-next
> 
>     v2->v3:
>     - It only takes care of the kaslr related patches.
>       For reviewers it's better to discuss only one issue in one thread.
>         * I take off one patch as follows from Yinghai's because I think it's unnecessay. 
>            - Patch 05/19 x86, kaslr: rename output_size to output_run_size
>              output_size is enough to represen the value:
>          	output_len > run_size ? output_len : run_size
>        
>         * I add Patch 04/19, it's a comment update patch. For other patches, I just
>           adjust patch log and do several places of change comparing with 2nd round.
>           Please check the change log under patch log of each patch for details.
> 
>         * Adjust sequence of several patches to make review easier. It doesn't
>           affect codes.
> 
>     v3->v4:
>     - Made changes according to Kees's comments.
>       Add one patch 20/20 as Kees suggested to use KERNEL_IMAGE_SIZE as offset
>       max of virtual random, meanwhile clean up useless CONFIG_RANDOM_OFFSET_MAX
> 
>         x86, kaslr: Use KERNEL_IMAGE_SIZE as the offset max for kernel virtual randomization
> 
> You can also get this patchset from my github:
>    https://github.com/baoquan-he/linux.git kaslr-above-4G
> 
> Any comments about code changes, code comments, patch logs are welcome and
> appreciated.
> 
> Baoquan He (9):
>   x86, kaslr: Fix a bug that relocation can not be handled when kernel
>     is loaded above 2G
>   x86, kaskr: Update the description for decompressor worst case
>   x86, kaslr: Introduce struct slot_area to manage randomization slot
>     info
>   x86, kaslr: Add two functions which will be used later
>   x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel
>     text mapping address
>   x86, kaslr: Randomize physical and virtual address of kernel
>     separately
>   x86, kaslr: Add support of kernel physical address randomization above
>     4G
>   x86, kaslr: Remove useless codes
>   x86, kaslr: Use KERNEL_IMAGE_SIZE as the offset max for kernel virtual
>     randomization
> 
> Yinghai Lu (11):
>   x86, kaslr: Remove not needed parameter for choose_kernel_location
>   x86, boot: Move compressed kernel to end of buffer before
>     decompressing
>   x86, boot: Move z_extract_offset calculation to header.S
>   x86, boot: Fix run_size calculation
>   x86, kaslr: Clean up useless code related to run_size.
>   x86, kaslr: Get correct max_addr for relocs pointer
>   x86, kaslr: Consolidate mem_avoid array filling
>   x86, boot: Split kernel_ident_mapping_init to another file
>   x86, 64bit: Set ident_mapping for kaslr
>   x86, boot: Add checking for memcpy
>   x86, kaslr: Allow random address to be below loaded address
> 
>  arch/x86/Kconfig                       |  57 +++----
>  arch/x86/boot/Makefile                 |  13 +-
>  arch/x86/boot/compressed/Makefile      |  19 ++-
>  arch/x86/boot/compressed/aslr.c        | 300 +++++++++++++++++++++++++--------
>  arch/x86/boot/compressed/head_32.S     |  14 +-
>  arch/x86/boot/compressed/head_64.S     |  15 +-
>  arch/x86/boot/compressed/misc.c        |  89 +++++-----
>  arch/x86/boot/compressed/misc.h        |  34 ++--
>  arch/x86/boot/compressed/misc_pgt.c    |  93 ++++++++++
>  arch/x86/boot/compressed/mkpiggy.c     |  28 +--
>  arch/x86/boot/compressed/string.c      |  29 +++-
>  arch/x86/boot/compressed/vmlinux.lds.S |   1 +
>  arch/x86/boot/header.S                 |  22 ++-
>  arch/x86/include/asm/boot.h            |  19 +++
>  arch/x86/include/asm/page.h            |   5 +
>  arch/x86/include/asm/page_64_types.h   |   5 +-
>  arch/x86/kernel/asm-offsets.c          |   1 +
>  arch/x86/kernel/vmlinux.lds.S          |   1 +
>  arch/x86/mm/ident_map.c                |  74 ++++++++
>  arch/x86/mm/init_32.c                  |   3 -
>  arch/x86/mm/init_64.c                  |  74 +-------
>  arch/x86/tools/calc_run_size.sh        |  42 -----
>  22 files changed, 605 insertions(+), 333 deletions(-)
>  create mode 100644 arch/x86/boot/compressed/misc_pgt.c
>  create mode 100644 arch/x86/mm/ident_map.c
>  delete mode 100644 arch/x86/tools/calc_run_size.sh
> 
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support
  2016-04-05  1:56 ` Baoquan He
@ 2016-04-05 20:00     ` Kees Cook
  0 siblings, 0 replies; 52+ messages in thread
From: Kees Cook @ 2016-04-05 20:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Baoquan He, Yinghai Lu, H. Peter Anvin, Borislav Petkov,
	Vivek Goyal, Andy Lutomirski, lasse.collin, Andrew Morton,
	Dave Young, kernel-hardening

On Mon, Apr 4, 2016 at 6:56 PM, Baoquan He <bhe@redhat.com> wrote:
> On 03/22/16 at 03:31pm, Baoquan He wrote:
>> ***Background:
>> Previously a bug is reported that kdump didn't work when kaslr is enabled. During
>> discussing that bug fix, we found current kaslr has a limilation that it can
>> only randomize in 1GB region.
>>
>> This is because in curent kaslr implementaion only physical address of kernel
>> loading is randomized. Then calculate the delta of physical address where
>> vmlinux was linked to load and where it is finally loaded. If delta is not
>> equal to 0, namely there's a new physical address where kernel is actually
>> decompressed, relocation handling need be done. Then delta is added to offset
>> of kernel symbol relocation, this makes the address of kernel text mapping move
>> delta long. Though in principle kernel can be randomized to any physical address,
>> kernel text mapping address space is limited and only 1G, namely as follows on
>> x86_64:
>>       [0xffffffff80000000, 0xffffffffc0000000)
>>
>> In one word, kernel text physical address and virtual address randomization is
>> coupled. This causes the limitation.
>>
>> Then hpa and Vivek suggested we should change this. To decouple the physical
>> address and virtual address randomization of kernel text and let them work
>> separately. Then kernel text physical address can be randomized in region
>> [16M, 64T), and kernel text virtual address can be randomized in region
>> [0xffffffff80000000, 0xffffffffc0000000).
>>
>> ***Problems we need solved:
>>   - For kernel boot from startup_32 case, only 0~4G identity mapping is built.
>>     If kernel will be randomly put anywhere from 16M to 64T at most, the price
>>     to build all region of identity mapping is too high. We need build the
>>     identity mapping on demand, not covering all physical address space.
>>
>>   - Decouple the physical address and virtual address randomization of kernel
>>     text and let them work separately.
>>
>> ***Parts:
>>    - The 1st part is Yinghai's identity mapping building on demand patches.
>>      This is used to solve the first problem mentioned above.
>>      (Patch 09-10/19)
>>    - The 2nd part is decoupling the physical address and virtual address
>>      randomization of kernel text and letting them work separately patches
>>      based on Yinghai's ident mapping patches.
>>      (Patch 12-19/19)
>>    - The 3rd part is some clean up patches which Yinghai found when he reviewed
>>      my patches and the related code around.
>>      (Patch 01-08/19)
>>
>> ***Patch status:
>> This patchset went through several rounds of review.
>>
>>     v1:
>>     - The first round can be found here:
>>       https://lwn.net/Articles/637115/
>>
>>     v1->v2:
>>     - In 2nd round Yinghai made a big patchset including this kaslr fix and another
>>       setup_data related fix. The link is here:
>>        http://lists-archives.com/linux-kernel/28346903-x86-updated-patches-for-kaslr-and-setup_data-etc-for-v4-3.html
>>       You can get the code from Yinghai's git branch:
>>       git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-v4.3-next
>>
>>     v2->v3:
>>     - It only takes care of the kaslr related patches.
>>       For reviewers it's better to discuss only one issue in one thread.
>>         * I take off one patch as follows from Yinghai's because I think it's unnecessay.
>>            - Patch 05/19 x86, kaslr: rename output_size to output_run_size
>>              output_size is enough to represen the value:
>>               output_len > run_size ? output_len : run_size
>>
>>         * I add Patch 04/19, it's a comment update patch. For other patches, I just
>>           adjust patch log and do several places of change comparing with 2nd round.
>>           Please check the change log under patch log of each patch for details.
>>
>>         * Adjust sequence of several patches to make review easier. It doesn't
>>           affect codes.
>>
>>     v3->v4:
>>     - Made changes according to Kees's comments.
>>       Add one patch 20/20 as Kees suggested to use KERNEL_IMAGE_SIZE as offset
>>       max of virtual random, meanwhile clean up useless CONFIG_RANDOM_OFFSET_MAX
>>
>>         x86, kaslr: Use KERNEL_IMAGE_SIZE as the offset max for kernel virtual randomization
>>
>> You can also get this patchset from my github:
>>    https://github.com/baoquan-he/linux.git kaslr-above-4G
>>
>> Any comments about code changes, code comments, patch logs are welcome and
>> appreciated.
>>
>> Baoquan He (9):
>>   x86, kaslr: Fix a bug that relocation can not be handled when kernel
>>     is loaded above 2G
>>   x86, kaskr: Update the description for decompressor worst case
>>   x86, kaslr: Introduce struct slot_area to manage randomization slot
>>     info
>>   x86, kaslr: Add two functions which will be used later
>>   x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel
>>     text mapping address
>>   x86, kaslr: Randomize physical and virtual address of kernel
>>     separately
>>   x86, kaslr: Add support of kernel physical address randomization above
>>     4G
>>   x86, kaslr: Remove useless codes
>>   x86, kaslr: Use KERNEL_IMAGE_SIZE as the offset max for kernel virtual
>>     randomization
>>
>> Yinghai Lu (11):
>>   x86, kaslr: Remove not needed parameter for choose_kernel_location
>>   x86, boot: Move compressed kernel to end of buffer before
>>     decompressing
>>   x86, boot: Move z_extract_offset calculation to header.S
>>   x86, boot: Fix run_size calculation
>>   x86, kaslr: Clean up useless code related to run_size.
>>   x86, kaslr: Get correct max_addr for relocs pointer
>>   x86, kaslr: Consolidate mem_avoid array filling
>>   x86, boot: Split kernel_ident_mapping_init to another file
>>   x86, 64bit: Set ident_mapping for kaslr
>>   x86, boot: Add checking for memcpy
>>   x86, kaslr: Allow random address to be below loaded address
>>
>>  arch/x86/Kconfig                       |  57 +++----
>>  arch/x86/boot/Makefile                 |  13 +-
>>  arch/x86/boot/compressed/Makefile      |  19 ++-
>>  arch/x86/boot/compressed/aslr.c        | 300 +++++++++++++++++++++++++--------
>>  arch/x86/boot/compressed/head_32.S     |  14 +-
>>  arch/x86/boot/compressed/head_64.S     |  15 +-
>>  arch/x86/boot/compressed/misc.c        |  89 +++++-----
>>  arch/x86/boot/compressed/misc.h        |  34 ++--
>>  arch/x86/boot/compressed/misc_pgt.c    |  93 ++++++++++
>>  arch/x86/boot/compressed/mkpiggy.c     |  28 +--
>>  arch/x86/boot/compressed/string.c      |  29 +++-
>>  arch/x86/boot/compressed/vmlinux.lds.S |   1 +
>>  arch/x86/boot/header.S                 |  22 ++-
>>  arch/x86/include/asm/boot.h            |  19 +++
>>  arch/x86/include/asm/page.h            |   5 +
>>  arch/x86/include/asm/page_64_types.h   |   5 +-
>>  arch/x86/kernel/asm-offsets.c          |   1 +
>>  arch/x86/kernel/vmlinux.lds.S          |   1 +
>>  arch/x86/mm/ident_map.c                |  74 ++++++++
>>  arch/x86/mm/init_32.c                  |   3 -
>>  arch/x86/mm/init_64.c                  |  74 +-------
>>  arch/x86/tools/calc_run_size.sh        |  42 -----
>>  22 files changed, 605 insertions(+), 333 deletions(-)
>>  create mode 100644 arch/x86/boot/compressed/misc_pgt.c
>>  create mode 100644 arch/x86/mm/ident_map.c
>>  delete mode 100644 arch/x86/tools/calc_run_size.sh
>>
> Hi Ingo,
>
> Ping.
>
> Do you have any suggestions or comments for this patchset? As you
> suggested for Yinghai's earlier post, I re-check the whole patchset and
> try to understand them all, and try to re-write patch logs with my own
> understanding. I hope this patch logs can help people who are interested
> in this issue can understand what the patch is doing. Meanwhile Kees
> is very kind to help review the code change and patch log, help adjust
> or re-write some of them.
>
> If still there's something you don't like, your suggestions, ideas and
> comments are welcome and appreciated. I will continue making changes
> until it's satisfactory.

FWIW, I've also had this tree up in my git branches, and the 0day
tester hasn't complained at all about it in the last two weeks. I'd
really like to see this in -next to fix the >4G (mainly kexec) issues
and get us to feature parity with the arm64 kASLR work (randomized
virtual address).

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v4 11/20] x86, 64bit: Set ident_mapping for kaslr
  2016-03-22  7:32 ` [PATCH v4 11/20] x86, 64bit: Set ident_mapping for kaslr Baoquan He
@ 2016-04-13 10:05   ` Ingo Molnar
  0 siblings, 0 replies; 52+ messages in thread
From: Ingo Molnar @ 2016-04-13 10:05 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, yinghai, keescook, hpa, mingo, bp, vgoyal, luto,
	lasse.collin, akpm, dyoung, Jiri Kosina, Borislav Petkov


* Baoquan He <bhe@redhat.com> wrote:

> From: Yinghai Lu <yinghai@kernel.org>
> 
> Current aslr only support random in small range, from 16M to 1G.  And
> new range still use old mapping. Also it does not support new range
> above 4G.
> 
> We need to have ident mapping for the new range before we can do
> decompress to the new output, and later run them.
> 
> In this patch, we add ident mapping for all needed range.
> 
> At first, to support aslr to put random VO above 4G, we must set ident
> mapping for the new range when it come via startup_32 path.
> 
> Secondly, when boot from 64bit bootloader, bootloader set ident mapping,
> and boot via ZO (arch/x86/boot/compressed/vmlinux) startup_64.
> Those pages for pagetable need to be avoided when we select new random
> VO (vmlinux) base. Otherwise decompressor would overwrite them during
> decompressing.
> First way would be: walk through pagetable and find out every page is used
> by pagetable for every mem_aovid checking but we will need extra code, and
> may need to increase mem_avoid array size to hold them.
> Other way would be: We can create new ident mapping instead, and pages for
> pagetable will come from _pagetable section of ZO, and they are in
> mem_avoid array already. In this way, we can reuse the code for ident
> mapping.
> 
> The _pgtable will be shared by 32bit and 64bit path to reduce init_size,
> as now ZO _rodata to _end will contribute init_size.
> 
> We need to increase pgt buffer size.
> When boot via startup_64, as we need to cover old VO, params, cmdline
> and new VO, in extreme case we could have them all cross 512G boundary,
> will need (2+2)*4 pages with 2M mapping. And need 2 for first 2M for vga
> ram. Plus one for level4. Total will be 19 pages.
> When boot via startup_32, aslr would move new VO above 4G, we need set
> extra ident mapping for new VO, pgt buffer come from _pgtable offset 6
> pages. Should only need (2+2) pages at most when it cross 512G boundary.
> So 19 pages could make both paths happy.
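
For reference, the arithmetic behind the 19-page figure above (page counts as
stated in the changelog; the macro names are only illustrative, not the
kernel's actual ones):

/* Worst case for the startup_64 path, with 2M mappings: */
#define IDENT_RANGES		4	/* old VO, params, cmdline, new VO */
#define PAGES_PER_RANGE		(2 + 2)	/* each range may straddle a 512G boundary */
#define PAGES_LOW_2M_VGA	2	/* the first 2M, for VGA RAM */
#define PAGES_TOP_LEVEL		1	/* one level4 (top-level) page */

#define PGT_BUF_PAGES	(IDENT_RANGES * PAGES_PER_RANGE + \
			 PAGES_LOW_2M_VGA + PAGES_TOP_LEVEL)	/* = 19 */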

Guys, when you pick up patches and if you add Reviewed-by tags, don't just pass 
through crap but _fix_ the changelogs!

The very first sentence:

 > Current aslr only support random in small range, from 16M to 1G.  And

and it gets worse from there on.

Also, please capitalize consistently: spell it KASLR, not 'kaslr' or 'aslr'.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support
  2016-04-05 20:00     ` [kernel-hardening] " Kees Cook
@ 2016-04-13 10:19       ` Ingo Molnar
  -1 siblings, 0 replies; 52+ messages in thread
From: Ingo Molnar @ 2016-04-13 10:19 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ingo Molnar, LKML, Baoquan He, Yinghai Lu, H. Peter Anvin,
	Borislav Petkov, Vivek Goyal, Andy Lutomirski, lasse.collin,
	Andrew Morton, Dave Young, kernel-hardening


* Kees Cook <keescook@chromium.org> wrote:

> FWIW, I've also had this tree up in my git branches, and the 0day
> tester hasn't complained at all about it in the last two weeks. I'd
> really like to see this in -next to fix the >4G (mainly kexec) issues
> and get us to feature parity with the arm64 kASLR work (randomized
> virtual address).

So I started applying the patches, started fixing up changelogs and gave up on 
patch #3.

Changelogs of such non-trivial code need to be proper English and need to be 
understandable.

For example patch #3 starts with:

> Current z_extract_offset is calculated in boot/compressed/mkpiggy.c. The problem 
> is in mkpiggy.c we don't know the detail of decompressor. Then we just get a 
> rough z_extract_offset according to extra_bytes. As we know extra_bytes can only 
> promise a safety margin when decompressing. In fact this brings some risks:

Beyond the bad grammar of the _first word_ of the changelog, this is not a proper 
high level description of the change. A _real_ high level description would be 
something like:

  > Currently z_extract_offset is calculated during kernel build time. The problem 
  > with that method is that at this stage we don't yet know the decompression 
  > buffer sizes - we only know that during bootup.
  >
  > Effects of this are that when we calculate z_extract_offset during the build 
  > we don't know the precise decompression details, we'll only get a rough 
  > estimation of z_extract_offset.
  >
  > Instead of that we want to calculate it during bootup.
  
etc. etc. - the whole series is _full_ of such crappy changelogs that make it 
difficult for me and others to see whether the author actually _understands_ the 
existing code or is hacking away on it. It's also much harder to review and 
validate.

This is totally unacceptable.

Please make sure every changelog starts with a proper high level description that 
tells the story and convinces the reader about what the problem is and what the 
change should be.

And part of that are the patch titles. Things like:

Subject: [PATCH v3 03/19] x86, boot: Move z_extract_offset calculation to header.S

are absolutely mindless titles. A better title would be:

      x86/boot: Calculate precise decompressor parameters during bootup, not build time

... or something like that. Even having read the changelog 3 times I'm unsure what 
the change really is about.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support
  2016-04-13 10:19       ` [kernel-hardening] " Ingo Molnar
@ 2016-04-13 14:11         ` Kees Cook
  -1 siblings, 0 replies; 52+ messages in thread
From: Kees Cook @ 2016-04-13 14:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ingo Molnar, LKML, Baoquan He, Yinghai Lu, H. Peter Anvin,
	Borislav Petkov, Vivek Goyal, Andy Lutomirski, lasse.collin,
	Andrew Morton, Dave Young, kernel-hardening

On Wed, Apr 13, 2016 at 3:19 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Kees Cook <keescook@chromium.org> wrote:
>
>> FWIW, I've also had this tree up in my git branches, and the 0day
>> tester hasn't complained at all about it in the last two weeks. I'd
>> really like to see this in -next to fix the >4G (mainly kexec) issues
>> and get us to feature parity with the arm64 kASLR work (randomized
>> virtual address).
>
> So I started applying the patches, started fixing up changelogs and gave up on
> patch #3.
>
> Changelogs of such non-trivial code need to be proper English and need to be
> understandable.
>
> For example patch #3 starts with:
>
>> Current z_extract_offset is calculated in boot/compressed/mkpiggy.c. The problem
>> is in mkpiggy.c we don't know the detail of decompressor. Then we just get a
>> rough z_extract_offset according to extra_bytes. As we know extra_bytes can only
>> promise a safety margin when decompressing. In fact this brings some risks:
>
> Beyond the bad grammar of the _first word_ of the changelog, this is not a proper
> high level description of the change. A _real_ high level description would be
> something like:
>
>   > Currently z_extract_offset is calculated during kernel build time. The problem
>   > with that method is that at this stage we don't yet know the decompression
>   > buffer sizes - we only know that during bootup.
>   >
>   > Effects of this are that when we calculate z_extract_offset during the build
>   > we don't know the precise decompression details, we'll only get a rough
>   > estimation of z_extract_offset.
>   >
>   > Instead of that we want to calculate it during bootup.
>
> etc. etc. - the whole series is _full_ of such crappy changelogs that make it
> difficult for me and others to see whether the author actually _understands_ the
> existing code or is hacking away on it. It's also much harder to review and
> validate.
>
> This is totally unacceptable.
>
> Please make sure every changelog starts with a proper high level description that
> tells the story and convinces the reader about what the problem is and what the
> change should be.
>
> And part of that are the patch titles. Things like:
>
> Subject: [PATCH v3 03/19] x86, boot: Move z_extract_offset calculation to header.S
>
> are absolutely mindless titles. A better title would be:
>
>       x86/boot: Calculate precise decompressor parameters during bootup, not build time
>
> ... or something like that. Even having read the changelog 3 times I'm unsure what
> the change really is about.

I'll rewrite all the changelogs and resend the series. Thanks for taking a look! :)

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support
  2016-04-13 14:11         ` [kernel-hardening] " Kees Cook
@ 2016-04-14  6:02           ` Kees Cook
  -1 siblings, 0 replies; 52+ messages in thread
From: Kees Cook @ 2016-04-14  6:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ingo Molnar, LKML, Baoquan He, Yinghai Lu, H. Peter Anvin,
	Borislav Petkov, Vivek Goyal, Andy Lutomirski, lasse.collin,
	Andrew Morton, Dave Young, kernel-hardening

On Wed, Apr 13, 2016 at 7:11 AM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Apr 13, 2016 at 3:19 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> * Kees Cook <keescook@chromium.org> wrote:
>>
>>> FWIW, I've also had this tree up in my git branches, and the 0day
>>> tester hasn't complained at all about it in the last two weeks. I'd
>>> really like to see this in -next to fix the >4G (mainly kexec) issues
>>> and get us to feature parity with the arm64 kASLR work (randomized
>>> virtual address).

So, I've done this and suddenly realized I hadn't boot-tested i386. It
doesn't work, unfortunately. (Which I find strange: I'd expect 0day to
have noticed...)

Baoquan, have you tested this on 32-bit systems? I get a variety of
failures. Either it boots okay, it reboots, or I get tons of pte
errors like this:

[    0.000000] clearing pte for ram above max_low_pfn: pfn: 37dcc pmd:
f9144f7c pmd phys: 39144f7c pte: f9a1b730 pte phys: 39a1b730

Can you confirm? I suspect relocation problems, but ran out of time
today to debug it.

I have the entire series with cleaned up changelogs and various other
refactorings up here now:

http://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=kaslr/highmem

Thanks!

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support
  2016-04-14  6:02           ` [kernel-hardening] " Kees Cook
@ 2016-04-14  6:24             ` Baoquan He
  -1 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-04-14  6:24 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ingo Molnar, Ingo Molnar, LKML, Yinghai Lu, H. Peter Anvin,
	Borislav Petkov, Vivek Goyal, Andy Lutomirski, lasse.collin,
	Andrew Morton, Dave Young, kernel-hardening

On 04/13/16 at 11:02pm, Kees Cook wrote:
> On Wed, Apr 13, 2016 at 7:11 AM, Kees Cook <keescook@chromium.org> wrote:
> > On Wed, Apr 13, 2016 at 3:19 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >>
> >> * Kees Cook <keescook@chromium.org> wrote:
> >>
> >>> FWIW, I've also had this tree up in my git branches, and the 0day
> >>> tester hasn't complained at all about it in the last two weeks. I'd
> >>> really like to see this in -next to fix the >4G (mainly kexec) issues
> >>> and get us to feature parity with the arm64 kASLR work (randomized
> >>> virtual address).
> 
> So, I've done this and suddenly realized I hadn't boot-tested i386. It
> doesn't work, unfortunately. (Which I find strange: I'd expect 0day to
> have noticed...)
> 
> Baoquan, have you tested this on 32-bit systems? I get a variety of
> failures. Either it boots okay, it reboots, or I get tons of pte
> errors like this:
> 
> [    0.000000] clearing pte for ram above max_low_pfn: pfn: 37dcc pmd:
> f9144f7c pmd phys: 39144f7c pte: f9a1b730 pte phys: 39a1b730
> 
> Can you confirm? I suspect relocation problems, but ran out of time
> today to debug it.

Sorry, I didn't test i386 before either. I will get an i386 machine and
check it now. I know it's very late on your side and you need rest. I will
report back if there is any progress.

> 
> I have the entire series with cleaned up changelogs and various other
> refactorings up here now:
> 
> http://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=kaslr/highmem
> 
> Thanks!
> 
> -Kees
> 
> -- 
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support
  2016-04-14  6:02           ` [kernel-hardening] " Kees Cook
@ 2016-04-14 15:06             ` Baoquan He
  -1 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-04-14 15:06 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ingo Molnar, Ingo Molnar, LKML, Yinghai Lu, H. Peter Anvin,
	Borislav Petkov, Vivek Goyal, Andy Lutomirski, lasse.collin,
	Andrew Morton, Dave Young, kernel-hardening

On 04/13/16 at 11:02pm, Kees Cook wrote:
> On Wed, Apr 13, 2016 at 7:11 AM, Kees Cook <keescook@chromium.org> wrote:
> > On Wed, Apr 13, 2016 at 3:19 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >>
> >> * Kees Cook <keescook@chromium.org> wrote:
> >>
> >>> FWIW, I've also had this tree up in my git branches, and the 0day
> >>> tester hasn't complained at all about it in the last two weeks. I'd
> >>> really like to see this in -next to fix the >4G (mainly kexec) issues
> >>> and get us to feature parity with the arm64 kASLR work (randomized
> >>> virtual address).
> 
> So, I've done this and suddenly realized I hadn't boot-tested i386. It
> doesn't work, unfortunately. (Which I find strange: I'd expect 0day to
> have noticed...)
> 
> Baoquan, have you tested this on 32-bit systems? I get a variety of
> failures. Either it boots okay, it reboots, or I get tons of pte
> errors like this:

Hi Kees,

I am sorry I didn't notice that the change impacts i386. I got an i386
machine and ran tests. I found that i386 can't take the separate
randomization since there's a difference between i386 and x86_64.

x86_64 has phys_base and can translate between virtual and physical
addresses according to the formula below:

paddr = vaddr - __START_KERNEL_map + phys_base;

However, i386 can only do this:

paddr = vaddr - PAGE_OFFSET;
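
To make the difference concrete, here is a minimal sketch of the two
translations (illustrative only -- these helpers are not the kernel's actual
code, just the two formulas above written as C, with phys_base passed in as
a parameter for simplicity):

/* x86_64: phys_base records where the kernel actually landed, so the
 * virtual and physical addresses can be randomized independently. */
static unsigned long virt_to_phys_64(unsigned long vaddr,
				     unsigned long phys_base)
{
	return vaddr - 0xffffffff80000000UL /* __START_KERNEL_map */
		     + phys_base;
}

/* i386: the kernel mapping is a fixed offset from PAGE_OFFSET
 * (0xc0000000 by default), so the virtual address is tied directly
 * to the physical load address. */
static unsigned long virt_to_phys_32(unsigned long vaddr)
{
	return vaddr - 0xc0000000UL /* PAGE_OFFSET */;
}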

Besides, i386 has to reserve 128M for VMALLOC at the end of the kernel
virtual address space. So for i386, 768M is the upper limit for
randomization. But I am fine with KERNEL_IMAGE_SIZE, the old default
value. What do you say about this?

So the plan should be to keep the old style of randomization for i386
systems:

1) Disable virtual address randomization in the i386 case because it's
useless there. This should be done in patch:
 x86, KASLR: Randomize virtual address separately

2) Add an upper limit for physical randomization on i386 systems. This
should be done in patch:
 x86, KASLR: Add physical address randomization >4G

I just got a test machine in the office and haven't had time to change
the code. You can change it directly, or I will do it tomorrow.

Thanks

> 
> [    0.000000] clearing pte for ram above max_low_pfn: pfn: 37dcc pmd:
> f9144f7c pmd phys: 39144f7c pte: f9a1b730 pte phys: 39a1b730
> 
> Can you confirm? I suspect relocation problems, but ran out of time
> today to debug it.
> 
> I have the entire series with cleaned up changelogs and various other
> refactorings up here now:
> 
> http://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=kaslr/highmem
> 
> Thanks!
> 
> -Kees
> 
> -- 
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support
  2016-04-14 15:06             ` [kernel-hardening] " Baoquan He
@ 2016-04-14 17:56               ` Kees Cook
  -1 siblings, 0 replies; 52+ messages in thread
From: Kees Cook @ 2016-04-14 17:56 UTC (permalink / raw)
  To: Baoquan He
  Cc: Ingo Molnar, Ingo Molnar, LKML, Yinghai Lu, H. Peter Anvin,
	Borislav Petkov, Vivek Goyal, Andy Lutomirski, lasse.collin,
	Andrew Morton, Dave Young, kernel-hardening

On Thu, Apr 14, 2016 at 8:06 AM, Baoquan He <bhe@redhat.com> wrote:
> On 04/13/16 at 11:02pm, Kees Cook wrote:
>> On Wed, Apr 13, 2016 at 7:11 AM, Kees Cook <keescook@chromium.org> wrote:
>> > On Wed, Apr 13, 2016 at 3:19 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> >>
>> >> * Kees Cook <keescook@chromium.org> wrote:
>> >>
>> >>> FWIW, I've also had this tree up in my git branches, and the 0day
>> >>> tester hasn't complained at all about it in the last two weeks. I'd
>> >>> really like to see this in -next to fix the >4G (mainly kexec) issues
>> >>> and get us to feature parity with the arm64 kASLR work (randomized
>> >>> virtual address).
>>
>> So, I've done this and suddenly realized I hadn't boot-tested i386. It
>> doesn't work, unfortunately. (Which I find strange: I'd expect 0day to
>> have noticed...)
>>
>> Baoquan, have you tested this on 32-bit systems? I get a variety of
>> failures. Either it boots okay, it reboots, or I get tons of pte
>> errors like this:
>
> Hi Kees,
>
> I am sorry I didn't notice the change impacts i386. I got a i386 machine
> and had tests. Found i386 can't take separate randomzation since there's
> difference between i386 and x86_64.
>
> x86_64 has phys_base and can translate virt addr and phys addr according
> to below formula:
>
> paddr = vaddr - __START_KERNEL_map + phys_base;
>
> However i386 can only do like this:
>
> paddr = vaddr - PAGE_OFFSET;
>
> Besides i386 has to reserve 128M for VMALLOC at the end of kernel
> virtual address. So for i386 area 768M is the upper limit for
> randomization. But I am fine with the KERNEL_IMAGE_SIZE, the old default
> value. What do you say about this?
>
> So the plan should be keeping the old style of randomization for i386
> system:
>
> 1) Disable virtual address randomization in i386 case because it's
> useless. This should be done in patch:
>  x86, KASLR: Randomize virtual address separately
>
> 2) Add an upper limit for physical randomization if it's i386 system.
>  x86, KASLR: Add physical address randomization >4G
>
> I just got a test machine in office, and haven't had time to change
> code. You can change it directly, or I will do it tomorrow.

I was thinking about the physical vs virtual on i386 as I woke up
today. :) Thanks for confirming! These changes appear to make the
series boot reliably on i386 (pardon any gmail-induced whitespace
damage...):

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 5bae54b50d4c..3a58fe8acb8e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -347,6 +347,10 @@ static void process_e820_entry(struct e820entry *entry,
        if (entry->type != E820_RAM)
                return;

+       /* On 32-bit, ignore entries entirely above our maximum. */
+       if (IS_ENABLED(CONFIG_X86_32) && entry->addr >= KERNEL_IMAGE_SIZE)
+               return;
+
        /* Ignore entries entirely below our minimum. */
        if (entry->addr + entry->size < minimum)
                return;
@@ -372,6 +376,11 @@ static void process_e820_entry(struct e820entry *entry,
                /* Reduce size by any delta from the original address. */
                region.size -= region.start - start_orig;

+               /* On 32-bit, reduce region size to fit within max size. */
+               if (IS_ENABLED(CONFIG_X86_32) &&
+                   region.start + region.size > KERNEL_IMAGE_SIZE)
+                       region.size = KERNEL_IMAGE_SIZE - region.start;
+
                /* Return if region can't contain decompressed kernel */
                if (region.size < image_size)
                        return;
@@ -488,6 +497,8 @@ void choose_kernel_location(unsigned char *input,
        }

        /* Pick random virtual address starting from LOAD_PHYSICAL_ADDR. */
-       random = find_random_virt_offset(LOAD_PHYSICAL_ADDR, output_size);
+       if (IS_ENABLED(CONFIG_X86_64))
+               random = find_random_virt_offset(LOAD_PHYSICAL_ADDR,
+                                                output_size);
        *virt_offset = (unsigned char *)random;
 }
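
To spell out the effect of the 32-bit clamp with concrete numbers (taking
KERNEL_IMAGE_SIZE as 512M here purely for illustration): an E820_RAM entry
covering [448M, 640M) is not skipped, since it starts below the maximum,
but its usable size is reduced from 192M to 512M - 448M = 64M, so random
slots are only generated for the [448M, 512M) part of the region.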


I will split these chunks up into the correct patches and resend the
series. If you get a chance, can you double-check this?

Thanks!

-Kees


>
> Thanks
>
>>
>> [    0.000000] clearing pte for ram above max_low_pfn: pfn: 37dcc pmd:
>> f9144f7c pmd phys: 39144f7c pte: f9a1b730 pte phys: 39a1b730
>>
>> Can you confirm? I suspect relocation problems, but ran out of time
>> today to debug it.
>>
>> I have the entire series with cleaned up changelogs and various other
>> refactorings up here now:
>>
>> http://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=kaslr/highmem
>>
>> Thanks!
>>
>> -Kees
>>
>> --
>> Kees Cook
>> Chrome OS & Brillo Security



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [kernel-hardening] Re: [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support
@ 2016-04-14 17:56               ` Kees Cook
  0 siblings, 0 replies; 52+ messages in thread
From: Kees Cook @ 2016-04-14 17:56 UTC (permalink / raw)
  To: Baoquan He
  Cc: Ingo Molnar, Ingo Molnar, LKML, Yinghai Lu, H. Peter Anvin,
	Borislav Petkov, Vivek Goyal, Andy Lutomirski, lasse.collin,
	Andrew Morton, Dave Young, kernel-hardening

On Thu, Apr 14, 2016 at 8:06 AM, Baoquan He <bhe@redhat.com> wrote:
> On 04/13/16 at 11:02pm, Kees Cook wrote:
>> On Wed, Apr 13, 2016 at 7:11 AM, Kees Cook <keescook@chromium.org> wrote:
>> > On Wed, Apr 13, 2016 at 3:19 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> >>
>> >> * Kees Cook <keescook@chromium.org> wrote:
>> >>
>> >>> FWIW, I've also had this tree up in my git branches, and the 0day
>> >>> tester hasn't complained at all about it in the last two weeks. I'd
>> >>> really like to see this in -next to fix the >4G (mainly kexec) issues
>> >>> and get us to feature parity with the arm64 kASLR work (randomized
>> >>> virtual address).
>>
>> So, I've done this and suddenly realized I hadn't boot-tested i386. It
>> doesn't work, unfortunately. (Which I find strange: I'd expect 0day to
>> have noticed...)
>>
>> Baoquan, have you tested this on 32-bit systems? I get a variety of
>> failures. Either it boots okay, it reboots, or I get tons of pte
>> errors like this:
>
> Hi Kees,
>
> I am sorry I didn't notice the change impacts i386. I got a i386 machine
> and had tests. Found i386 can't take separate randomzation since there's
> difference between i386 and x86_64.
>
> x86_64 has phys_base and can translate virt addr and phys addr according
> to below formula:
>
> paddr = vaddr - __START_KERNEL_map + phys_base;
>
> However i386 can only do like this:
>
> paddr = vaddr - PAGE_OFFSET;
>
> Besides i386 has to reserve 128M for VMALLOC at the end of kernel
> virtual address. So for i386 area 768M is the upper limit for
> randomization. But I am fine with the KERNEL_IMAGE_SIZE, the old default
> value. What do you say about this?
>
> So the plan should be keeping the old style of randomization for i386
> system:
>
> 1) Disable virtual address randomization in i386 case because it's
> useless. This should be done in patch:
>  x86, KASLR: Randomize virtual address separately
>
> 2) Add an upper limit for physical randomization if it's i386 system.
>  x86, KASLR: Add physical address randomization >4G
>
> I just got a test machine in office, and haven't had time to change
> code. You can change it directly, or I will do it tomorrow.

I was thinking about the physical vs virtual on i386 as I woke up
today. :) Thanks for confirming! These changes appear to make the
series boot reliably on i386 (pardon any gmail-induced whitespace
damage...):

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 5bae54b50d4c..3a58fe8acb8e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -347,6 +347,10 @@ static void process_e820_entry(struct e820entry *entry,
        if (entry->type != E820_RAM)
                return;

+       /* On 32-bit, ignore entries entirely above our maximum. */
+       if (IS_ENABLED(CONFIG_X86_32) && entry->addr >= KERNEL_IMAGE_SIZE)
+               return;
+
        /* Ignore entries entirely below our minimum. */
        if (entry->addr + entry->size < minimum)
                return;
@@ -372,6 +376,11 @@ static void process_e820_entry(struct e820entry *entry,
                /* Reduce size by any delta from the original address. */
                region.size -= region.start - start_orig;

+               /* On 32-bit, reduce region size to fit within max size. */
+               if (IS_ENABLED(CONFIG_X86_32) &&
+                   region.start + region.size > KERNEL_IMAGE_SIZE)
+                       region.size = KERNEL_IMAGE_SIZE - region.start;
+
                /* Return if region can't contain decompressed kernel */
                if (region.size < image_size)
                        return;
@@ -488,6 +497,8 @@ void choose_kernel_location(unsigned char *input,
        }

        /* Pick random virtual address starting from LOAD_PHYSICAL_ADDR. */
-       random = find_random_virt_offset(LOAD_PHYSICAL_ADDR, output_size);
+       if (IS_ENABLED(CONFIG_X86_64))
+               random = find_random_virt_offset(LOAD_PHYSICAL_ADDR,
+                                                output_size);
        *virt_offset = (unsigned char *)random;
 }
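
As a quick sanity check of the clamping above, here is a stand-alone sketch of
the same arithmetic (illustrative only; the helper is a simplification of
process_e820_entry(), and KERNEL_IMAGE_SIZE is assumed to be the i386 default
of 512M):

/*
 * Stand-alone sketch of the 32-bit clamp above (illustrative only; not the
 * actual boot code).
 */
#include <stdio.h>

#define KERNEL_IMAGE_SIZE	0x20000000UL	/* 512M */

struct region { unsigned long start, size; };

/* Returns 1 if the region can still hold the decompressed kernel. */
static int clamp_region(struct region *r, unsigned long image_size)
{
	if (r->start >= KERNEL_IMAGE_SIZE)		/* entirely above the max */
		return 0;
	if (r->start + r->size > KERNEL_IMAGE_SIZE)	/* straddles the max */
		r->size = KERNEL_IMAGE_SIZE - r->start;
	return r->size >= image_size;
}

int main(void)
{
	struct region a = { 0x30000000UL, 0x10000000UL };	/* starts above 512M */
	struct region b = { 0x10000000UL, 0x20000000UL };	/* straddles 512M */
	int ok_a, ok_b;

	ok_a = clamp_region(&a, 0x01000000UL);	/* dropped entirely */
	ok_b = clamp_region(&b, 0x01000000UL);	/* trimmed to 256M, still usable */
	printf("a usable: %d\n", ok_a);
	printf("b usable: %d, size now %#lx\n", ok_b, b.size);
	return 0;
}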


I will split these chunks up into the correct patches and resend the
series. If you get a chance, can you double-check this?

Thanks!

-Kees


>
> Thanks
>
>>
>> [    0.000000] clearing pte for ram above max_low_pfn: pfn: 37dcc pmd:
>> f9144f7c pmd phys: 39144f7c pte: f9a1b730 pte phys: 39a1b730
>>
>> Can you confirm? I suspect relocation problems, but ran out of time
>> today to debug it.
>>
>> I have the entire series with cleaned up changelogs and various other
>> refactorings up here now:
>>
>> http://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=kaslr/highmem
>>
>> Thanks!
>>
>> -Kees
>>
>> --
>> Kees Cook
>> Chrome OS & Brillo Security



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support
  2016-04-14 17:56               ` [kernel-hardening] " Kees Cook
@ 2016-04-15  4:08                 ` Baoquan He
  -1 siblings, 0 replies; 52+ messages in thread
From: Baoquan He @ 2016-04-15  4:08 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ingo Molnar, Ingo Molnar, LKML, Yinghai Lu, H. Peter Anvin,
	Borislav Petkov, Vivek Goyal, Andy Lutomirski, lasse.collin,
	Andrew Morton, Dave Young, kernel-hardening

On 04/14/16 at 10:56am, Kees Cook wrote:
> On Thu, Apr 14, 2016 at 8:06 AM, Baoquan He <bhe@redhat.com> wrote:
> > On 04/13/16 at 11:02pm, Kees Cook wrote:
> >> On Wed, Apr 13, 2016 at 7:11 AM, Kees Cook <keescook@chromium.org> wrote:
> >> > On Wed, Apr 13, 2016 at 3:19 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >> >>
> >> >> * Kees Cook <keescook@chromium.org> wrote:
> >> >>
> >> >>> FWIW, I've also had this tree up in my git branches, and the 0day
> >> >>> tester hasn't complained at all about it in the last two weeks. I'd
> >> >>> really like to see this in -next to fix the >4G (mainly kexec) issues
> >> >>> and get us to feature parity with the arm64 kASLR work (randomized
> >> >>> virtual address).
> >>
> >> So, I've done this and suddenly realized I hadn't boot-tested i386. It
> >> doesn't work, unfortunately. (Which I find strange: I'd expect 0day to
> >> have noticed...)
> >>
> >> Baoquan, have you tested this on 32-bit systems? I get a variety of
> >> failures. Either it boots okay, it reboots, or I get tons of pte
> >> errors like this:
> >
> > Hi Kees,
> >
> > I am sorry I didn't notice the change impacts i386. I got an i386 machine
> > and ran some tests. I found i386 can't take the separate randomization since
> > there's a difference between i386 and x86_64.
> >
> > x86_64 has phys_base and can translate between virt addr and phys addr
> > according to the formula below:
> >
> > paddr = vaddr - __START_KERNEL_map + phys_base;
> >
> > However i386 can only do this:
> >
> > paddr = vaddr - PAGE_OFFSET;
> >
> > Besides, i386 has to reserve 128M for VMALLOC at the end of the kernel
> > virtual address space. So for i386, 768M is the upper limit for
> > randomization. But I am fine with using KERNEL_IMAGE_SIZE, the old default
> > value. What do you say about this?
> >
> > So the plan should be to keep the old style of randomization for i386
> > systems:
> >
> > 1) Disable virtual address randomization in the i386 case because it's
> > useless there. This should be done in patch:
> >  x86, KASLR: Randomize virtual address separately
> >
> > 2) Add an upper limit for physical randomization if it's an i386 system.
> >  x86, KASLR: Add physical address randomization >4G
> >
> > I just got a test machine in the office, and haven't had time to change
> > the code. You can change it directly, or I will do it tomorrow.
> 
> I was thinking about the physical vs virtual on i386 as I woke up
> today. :) Thanks for confirming! These changes appear to make the
> series boot reliably on i386 (pardon any gmail-induced whitespace
> damage...):
> 
> diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
> index 5bae54b50d4c..3a58fe8acb8e 100644
> --- a/arch/x86/boot/compressed/aslr.c
> +++ b/arch/x86/boot/compressed/aslr.c
> @@ -347,6 +347,10 @@ static void process_e820_entry(struct e820entry *entry,
>         if (entry->type != E820_RAM)
>                 return;
> 
> +       /* On 32-bit, ignore entries entirely above our maximum. */
> +       if (IS_ENABLED(CONFIG_X86_32) && entry->addr >= KERNEL_IMAGE_SIZE)
> +               return;
> +
>         /* Ignore entries entirely below our minimum. */
>         if (entry->addr + entry->size < minimum)
>                 return;
> @@ -372,6 +376,11 @@ static void process_e820_entry(struct e820entry *entry,
>                 /* Reduce size by any delta from the original address. */
>                 region.size -= region.start - start_orig;
> 
> +               /* On 32-bit, reduce region size to fit within max size. */
> +               if (IS_ENABLED(CONFIG_X86_32) &&
> +                   region.start + region.size > KERNEL_IMAGE_SIZE)
> +                       region.size = KERNEL_IMAGE_SIZE - region.start;
> +
>                 /* Return if region can't contain decompressed kernel */
>                 if (region.size < image_size)
>                         return;
> @@ -488,6 +497,8 @@ void choose_kernel_location(unsigned char *input,
>         }
> 
>         /* Pick random virtual address starting from LOAD_PHYSICAL_ADDR. */
> -       random = find_random_virt_offset(LOAD_PHYSICAL_ADDR, output_size);
> +       if (IS_ENABLED(CONFIG_X86_64))
> +               random = find_random_virt_offset(LOAD_PHYSICAL_ADDR,
> +                                                output_size);
>         *virt_offset = (unsigned char *)random;
>  }
> 
> 
> I will split these chunks up into the correct patches and resend the
> series. If you get a chance, can you double-check this?

Yes, these changes sound great. I checked the series you posted, and
have to say you made it look much better. The changelogs are great, and
so is the code refactoring. Just one small thing, here:

[kees: rewrote changelog, refactored goto into while, limit 32-bit to 1G]
in patch [PATCH v5 19/21] x86, KASLR: Add physical address randomization >4G

On i386, KERNEL_IMAGE_SIZE is kept at 0x20000000, namely 512M, without kaslr
enabled. So I guess this is a typo and it should be "limit 32-bit to 512M".
Also, what I said about the upper limit yesterday was wrong: in fact i386 can
put the kernel in [16M, 896M), not 768M. But KERNEL_IMAGE_SIZE is good enough
for i386 for now.
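
For reference, the figures above fall out of the usual i386 3G/1G split; a
quick sketch of the arithmetic (illustrative only, standard defaults assumed):

/*
 * Quick sketch (illustrative only) of where the i386 figures above come
 * from, using the standard defaults of the time, in MB.
 */
#include <stdio.h>

int main(void)
{
	unsigned long lowmem_mb  = 4096 - 3072;	/* 4G minus PAGE_OFFSET (3G) */
	unsigned long vmalloc_mb = 128;		/* __VMALLOC_RESERVE */
	unsigned long min_mb     = 16;		/* LOAD_PHYSICAL_ADDR */
	unsigned long image_mb   = 512;		/* KERNEL_IMAGE_SIZE, 0x20000000 */

	/* Hard ceiling of the direct map: [16M, 896M). */
	printf("lowmem window: [%luM, %luM)\n", min_mb, lowmem_mb - vmalloc_mb);
	/* The ceiling the series actually uses: KERNEL_IMAGE_SIZE. */
	printf("kaslr window:  [%luM, %luM)\n", min_mb, image_mb);
	return 0;
}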

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support
  2016-04-15  4:08                 ` [kernel-hardening] " Baoquan He
@ 2016-04-15  4:52                   ` Kees Cook
  -1 siblings, 0 replies; 52+ messages in thread
From: Kees Cook @ 2016-04-15  4:52 UTC (permalink / raw)
  To: Baoquan He
  Cc: Ingo Molnar, Ingo Molnar, LKML, Yinghai Lu, H. Peter Anvin,
	Borislav Petkov, Vivek Goyal, Andy Lutomirski, lasse.collin,
	Andrew Morton, Dave Young, kernel-hardening

On Thu, Apr 14, 2016 at 9:08 PM, Baoquan He <bhe@redhat.com> wrote:
> On 04/14/16 at 10:56am, Kees Cook wrote:
>> On Thu, Apr 14, 2016 at 8:06 AM, Baoquan He <bhe@redhat.com> wrote:
>> > On 04/13/16 at 11:02pm, Kees Cook wrote:
>> >> On Wed, Apr 13, 2016 at 7:11 AM, Kees Cook <keescook@chromium.org> wrote:
>> >> > On Wed, Apr 13, 2016 at 3:19 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> >> >>
>> >> >> * Kees Cook <keescook@chromium.org> wrote:
>> >> >>
>> >> >>> FWIW, I've also had this tree up in my git branches, and the 0day
>> >> >>> tester hasn't complained at all about it in the last two weeks. I'd
>> >> >>> really like to see this in -next to fix the >4G (mainly kexec) issues
>> >> >>> and get us to feature parity with the arm64 kASLR work (randomized
>> >> >>> virtual address).
>> >>
>> >> So, I've done this and suddenly realized I hadn't boot-tested i386. It
>> >> doesn't work, unfortunately. (Which I find strange: I'd expect 0day to
>> >> have noticed...)
>> >>
>> >> Baoquan, have you tested this on 32-bit systems? I get a variety of
>> >> failures. Either it boots okay, it reboots, or I get tons of pte
>> >> errors like this:
>> >
>> > Hi Kees,
>> >
>> > I am sorry I didn't notice the change impacts i386. I got an i386 machine
>> > and ran some tests. I found i386 can't take the separate randomization since
>> > there's a difference between i386 and x86_64.
>> >
>> > x86_64 has phys_base and can translate between virt addr and phys addr
>> > according to the formula below:
>> >
>> > paddr = vaddr - __START_KERNEL_map + phys_base;
>> >
>> > However i386 can only do this:
>> >
>> > paddr = vaddr - PAGE_OFFSET;
>> >
>> > Besides, i386 has to reserve 128M for VMALLOC at the end of the kernel
>> > virtual address space. So for i386, 768M is the upper limit for
>> > randomization. But I am fine with using KERNEL_IMAGE_SIZE, the old default
>> > value. What do you say about this?
>> >
>> > So the plan should be to keep the old style of randomization for i386
>> > systems:
>> >
>> > 1) Disable virtual address randomization in the i386 case because it's
>> > useless there. This should be done in patch:
>> >  x86, KASLR: Randomize virtual address separately
>> >
>> > 2) Add an upper limit for physical randomization if it's an i386 system.
>> >  x86, KASLR: Add physical address randomization >4G
>> >
>> > I just got a test machine in the office, and haven't had time to change
>> > the code. You can change it directly, or I will do it tomorrow.
>>
>> I was thinking about the physical vs virtual on i386 as I woke up
>> today. :) Thanks for confirming! These changes appear to make the
>> series boot reliably on i386 (pardon any gmail-induced whitespace
>> damage...):
>>
>> diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
>> index 5bae54b50d4c..3a58fe8acb8e 100644
>> --- a/arch/x86/boot/compressed/aslr.c
>> +++ b/arch/x86/boot/compressed/aslr.c
>> @@ -347,6 +347,10 @@ static void process_e820_entry(struct e820entry *entry,
>>         if (entry->type != E820_RAM)
>>                 return;
>>
>> +       /* On 32-bit, ignore entries entirely above our maximum. */
>> +       if (IS_ENABLED(CONFIG_X86_32) && entry->addr >= KERNEL_IMAGE_SIZE)
>> +               return;
>> +
>>         /* Ignore entries entirely below our minimum. */
>>         if (entry->addr + entry->size < minimum)
>>                 return;
>> @@ -372,6 +376,11 @@ static void process_e820_entry(struct e820entry *entry,
>>                 /* Reduce size by any delta from the original address. */
>>                 region.size -= region.start - start_orig;
>>
>> +               /* On 32-bit, reduce region size to fit within max size. */
>> +               if (IS_ENABLED(CONFIG_X86_32) &&
>> +                   region.start + region.size > KERNEL_IMAGE_SIZE)
>> +                       region.size = KERNEL_IMAGE_SIZE - region.start;
>> +
>>                 /* Return if region can't contain decompressed kernel */
>>                 if (region.size < image_size)
>>                         return;
>> @@ -488,6 +497,8 @@ void choose_kernel_location(unsigned char *input,
>>         }
>>
>>         /* Pick random virtual address starting from LOAD_PHYSICAL_ADDR. */
>> -       random = find_random_virt_offset(LOAD_PHYSICAL_ADDR, output_size);
>> +       if (IS_ENABLED(CONFIG_X86_64))
>> +               random = find_random_virt_offset(LOAD_PHYSICAL_ADDR,
>> +                                                output_size);
>>         *virt_offset = (unsigned char *)random;
>>  }
>>
>>
>> I will split these chunks up into the correct patches and resend the
>> series. If you get a chance, can you double-check this?
>
> Yes, these changes sound great. I checked the series you posted, and
> have to say you made it look much better. The changelogs are great, and
> so is the code refactoring. Just one small thing, here:
>
> [kees: rewrote changelog, refactored goto into while, limit 32-bit to 1G]
> in patch [PATCH v5 19/21] x86, KASLR: Add physical address randomization >4G
>
> On i386, KERNEL_IMAGE_SIZE is kept at 0x20000000, namely 512M, without kaslr
> enabled. So I guess this is a typo and it should be "limit 32-bit to 512M".
> Also, what I said about the upper limit yesterday was wrong: in fact i386 can
> put the kernel in [16M, 896M), not 768M. But KERNEL_IMAGE_SIZE is good enough
> for i386 for now.

Ah yeah, thanks. If we do a v6, I'll update the typo. I was going to
say "limit 32-bit to KERNEL_IMAGE_SIZE" but it was going to line-wrap.
:P

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support
  2016-04-15  4:52                   ` [kernel-hardening] " Kees Cook
@ 2016-04-15  6:55                     ` Ingo Molnar
  -1 siblings, 0 replies; 52+ messages in thread
From: Ingo Molnar @ 2016-04-15  6:55 UTC (permalink / raw)
  To: Kees Cook
  Cc: Baoquan He, Ingo Molnar, LKML, Yinghai Lu, H. Peter Anvin,
	Borislav Petkov, Vivek Goyal, Andy Lutomirski, lasse.collin,
	Andrew Morton, Dave Young, kernel-hardening


* Kees Cook <keescook@chromium.org> wrote:

> On Thu, Apr 14, 2016 at 9:08 PM, Baoquan He <bhe@redhat.com> wrote:

> >> I will split these chunks up into the correct patches and resend the series. 
> >> If you get a chance, can you double-check this?
> >
> > Yes, these changes sound great. I checked the series you posted, and have to
> > say you made it look much better. The changelogs are great, and so is the
> > code refactoring. Just one small thing, here:
> >
> > [kees: rewrote changelog, refactored goto into while, limit 32-bit to 1G] in
> > patch [PATCH v5 19/21] x86, KASLR: Add physical address randomization >4G
> >
> > On i386, KERNEL_IMAGE_SIZE is kept at 0x20000000, namely 512M, without kaslr
> > enabled. So I guess this is a typo and it should be "limit 32-bit to 512M".
> > Also, what I said about the upper limit yesterday was wrong: in fact i386 can
> > put the kernel in [16M, 896M), not 768M. But KERNEL_IMAGE_SIZE is good enough
> > for i386 for now.
> 
> Ah yeah, thanks. If we do a v6, I'll update the typo. I was going to say "limit 
> 32-bit to KERNEL_IMAGE_SIZE" but it was going to line-wrap.
> :P

No need to resend, I've fixed the changelog.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2016-04-15  6:56 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-22  7:31 [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Baoquan He
2016-03-22  7:31 ` [PATCH v4 01/20] x86, kaslr: Remove not needed parameter for choose_kernel_location Baoquan He
2016-03-22  7:31 ` [PATCH v4 02/20] x86, kaslr: Fix a bug that relocation can not be handled when kernel is loaded above 2G Baoquan He
2016-03-22  7:32 ` [PATCH v4 03/20] x86, boot: Move compressed kernel to end of buffer before decompressing Baoquan He
2016-03-22  7:32 ` [PATCH v4 04/20] x86, boot: Move z_extract_offset calculation to header.S Baoquan He
2016-03-22  7:32 ` [PATCH v4 05/20] x86, kaslr: Update the description for decompressor worst case Baoquan He
2016-03-22  7:32 ` [PATCH v4 06/20] x86, boot: Fix run_size calculation Baoquan He
2016-03-22 20:51   ` Kees Cook
2016-03-22  7:32 ` [PATCH v4 07/20] x86, kaslr: Clean up useless code related to run_size Baoquan He
2016-03-22 20:52   ` Kees Cook
2016-03-22  7:32 ` [PATCH v4 08/20] x86, kaslr: Get correct max_addr for relocs pointer Baoquan He
2016-03-22 20:52   ` Kees Cook
2016-03-22  7:32 ` [PATCH v4 09/20] x86, kaslr: Consolidate mem_avoid array filling Baoquan He
2016-03-22  7:32 ` [PATCH v4 10/20] x86, boot: Split kernel_ident_mapping_init to another file Baoquan He
2016-03-22  7:32 ` [PATCH v4 11/20] x86, 64bit: Set ident_mapping for kaslr Baoquan He
2016-04-13 10:05   ` Ingo Molnar
2016-03-22  7:32 ` [PATCH v4 12/20] x86, boot: Add checking for memcpy Baoquan He
2016-03-22  7:32 ` [PATCH v4 13/20] x86, kaslr: Introduce struct slot_area to manage randomization slot info Baoquan He
2016-03-22  7:32 ` [PATCH v4 14/20] x86, kaslr: Add two functions which will be used later Baoquan He
2016-03-22  7:32 ` [PATCH v4 15/20] x86, kaslr: Introduce fetch_random_virt_offset to randomize the kernel text mapping address Baoquan He
2016-03-22  7:32 ` [PATCH v4 16/20] x86, kaslr: Randomize physical and virtual address of kernel separately Baoquan He
2016-03-22  7:32 ` [PATCH v4 17/20] x86, kaslr: Add support of kernel physical address randomization above 4G Baoquan He
2016-03-22  7:32 ` [PATCH v4 18/20] x86, kaslr: Remove useless codes Baoquan He
2016-03-22  7:32 ` [PATCH v4 19/20] x86, kaslr: Allow random address to be below loaded address Baoquan He
2016-03-22 19:54   ` Kees Cook
2016-03-23  1:41     ` Baoquan He
2016-03-23  8:59   ` [PATCH v5 " Baoquan He
2016-03-22  7:32 ` [PATCH v4 20/20] x86, kaslr: Use KERNEL_IMAGE_SIZE as the offset max for kernel virtual randomization Baoquan He
2016-03-22 20:46   ` Kees Cook
2016-03-22 20:25 ` [PATCH v4 00/20] x86, boot: kaslr cleanup and 64bit kaslr support Kees Cook
2016-03-23 22:40   ` Kees Cook
2016-04-05  1:56 ` Baoquan He
2016-04-05 20:00   ` Kees Cook
2016-04-05 20:00     ` [kernel-hardening] " Kees Cook
2016-04-13 10:19     ` Ingo Molnar
2016-04-13 10:19       ` [kernel-hardening] " Ingo Molnar
2016-04-13 14:11       ` Kees Cook
2016-04-13 14:11         ` [kernel-hardening] " Kees Cook
2016-04-14  6:02         ` Kees Cook
2016-04-14  6:02           ` [kernel-hardening] " Kees Cook
2016-04-14  6:24           ` Baoquan He
2016-04-14  6:24             ` [kernel-hardening] " Baoquan He
2016-04-14 15:06           ` Baoquan He
2016-04-14 15:06             ` [kernel-hardening] " Baoquan He
2016-04-14 17:56             ` Kees Cook
2016-04-14 17:56               ` [kernel-hardening] " Kees Cook
2016-04-15  4:08               ` Baoquan He
2016-04-15  4:08                 ` [kernel-hardening] " Baoquan He
2016-04-15  4:52                 ` Kees Cook
2016-04-15  4:52                   ` [kernel-hardening] " Kees Cook
2016-04-15  6:55                   ` Ingo Molnar
2016-04-15  6:55                     ` [kernel-hardening] " Ingo Molnar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.