* [PATCH v5 00/21] x86, boot: KASLR cleanup and 64-bit improvements
From: Kees Cook @ 2016-04-14 22:28 UTC
  To: Ingo Molnar
  Cc: Kees Cook, Baoquan He, Yinghai Lu, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

This is v5 of the x86 KASLR improvement series from Yinghai, Baoquan,
and myself. The current branch lives here:
http://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=kaslr/highmem

***Background:
Bugs have been reported around kdump, kexec, and some netboot situations
that didn't work when KASLR was enabled. While discussing the bugs, it
was found that the current KASLR implementation had various limitations,
but most importantly that it can only randomize in a 1GB region of
physical memory.

The current KASLR implementation only randomizes the base physical
address of the kernel. If the delta between the build-time load address
and the KASLR run-time load address (i.e. the physical address where the
kernel was actually decompressed) is non-zero, relocation handling is
performed using that delta. Though in principle the kernel can be
randomized to any physical address, the physical kernel text mapping
address space is limited to 1G, and the virtual address is simply offset
by the same amount. On x86_64 the result is the following range:
        [0xffffffff80000000, 0xffffffffc0000000)

hpa and Vivek suggested we change this by decoupling the physical and
virtual address randomization of kernel text and letting them work
separately. The kernel text physical address could then be randomized
anywhere in [16M, 64T), and the kernel text virtual address within
[0xffffffff80000000, 0xffffffffc0000000).
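
As a rough sketch of that decoupling (illustrative C only; choose_random()
is a hypothetical helper, and the real code must also honor alignment and
avoided memory regions):

 #include <stdint.h>

 /* Hypothetical helper: random value in [lo, hi), suitably aligned;
  * the real code draws entropy from RDRAND/RDTSC/i8254. */
 extern uint64_t choose_random(uint64_t lo, uint64_t hi);

 void randomize_bases(uint64_t *phys_base, uint64_t *virt_base)
 {
 	/* Physical: anywhere in [16M, 64T), independent of virtual. */
 	*phys_base = choose_random(16ULL << 20, 64ULL << 40);

 	/* Virtual: a separate offset within the 1G text mapping
 	 * [0xffffffff80000000, 0xffffffffc0000000). */
 	*virt_base = 0xffffffff80000000ULL + choose_random(0, 1ULL << 30);
 }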

***Problems that needed solving:
  - When booting via the startup_32 path, only a 0~4G identity mapping
    is built. If the kernel can be placed randomly anywhere from 16M up
    to 64T, the cost of building an identity mapping that covers every
    possible region is too high. We need to build the identity mapping
    on demand rather than cover all of the physical address space (see
    the sketch after this list).

  - Decouple the physical address and virtual address randomization of kernel
    text and let them work separately.
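
A minimal user-space model of the on-demand identity mapping (illustrative
only; the types, masks, and allocator are simplified stand-ins for the
real ident-mapping code in the 2nd part of the series):

 #include <stdint.h>
 #include <stdlib.h>
 #include <string.h>

 typedef uint64_t pte_t;
 #define PGD_SHIFT	39
 #define PUD_SHIFT	30
 #define PMD_SHIFT	21
 #define IDX(a, s)	(((a) >> (s)) & 511)
 #define PAGE_PRESENT	0x01ULL
 #define PAGE_RW	0x02ULL
 #define PAGE_PSE	0x80ULL		/* 2M page */

 /* Allocate (or find) the next-level table behind *entry. */
 static pte_t *get_table(pte_t *entry)
 {
 	if (!(*entry & PAGE_PRESENT)) {
 		pte_t *t = aligned_alloc(4096, 512 * sizeof(pte_t));
 		memset(t, 0, 512 * sizeof(pte_t));
 		*entry = (uintptr_t)t | PAGE_PRESENT | PAGE_RW;
 	}
 	return (pte_t *)(uintptr_t)(*entry & ~0xfffULL);
 }

 /* Identity-map just [start, end) with 2M pages, allocating only the
  * covering PUD/PMD tables instead of pre-mapping all physical RAM. */
 static void ident_map(pte_t *pgd, uint64_t start, uint64_t end)
 {
 	uint64_t addr = start & ~((1ULL << PMD_SHIFT) - 1);

 	for (; addr < end; addr += 1ULL << PMD_SHIFT) {
 		pte_t *pud = get_table(&pgd[IDX(addr, PGD_SHIFT)]);
 		pte_t *pmd = get_table(&pud[IDX(addr, PUD_SHIFT)]);
 		pmd[IDX(addr, PMD_SHIFT)] = addr | PAGE_PRESENT |
 					    PAGE_RW | PAGE_PSE;
 	}
 }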

***Parts:
   - The 1st part is clean-up and improvements to help the rest of the series.
     (Patches 01-10)
   - The 2nd part is Yinghai's building of identity mappings on demand.
     This is used to solve the first problem mentioned above.
     (Patches 11-12)
   - The 3rd part is Baoquan's decoupling the physical address and virtual
     address randomization of kernel text and letting them work separately,
     based on Yinghai's ident mapping patches.
     (Patches 13-21)

I've boot tested this a bunch on 32-bit and 64-bit, and things appear
to be working as expected. I've cleaned up the changelogs, improved
some comments, refactored a few things, etc. Changes are noted in the
individual changelogs.

Thanks!

-Kees

v4->v5:
- rewrote all the changelogs, and several comments.
- refactored e820 parser to use a while loop instead of goto.
- rearranged removal of CONFIG_RANDOMIZE_BASE_MAX_OFFSET to earlier.
- additionally dropped KERNEL_IMAGE_SIZE_DEFAULT
- refactored minimum address calculation
- refactored slot offset calculation for readability
- fixed 32-bit boot failures
- fixed CONFIG_RANDOMIZE_BASE=n boot failure
- improved debug reporting

[Baoquan's history]
v3->v4:
- Made changes according to Kees's comments.
  Added one patch (20/20), as Kees suggested, to use KERNEL_IMAGE_SIZE as
  the maximum offset for kernel virtual randomization, and cleaned up the
  now-unused CONFIG_RANDOM_OFFSET_MAX:

    x86, kaslr: Use KERNEL_IMAGE_SIZE as the offset max for kernel virtual randomization

v2->v3:
- This round only takes care of the KASLR-related patches.
  For reviewers it's better to discuss only one issue in one thread.
    * I dropped the following patch of Yinghai's because I think it's
      unnecessary:
       - Patch 05/19 x86, kaslr: rename output_size to output_run_size
         output_size is enough to represent the value:
            output_len > run_size ? output_len : run_size

    * I added Patch 04/19, a comment-update patch. For the other patches,
      I just adjusted the patch logs and made a few changes compared with
      the 2nd round. Please check the change log under each patch's log
      for details.

    * Adjusted the order of several patches to make review easier. This
      doesn't affect the code.

v1->v2:
- In the 2nd round Yinghai posted a big patchset including this KASLR
  fix and another setup_data-related fix. The link is here:
   http://lists-archives.com/linux-kernel/28346903-x86-updated-patches-for-kaslr-and-setup_data-etc-for-v4-3.html
  You can get the code from Yinghai's git branch:
  git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-v4.3-next

v1:
- The first round can be found here:
    https://lwn.net/Articles/637115/

* [PATCH v5 01/21] x86, KASLR: Remove unneeded boot_params argument
From: Kees Cook @ 2016-04-14 22:28 UTC
  To: Ingo Molnar
  Cc: Kees Cook, Yinghai Lu, Baoquan He, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

From: Yinghai Lu <yinghai@kernel.org>

Since the boot_params can be found using the real_mode global variable,
there is no need to pass around a pointer to it. This slightly simplifies
the choose_kernel_location function and its callers.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[kees: rewrote changelog]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/aslr.c | 5 ++---
 arch/x86/boot/compressed/misc.c | 2 +-
 arch/x86/boot/compressed/misc.h | 6 ++----
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 6a9b96b4624d..622aa881c6ab 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -295,8 +295,7 @@ static unsigned long find_random_addr(unsigned long minimum,
 	return slots_fetch_random();
 }
 
-unsigned char *choose_kernel_location(struct boot_params *boot_params,
-				      unsigned char *input,
+unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
 				      unsigned long output_size)
@@ -316,7 +315,7 @@ unsigned char *choose_kernel_location(struct boot_params *boot_params,
 	}
 #endif
 
-	boot_params->hdr.loadflags |= KASLR_FLAG;
+	real_mode->hdr.loadflags |= KASLR_FLAG;
 
 	/* Record the various known unsafe memory ranges. */
 	mem_avoid_init((unsigned long)input, input_size,
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 79dac1758e7c..f35ad9eb1bf1 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -428,7 +428,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	 * the entire decompressed kernel plus relocation table, or the
 	 * entire decompressed kernel plus .bss and .brk sections.
 	 */
-	output = choose_kernel_location(real_mode, input_data, input_len, output,
+	output = choose_kernel_location(input_data, input_len, output,
 					output_len > run_size ? output_len
 							      : run_size);
 
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 3783dc3e10b3..dcf01c22400e 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -67,8 +67,7 @@ int cmdline_find_option_bool(const char *option);
 
 #if CONFIG_RANDOMIZE_BASE
 /* aslr.c */
-unsigned char *choose_kernel_location(struct boot_params *boot_params,
-				      unsigned char *input,
+unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
 				      unsigned long output_size);
@@ -76,8 +75,7 @@ unsigned char *choose_kernel_location(struct boot_params *boot_params,
 bool has_cpuflag(int flag);
 #else
 static inline
-unsigned char *choose_kernel_location(struct boot_params *boot_params,
-				      unsigned char *input,
+unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
 				      unsigned long output_size)
-- 
2.6.3

* [PATCH v5 02/21] x86, KASLR: Handle kernel relocation above 2G
From: Kees Cook @ 2016-04-14 22:28 UTC
  To: Ingo Molnar
  Cc: Kees Cook, Baoquan He, Yinghai Lu, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

From: Baoquan He <bhe@redhat.com>

When processing the relocation table, the offset used to calculate the
relocation is an int. This is sufficient for calculating the physical
address of the relocs entry on 32-bit systems and on 64-bit systems when
the relocation is under 2G. To handle relocations above 2G (seen in
situations like kexec, netboot, etc), this offset needs to be calculated
using a long to avoid wrapping and miscalculating the relocation.
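
The wrap is easy to demonstrate in isolation (a stand-alone sketch with
made-up values, not the kernel code itself):

 #include <stdio.h>

 int main(void)
 {
 	int entry = 0x7ff00000;          /* 32-bit reloc table entry */
 	unsigned long map = 0x10000000;  /* pushes the target above 2G */

 	int wrong = entry;               /* old code: 32-bit arithmetic */
 	wrong += map;                    /* result no longer fits an int */

 	long right = entry;              /* fixed code: 64-bit arithmetic */
 	right += map;

 	/* On x86_64: 0xffffffff8ff00000 (sign-extended) vs 0x8ff00000. */
 	printf("int: %#lx  long: %#lx\n", (unsigned long)wrong,
 	       (unsigned long)right);
 	return 0;
 }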

Signed-off-by: Baoquan He <bhe@redhat.com>
[kees: rewrote changelog]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/misc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index f35ad9eb1bf1..c4477d5f3fff 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -295,7 +295,7 @@ static void handle_relocations(void *output, unsigned long output_len)
 	 * So we work backwards from the end of the decompressed image.
 	 */
 	for (reloc = output + output_len - sizeof(*reloc); *reloc; reloc--) {
-		int extended = *reloc;
+		long extended = *reloc;
 		extended += map;
 
 		ptr = (unsigned long)extended;
-- 
2.6.3

* [PATCH v5 03/21] x86, KASLR: Drop CONFIG_RANDOMIZE_BASE_MAX_OFFSET
From: Kees Cook @ 2016-04-14 22:28 UTC
  To: Ingo Molnar
  Cc: Kees Cook, Baoquan He, Yinghai Lu, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

From: Baoquan He <bhe@redhat.com>

Currently CONFIG_RANDOMIZE_BASE_MAX_OFFSET is used to limit the maximum
offset for kernel randomization. This limit doesn't need to be a CONFIG
since it is tied completely to KERNEL_IMAGE_SIZE, and will make no sense
once physical and virtual offsets are randomized separately. This patch
removes CONFIG_RANDOMIZE_BASE_MAX_OFFSET and consolidates the Kconfig
help text.
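
The resulting limits are easy to sanity-check (a back-of-the-envelope
sketch using the values from the Kconfig text below):

 /* 64-bit, CONFIG_RANDOMIZE_BASE=y, CONFIG_PHYSICAL_ALIGN=2MiB: */
 #define KERNEL_IMAGE_SIZE	(1024 * 1024 * 1024)	/* 1GiB */
 #define CONFIG_PHYSICAL_ALIGN	(2 * 1024 * 1024)	/* 2MiB */

 /*
  * The slots array in aslr.c is now sized by KERNEL_IMAGE_SIZE alone:
  * 1GiB / 2MiB = 512 slots, i.e. at best 9 bits of entropy on 64-bit
  * (512MiB / 2MiB = 256 slots, 8 bits, on 32-bit), matching the
  * Kconfig help text.
  */
 static unsigned long slots[KERNEL_IMAGE_SIZE / CONFIG_PHYSICAL_ALIGN];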

Signed-off-by: Baoquan He <bhe@redhat.com>
[kees: rewrote changelog, dropped KERNEL_IMAGE_SIZE_DEFAULT, moved earlier]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/Kconfig                     | 57 +++++++++++++-----------------------
 arch/x86/boot/compressed/aslr.c      | 12 ++++----
 arch/x86/include/asm/page_64_types.h |  8 ++---
 arch/x86/mm/init_32.c                |  3 --
 4 files changed, 29 insertions(+), 51 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2dc18605831f..fd9ac711ada8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1936,51 +1936,36 @@ config RANDOMIZE_BASE
 	depends on RELOCATABLE
 	default n
 	---help---
-	   Randomizes the physical and virtual address at which the
-	   kernel image is decompressed, as a security feature that
-	   deters exploit attempts relying on knowledge of the location
-	   of kernel internals.
+	   Randomizes the physical address at which the kernel image
+	   is decompressed and the virtual address where the kernel
+	   image is mapped, as a security feature that deters exploit
+	   attempts relying on knowledge of the location of kernel
+	   internals.
+
+	   The kernel physical address can be randomized from 16M to
+	   64T at most. The kernel virtual address will be offset
+	   by up to KERNEL_IMAGE_SIZE. On 32-bit KERNEL_IMAGE_SIZE is
+	   512MiB, while on 64-bit this is limited by how the kernel
+	   fixmap page table is positioned, so this cannot be larger
+	   than 1GiB currently. Without RANDOMIZE_BASE there is a 512MiB
+	   to 1.5GiB split between kernel and modules. When RANDOMIZE_BASE
+	   is enabled, the modules area will shrink to compensate, up
+	   to a 1GiB to 1GiB split, KERNEL_IMAGE_SIZE changes from 512MiB
+	   to 1GiB.
 
 	   Entropy is generated using the RDRAND instruction if it is
 	   supported. If RDTSC is supported, it is used as well. If
 	   neither RDRAND nor RDTSC are supported, then randomness is
 	   read from the i8254 timer.
 
-	   The kernel will be offset by up to RANDOMIZE_BASE_MAX_OFFSET,
-	   and aligned according to PHYSICAL_ALIGN. Since the kernel is
-	   built using 2GiB addressing, and PHYSICAL_ALGIN must be at a
-	   minimum of 2MiB, only 10 bits of entropy is theoretically
-	   possible. At best, due to page table layouts, 64-bit can use
-	   9 bits of entropy and 32-bit uses 8 bits.
+	   Since the kernel is built using 2GiB addressing, and
+	   PHYSICAL_ALIGN must be at a minimum of 2MiB, only 10 bits of
+	   entropy is theoretically possible. At best, due to page table
+	   layouts, 64-bit can use 9 bits of entropy and 32-bit uses 8
+	   bits.
 
 	   If unsure, say N.
 
-config RANDOMIZE_BASE_MAX_OFFSET
-	hex "Maximum kASLR offset allowed" if EXPERT
-	depends on RANDOMIZE_BASE
-	range 0x0 0x20000000 if X86_32
-	default "0x20000000" if X86_32
-	range 0x0 0x40000000 if X86_64
-	default "0x40000000" if X86_64
-	---help---
-	  The lesser of RANDOMIZE_BASE_MAX_OFFSET and available physical
-	  memory is used to determine the maximal offset in bytes that will
-	  be applied to the kernel when kernel Address Space Layout
-	  Randomization (kASLR) is active. This must be a multiple of
-	  PHYSICAL_ALIGN.
-
-	  On 32-bit this is limited to 512MiB by page table layouts. The
-	  default is 512MiB.
-
-	  On 64-bit this is limited by how the kernel fixmap page table is
-	  positioned, so this cannot be larger than 1GiB currently. Without
-	  RANDOMIZE_BASE, there is a 512MiB to 1.5GiB split between kernel
-	  and modules. When RANDOMIZE_BASE_MAX_OFFSET is above 512MiB, the
-	  modules area will shrink to compensate, up to the current maximum
-	  1GiB to 1GiB split. The default is 1GiB.
-
-	  If unsure, leave at the default value.
-
 # Relocation on x86 needs some additional build support
 config X86_NEED_RELOCS
 	def_bool y
diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 622aa881c6ab..462654097d63 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -206,15 +206,13 @@ static bool mem_avoid_overlap(struct mem_vector *img)
 	return false;
 }
 
-static unsigned long slots[CONFIG_RANDOMIZE_BASE_MAX_OFFSET /
-			   CONFIG_PHYSICAL_ALIGN];
+static unsigned long slots[KERNEL_IMAGE_SIZE / CONFIG_PHYSICAL_ALIGN];
 static unsigned long slot_max;
 
 static void slots_append(unsigned long addr)
 {
 	/* Overflowing the slots list should be impossible. */
-	if (slot_max >= CONFIG_RANDOMIZE_BASE_MAX_OFFSET /
-			CONFIG_PHYSICAL_ALIGN)
+	if (slot_max >= KERNEL_IMAGE_SIZE / CONFIG_PHYSICAL_ALIGN)
 		return;
 
 	slots[slot_max++] = addr;
@@ -240,7 +238,7 @@ static void process_e820_entry(struct e820entry *entry,
 		return;
 
 	/* Ignore entries entirely above our maximum. */
-	if (entry->addr >= CONFIG_RANDOMIZE_BASE_MAX_OFFSET)
+	if (entry->addr >= KERNEL_IMAGE_SIZE)
 		return;
 
 	/* Ignore entries entirely below our minimum. */
@@ -265,8 +263,8 @@ static void process_e820_entry(struct e820entry *entry,
 	region.size -= region.start - entry->addr;
 
 	/* Reduce maximum size to fit end of image within maximum limit. */
-	if (region.start + region.size > CONFIG_RANDOMIZE_BASE_MAX_OFFSET)
-		region.size = CONFIG_RANDOMIZE_BASE_MAX_OFFSET - region.start;
+	if (region.start + region.size > KERNEL_IMAGE_SIZE)
+		region.size = KERNEL_IMAGE_SIZE - region.start;
 
 	/* Walk each aligned slot and check for avoided areas. */
 	for (img.start = region.start, img.size = image_size ;
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 4928cf0d5af0..d5c2f8b40faa 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -47,12 +47,10 @@
  * are fully set up. If kernel ASLR is configured, it can extend the
  * kernel page table mapping, reducing the size of the modules area.
  */
-#define KERNEL_IMAGE_SIZE_DEFAULT      (512 * 1024 * 1024)
-#if defined(CONFIG_RANDOMIZE_BASE) && \
-	CONFIG_RANDOMIZE_BASE_MAX_OFFSET > KERNEL_IMAGE_SIZE_DEFAULT
-#define KERNEL_IMAGE_SIZE   CONFIG_RANDOMIZE_BASE_MAX_OFFSET
+#if defined(CONFIG_RANDOMIZE_BASE)
+#define KERNEL_IMAGE_SIZE	(1024 * 1024 * 1024)
 #else
-#define KERNEL_IMAGE_SIZE      KERNEL_IMAGE_SIZE_DEFAULT
+#define KERNEL_IMAGE_SIZE	(512 * 1024 * 1024)
 #endif
 
 #endif /* _ASM_X86_PAGE_64_DEFS_H */
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index bd7a9b9e2e14..f2ee42d61894 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -804,9 +804,6 @@ void __init mem_init(void)
 	BUILD_BUG_ON(VMALLOC_START			>= VMALLOC_END);
 #undef high_memory
 #undef __FIXADDR_TOP
-#ifdef CONFIG_RANDOMIZE_BASE
-	BUILD_BUG_ON(CONFIG_RANDOMIZE_BASE_MAX_OFFSET > KERNEL_IMAGE_SIZE);
-#endif
 
 #ifdef CONFIG_HIGHMEM
 	BUG_ON(PKMAP_BASE + LAST_PKMAP*PAGE_SIZE	> FIXADDR_START);
-- 
2.6.3

* [PATCH v5 04/21] x86, boot: Move compressed kernel to end of decompression buffer
From: Kees Cook @ 2016-04-14 22:28 UTC
  To: Ingo Molnar
  Cc: Kees Cook, Yinghai Lu, Baoquan He, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

From: Yinghai Lu <yinghai@kernel.org>

This change makes later calculations about where the kernel is located
easier to reason about. To better understand this change, we must first
clarify what VO and ZO are. They were introduced in commits by hpa:

77d1a49 x86, boot: make symbols from the main vmlinux available
37ba7ab x86, boot: make kernel_alignment adjustable; new bzImage fields

Specifically:

VO:
- uncompressed kernel image
- size: VO__end - VO__text ("VO_INIT_SIZE" define)

ZO:
- bootable compressed kernel image (boot/compressed/vmlinux)
- head text + compressed kernel (VO and relocs table) + decompressor code
- size: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)

The INIT_SIZE definition is used to find the larger of the two image sizes:

 #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
 #define VO_INIT_SIZE    (VO__end - VO__text)
 #if ZO_INIT_SIZE > VO_INIT_SIZE
 #define INIT_SIZE ZO_INIT_SIZE
 #else
 #define INIT_SIZE VO_INIT_SIZE
 #endif

The current code uses extract_offset to decide where to position the
copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
currently includes the extract_offset.)

When INIT_SIZE is bigger than VO_INIT_SIZE (uncommon but possible),
the copied ZO occupies the memory from extract_offset to the end of
the decompression buffer. It overlaps with the soon-to-be-uncompressed kernel
like this:

                            |-----compressed kernel image------|
                            V                                  V
0                       extract_offset                      +INIT_SIZE
|-----------|---------------|-------------------------|--------|
            |               |                         |        |
          VO__text      startup_32 of ZO          VO__end    ZO__end
            ^                                         ^
            |-------uncompressed kernel image---------|

When INIT_SIZE is equal to VO_INIT_SIZE (the likely case), there is still
space left from the end of ZO to the end of the decompression buffer, as
shown below.

                            |-compressed kernel image-|
                            V                         V
0                       extract_offset                      +INIT_SIZE
|-----------|---------------|-------------------------|--------|
            |               |                         |        |
          VO__text      startup_32 of ZO          ZO__end    VO__end
            ^                                                  ^
            |------------uncompressed kernel image-------------|

To simplify calculations and avoid special cases, it is cleaner to
always place the compressed kernel image in memory so that ZO__end
is at the end of the decompression buffer, instead of placing its
start at extract_offset as is currently done.

This patch adds BP_init_size (which is the INIT_SIZE as passed in from
the boot_params) into asm-offsets.c to make it visible to the assembly
code. Then when moving the ZO, it calculates the starting position of
the copied ZO (via BP_init_size and the ZO run size) so that ZO__end
will be at the end of the decompression buffer.
calculation safe, the end of ZO is page aligned (and a comment is added
to the existing VO alignment for good measure).
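
A worked example of the new placement (made-up numbers, mirroring the
head_32.S/head_64.S arithmetic below):

 unsigned long output    = 0x1000000; /* start of decompression buffer */
 unsigned long init_size = 0x2000000; /* BP_init_size from boot_params */
 unsigned long zo_size   = 0x700000;  /* _end of ZO, page aligned      */

 /* The head code computes: target = output + (init_size - ZO size).  */
 unsigned long target = output + (init_size - zo_size); /* 0x2900000  */

 /* ZO is copied to [0x2900000, 0x3000000), so ZO__end lands exactly  */
 /* at output + init_size, the end of the decompression buffer.       */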

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[kees: rewrote changelog]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/head_32.S     | 11 +++++++++--
 arch/x86/boot/compressed/head_64.S     |  8 ++++++--
 arch/x86/boot/compressed/mkpiggy.c     |  3 ---
 arch/x86/boot/compressed/vmlinux.lds.S |  1 +
 arch/x86/kernel/asm-offsets.c          |  1 +
 arch/x86/kernel/vmlinux.lds.S          |  2 +-
 6 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 8ef964ddc18e..0c140f99c602 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -148,7 +148,9 @@ preferred_addr:
 1:
 
 	/* Target address to relocate to for decompression */
-	addl	$z_extract_offset, %ebx
+	movl    BP_init_size(%esi), %eax
+	subl    $_end, %eax
+	addl    %eax, %ebx
 
 	/* Set up the stack */
 	leal	boot_stack_end(%ebx), %esp
@@ -210,8 +212,13 @@ relocated:
 				/* push arguments for decompress_kernel: */
 	pushl	$z_run_size	/* size of kernel with .bss and .brk */
 	pushl	$z_output_len	/* decompressed length, end of relocs */
-	leal	z_extract_offset_negative(%ebx), %ebp
+
+	movl    BP_init_size(%esi), %eax
+	subl    $_end, %eax
+	movl    %ebx, %ebp
+	subl    %eax, %ebp
 	pushl	%ebp		/* output address */
+
 	pushl	$z_input_len	/* input_len */
 	leal	input_data(%ebx), %eax
 	pushl	%eax		/* input_data */
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index b0c0d16ef58d..67dd8d300c61 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -102,7 +102,9 @@ ENTRY(startup_32)
 1:
 
 	/* Target address to relocate to for decompression */
-	addl	$z_extract_offset, %ebx
+	movl	BP_init_size(%esi), %eax
+	subl	$_end, %eax
+	addl	%eax, %ebx
 
 /*
  * Prepare for entering 64 bit mode
@@ -330,7 +332,9 @@ preferred_addr:
 1:
 
 	/* Target address to relocate to for decompression */
-	leaq	z_extract_offset(%rbp), %rbx
+	movl	BP_init_size(%rsi), %ebx
+	subl	$_end, %ebx
+	addq	%rbp, %rbx
 
 	/* Set up the stack */
 	leaq	boot_stack_end(%rbx), %rsp
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index d8222f213182..b980046c3329 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -85,9 +85,6 @@ int main(int argc, char *argv[])
 	printf("z_output_len = %lu\n", (unsigned long)olen);
 	printf(".globl z_extract_offset\n");
 	printf("z_extract_offset = 0x%lx\n", offs);
-	/* z_extract_offset_negative allows simplification of head_32.S */
-	printf(".globl z_extract_offset_negative\n");
-	printf("z_extract_offset_negative = -0x%lx\n", offs);
 	printf(".globl z_run_size\n");
 	printf("z_run_size = %lu\n", run_size);
 
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 34d047c98284..e24e0a0c90c9 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -70,5 +70,6 @@ SECTIONS
 		_epgtable = . ;
 	}
 #endif
+	. = ALIGN(PAGE_SIZE);	/* keep ZO size page aligned */
 	_end = .;
 }
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 5c042466f274..674134e9f5e5 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -80,6 +80,7 @@ void common(void) {
 	OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch);
 	OFFSET(BP_version, boot_params, hdr.version);
 	OFFSET(BP_kernel_alignment, boot_params, hdr.kernel_alignment);
+	OFFSET(BP_init_size, boot_params, hdr.init_size);
 	OFFSET(BP_pref_address, boot_params, hdr.pref_address);
 	OFFSET(BP_code32_start, boot_params, hdr.code32_start);
 
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 4c941f88d405..9297a002d8e5 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -334,7 +334,7 @@ SECTIONS
 		__brk_limit = .;
 	}
 
-	. = ALIGN(PAGE_SIZE);
+	. = ALIGN(PAGE_SIZE);		/* keep VO_INIT_SIZE page aligned */
 	_end = .;
 
         STABS_DEBUG
-- 
2.6.3

* [PATCH v5 05/21] x86, boot: Calculate decompression size during boot not build
From: Kees Cook @ 2016-04-14 22:28 UTC
  To: Ingo Molnar
  Cc: Kees Cook, Yinghai Lu, Baoquan He, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

From: Yinghai Lu <yinghai@kernel.org>

Currently z_extract_offset is calculated in boot/compressed/mkpiggy.c.
This doesn't work well because mkpiggy.c doesn't know the details of the
decompressor in use. As a result, it can only make an estimate, which
carries risks:

 - output + output_len (VO) could be much bigger than input + input_len
   (ZO). In this case, the decompressed kernel plus relocs could overwrite
   the decompression code while it is running.

 - The head code of ZO could be bigger than z_extract_offset. In this case
   an overwrite could happen when the head code is running to move ZO to
   the end of the buffer. Though the head code is currently very
   small, it's still a potential risk. Since there is no rule to limit the
   size of the head code of ZO, it runs the risk of suddenly becoming a
   (hard to find) bug.

Instead, this moves the z_extract_offset calculation into header.S, and
makes adjustments to be sure that the above two cases can never happen.

Since we have (previously) made ZO always be located against the end of
the decompression buffer, z_extract_offset is only used here to calculate
an appropriate buffer size (INIT_SIZE), and is no longer used elsewhere. As
such, it can be removed from voffset.h.
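
For concreteness, here is the new calculation with made-up sizes
(mirroring the macros added to header.S below; note header.S uses a more
generous slack of 64K + 128 bytes than the misc.c formula):

 unsigned long input_len  = 0x600000;  /* ZO_z_input_len (compressed)    */
 unsigned long output_len = 0x1400000; /* ZO_z_output_len (uncompressed) */

 unsigned long extra = (output_len >> 12) + 65536 + 128;  /* 0x11480 */

 /* output_len > input_len here, so: */
 unsigned long extract_offset = output_len + extra - input_len;
 /* = 0xe11480, rounded up to a 4K boundary (0xe12000) and compared    */
 /* against the size of the ZO head section to produce                 */
 /* ZO_z_min_extract_offset and, from it, ZO_INIT_SIZE.                */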

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[kees: rewrote changelog]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/Makefile             |  2 +-
 arch/x86/boot/compressed/misc.c    |  5 +----
 arch/x86/boot/compressed/mkpiggy.c | 15 +--------------
 arch/x86/boot/header.S             | 23 ++++++++++++++++++++++-
 4 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index b1ef9e489084..942f7dabfb1e 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -95,7 +95,7 @@ targets += voffset.h
 $(obj)/voffset.h: vmlinux FORCE
 	$(call if_changed,voffset)
 
-sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|z_.*\)$$/\#define ZO_\2 0x\1/p'
+sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'
 
 quiet_cmd_zoffset = ZOFFSET $@
       cmd_zoffset = $(NM) $< | sed -n $(sed-zoffset) > $@
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index c4477d5f3fff..e2a998f8c304 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -83,13 +83,10 @@
  * To avoid problems with the compressed data's meta information an extra 18
  * bytes are needed.  Leading to the formula:
  *
- * extra_bytes = (uncompressed_size >> 12) + 32768 + 18 + decompressor_size.
+ * extra_bytes = (uncompressed_size >> 12) + 32768 + 18.
  *
  * Adding 8 bytes per 32K is a bit excessive but much easier to calculate.
  * Adding 32768 instead of 32767 just makes for round numbers.
- * Adding the decompressor_size is necessary as it musht live after all
- * of the data as well.  Last I measured the decompressor is about 14K.
- * 10K of actual data and 4K of bss.
  *
  */
 
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index b980046c3329..a613c84d9b88 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -21,8 +21,7 @@
  * ----------------------------------------------------------------------- */
 
 /*
- * Compute the desired load offset from a compressed program; outputs
- * a small assembly wrapper with the appropriate symbols defined.
+ * outputs a small assembly wrapper with the appropriate symbols defined.
  */
 
 #include <stdlib.h>
@@ -35,7 +34,6 @@ int main(int argc, char *argv[])
 {
 	uint32_t olen;
 	long ilen;
-	unsigned long offs;
 	unsigned long run_size;
 	FILE *f = NULL;
 	int retval = 1;
@@ -67,15 +65,6 @@ int main(int argc, char *argv[])
 	ilen = ftell(f);
 	olen = get_unaligned_le32(&olen);
 
-	/*
-	 * Now we have the input (compressed) and output (uncompressed)
-	 * sizes, compute the necessary decompression offset...
-	 */
-
-	offs = (olen > ilen) ? olen - ilen : 0;
-	offs += olen >> 12;	/* Add 8 bytes for each 32K block */
-	offs += 64*1024 + 128;	/* Add 64K + 128 bytes slack */
-	offs = (offs+4095) & ~4095; /* Round to a 4K boundary */
 	run_size = atoi(argv[2]);
 
 	printf(".section \".rodata..compressed\",\"a\",@progbits\n");
@@ -83,8 +72,6 @@ int main(int argc, char *argv[])
 	printf("z_input_len = %lu\n", ilen);
 	printf(".globl z_output_len\n");
 	printf("z_output_len = %lu\n", (unsigned long)olen);
-	printf(".globl z_extract_offset\n");
-	printf("z_extract_offset = 0x%lx\n", offs);
 	printf(".globl z_run_size\n");
 	printf("z_run_size = %lu\n", run_size);
 
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 6236b9ec4b76..6565dcb2b899 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -440,7 +440,28 @@ setup_data:		.quad 0			# 64-bit physical pointer to
 
 pref_address:		.quad LOAD_PHYSICAL_ADDR	# preferred load addr
 
-#define ZO_INIT_SIZE	(ZO__end - ZO_startup_32 + ZO_z_extract_offset)
+/* Check arch/x86/boot/compressed/misc.c for the formula of extra_bytes */
+#define ZO_z_extra_bytes	((ZO_z_output_len >> 12) + 65536 + 128)
+#if ZO_z_output_len > ZO_z_input_len
+#define ZO_z_extract_offset	(ZO_z_output_len + ZO_z_extra_bytes - \
+				 ZO_z_input_len)
+#else
+#define ZO_z_extract_offset	ZO_z_extra_bytes
+#endif
+
+/*
+ * extract_offset has to be bigger than the ZO head section. Otherwise,
+ * while the head code is running to move ZO to the end of the buffer,
+ * it would overwrite the head code itself.
+ */
+#if (ZO__ehead - ZO_startup_32) > ZO_z_extract_offset
+#define ZO_z_min_extract_offset ((ZO__ehead - ZO_startup_32 + 4095) & ~4095)
+#else
+#define ZO_z_min_extract_offset ((ZO_z_extract_offset + 4095) & ~4095)
+#endif
+
+#define ZO_INIT_SIZE	(ZO__end - ZO_startup_32 + ZO_z_min_extract_offset)
+
 #define VO_INIT_SIZE	(VO__end - VO__text)
 #if ZO_INIT_SIZE > VO_INIT_SIZE
 #define INIT_SIZE ZO_INIT_SIZE
-- 
2.6.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 06/21] x86, KASLR: Update description for decompressor worst case size
  2016-04-14 22:28 ` [kernel-hardening] " Kees Cook
@ 2016-04-14 22:28   ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-14 22:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Baoquan He, Yinghai Lu, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

From: Baoquan He <bhe@redhat.com>

The comment that describes the analysis for the size of the decompressor
code only took gzip into account (five other decompressors could also
be used). The actual z_extract_offset calculation in code was
already handling the correct maximum size, but this documentation hadn't
been updated. This updates the documentation and fixes several typos.
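
As a rough worked example of the updated worst-case formula (the 32 MiB
figure below is hypothetical, not taken from a real build):

/* extra_bytes for a hypothetical 32 MiB (33554432 byte) uncompressed kernel: */
/* (33554432 >> 12) + 65536 + 128 = 8192 + 65536 + 128 = 73856 bytes (~72K)   */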

Signed-off-by: Baoquan He <bhe@redhat.com>
[kees: rewrote changelog, cleaned up comment style]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/misc.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index e2a998f8c304..31e2d6155643 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -19,11 +19,13 @@
  */
 
 /*
- * Getting to provable safe in place decompression is hard.
- * Worst case behaviours need to be analyzed.
- * Background information:
+ * Getting to provably safe in-place decompression is hard. Worst-case
+ * behaviours need to be analyzed. Here, let's take decompressing a
+ * gzip-compressed kernel as an example to illustrate it.
+ *
+ * The file layout of a gzip-compressed kernel is as follows. For more
+ * information, please refer to RFC 1951 and RFC 1952.
  *
- * The file layout is:
  *    magic[2]
  *    method[1]
  *    flags[1]
@@ -70,13 +72,13 @@
  * of 5 bytes per 32767 bytes.
  *
  * The worst case internal to a compressed block is very hard to figure.
- * The worst case can at least be boundined by having one bit that represents
+ * The worst case can at least be bounded by having one bit that represents
  * 32764 bytes and then all of the rest of the bytes representing the very
  * very last byte.
  *
  * All of which is enough to compute an amount of extra data that is required
  * to be safe.  To avoid problems at the block level allocating 5 extra bytes
- * per 32767 bytes of data is sufficient.  To avoind problems internal to a
+ * per 32767 bytes of data is sufficient.  To avoid problems internal to a
  * block adding an extra 32767 bytes (the worst case uncompressed block size)
  * is sufficient, to ensure that in the worst case the decompressed data for
  * block will stop the byte before the compressed data for a block begins.
@@ -88,11 +90,17 @@
  * Adding 8 bytes per 32K is a bit excessive but much easier to calculate.
  * Adding 32768 instead of 32767 just makes for round numbers.
  *
+ * The above analysis is only for decompressing a gzip-compressed kernel.
+ * Six different decompressors are supported altogether, and among them
+ * xz stores data in chunks with a maximum chunk size of 64K. Hence the
+ * safety margin should be updated to cover all decompressors so that we
+ * don't need to deal with each of them separately. Please check
+ * the description in lib/decompressor_xxx.c for specific information.
+ *
+ * extra_bytes = (uncompressed_size >> 12) + 65536 + 128.
+ *
  */
 
-/*
- * gzip declarations
- */
 #define STATIC		static
 
 #undef memcpy
-- 
2.6.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 07/21] x86, boot: Fix run_size calculation
  2016-04-14 22:28 ` [kernel-hardening] " Kees Cook
@ 2016-04-14 22:29   ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-14 22:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Yinghai Lu, Junjie Mao, Josh Triplett, Andrew Morton,
	Baoquan He, Ard Biesheuvel, Matt Redfearn, x86, H. Peter Anvin,
	Ingo Molnar, Borislav Petkov, Vivek Goyal, Andy Lutomirski,
	lasse.collin, Dave Young, kernel-hardening, LKML

From: Yinghai Lu <yinghai@kernel.org>

Currently, the kernel run_size (size of code plus brk and bss) is
calculated via the shell script arch/x86/tools/calc_run_size.sh. It gets
the file offset and mem size of the .bss and .brk sections in vmlinux,
and adds them as follows:

run_size=$(( $offsetA + $sizeA + $sizeB ))

However, this is not correct (it is too large). To illustrate, here's
a walk-through of the script's calculation, compared to the correct way
to find it.

First, offsetA is found as the file offset of the first .bss or
.brk section seen in the ELF file. The sizeA and sizeB values are the
respective section sizes.

[bhe@x1 linux]$ objdump -h vmlinux

vmlinux:     file format elf64-x86-64

Sections:
Idx Name    Size      VMA               LMA               File off  Algn
 27 .bss    00170000  ffffffff81ec8000  0000000001ec8000  012c8000  2**12
            ALLOC
 28 .brk    00027000  ffffffff82038000  0000000002038000  012c8000  2**0
            ALLOC

Here, offsetA is 0x012c8000, with sizeA at 0x00170000 and sizeB at
0x00027000. The resulting run_size is 0x145f000:

0x012c8000 + 0x00170000 + 0x00027000 = 0x145f000

However, if we instead examine the ELF LOAD program headers, we see a
different picture.

[bhe@x1 linux]$ readelf -l vmlinux

Elf file type is EXEC (Executable file)
Entry point 0x1000000
There are 5 program headers, starting at offset 64

Program Headers:
  Type        Offset             VirtAddr           PhysAddr
              FileSiz            MemSiz              Flags  Align
  LOAD        0x0000000000200000 0xffffffff81000000 0x0000000001000000
              0x0000000000b5e000 0x0000000000b5e000  R E    200000
  LOAD        0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
              0x0000000000145000 0x0000000000145000  RW     200000
  LOAD        0x0000000001000000 0x0000000000000000 0x0000000001d45000
              0x0000000000018158 0x0000000000018158  RW     200000
  LOAD        0x000000000115e000 0xffffffff81d5e000 0x0000000001d5e000
              0x000000000016a000 0x0000000000301000  RWE    200000
  NOTE        0x000000000099bcac 0xffffffff8179bcac 0x000000000179bcac
              0x00000000000001bc 0x00000000000001bc         4

 Section to Segment mapping:
  Segment Sections...
   00     .text .notes __ex_table .rodata __bug_table .pci_fixup .tracedata
          __ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param
          __modver
   01     .data .vvar
   02     .data..percpu
   03     .init.text .init.data .x86_cpu_dev.init .parainstructions
          .altinstructions .altinstr_replacement .iommu_table .apicdrivers
          .exit.text .smp_locks .bss .brk
   04     .notes

As mentioned, run_size needs to be the size of the running kernel
including .bss and .brk. We can see from the Section/Segment mapping
above that .bss and .brk are included in segment 03 (which corresponds
to the final LOAD program header). To find the run_size, we calculate
the end of the LOAD segment from its PhysAddr start (0x0000000001d5e000)
and its MemSiz (0x0000000000301000), minus the physical load address of
the kernel (the first LOAD segment's PhysAddr: 0x0000000001000000). The
resulting run_size is 0x105f000:

0x0000000001d5e000 + 0x0000000000301000 - 0x0000000001000000 = 0x105f000

So, from this we can see that the existing run_size calculation is
0x400000 too high. And, as it turns out, the correct run_size is
actually equal to VO_end - VO_text, which is certainly easier to calculate.
_end: 0xffffffff8205f000
_text:0xffffffff81000000

0xffffffff8205f000 - 0xffffffff81000000 = 0x105f000
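
A minimal C sketch contrasting the two calculations, plugging in the
values from the walk-through above:

/* Old script: first .bss/.brk file offset plus both section sizes. */
unsigned long old_run_size = 0x012c8000 + 0x00170000 + 0x00027000; /* 0x145f000 */

/* Correct: end of the final LOAD segment minus the physical load address. */
unsigned long new_run_size = 0x1d5e000 + 0x301000 - 0x1000000;     /* 0x105f000 */

/* The difference is the 0x400000 overcount described above. */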

As a result, run_size is a simple constant, so we don't need to pass it
around; we already have voffset.h for such things. We can share voffset.h
between misc.c and header.S instead of getting run_size in other ways.
This patch moves voffset.h creation code to boot/compressed/Makefile,
and switches misc.c to use the VO_end - VO_text calculation.

Dependence before:
boot/header.S ==> boot/voffset.h ==> vmlinux
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c

Dependence after:
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c ==>
                  boot/voffset.h ==> vmlinux

Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[kees: rewrote changelog]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/Makefile            | 11 +----------
 arch/x86/boot/compressed/Makefile | 12 ++++++++++++
 arch/x86/boot/compressed/misc.c   |  3 +++
 3 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 942f7dabfb1e..700a9c6e6159 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -86,15 +86,6 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE
 
 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
 
-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 0x\1/p'
-
-quiet_cmd_voffset = VOFFSET $@
-      cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
-
-targets += voffset.h
-$(obj)/voffset.h: vmlinux FORCE
-	$(call if_changed,voffset)
-
 sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'
 
 quiet_cmd_zoffset = ZOFFSET $@
@@ -106,7 +97,7 @@ $(obj)/zoffset.h: $(obj)/compressed/vmlinux FORCE
 
 
 AFLAGS_header.o += -I$(obj)
-$(obj)/header.o: $(obj)/voffset.h $(obj)/zoffset.h
+$(obj)/header.o: $(obj)/zoffset.h
 
 LDFLAGS_setup.elf	:= -T
 $(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 6915ff2bd996..323674a9d6db 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -45,6 +45,18 @@ LDFLAGS_vmlinux := -T
 hostprogs-y	:= mkpiggy
 HOST_EXTRACFLAGS += -I$(srctree)/tools/include
 
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+
+quiet_cmd_voffset = VOFFSET $@
+      cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
+
+targets += ../voffset.h
+
+$(obj)/../voffset.h: vmlinux FORCE
+	$(call if_changed,voffset)
+
+$(obj)/misc.o: $(obj)/../voffset.h
+
 vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
 	$(obj)/string.o $(obj)/cmdline.o \
 	$(obj)/piggy.o $(obj)/cpuflags.o
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 31e2d6155643..4d878a7b9fab 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -11,6 +11,7 @@
 
 #include "misc.h"
 #include "../string.h"
+#include "../voffset.h"
 
 /* WARNING!!
  * This code is compiled with -fPIC and it is relocated dynamically
@@ -415,6 +416,8 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	lines = real_mode->screen_info.orig_video_lines;
 	cols = real_mode->screen_info.orig_video_cols;
 
+	run_size = VO__end - VO__text;
+
 	console_init();
 	debug_putstr("early console in decompress_kernel\n");
 
-- 
2.6.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 08/21] x86, KASLR: Clean up unused code from old run_size
  2016-04-14 22:28 ` [kernel-hardening] " Kees Cook
@ 2016-04-14 22:29   ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-14 22:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Yinghai Lu, H. Peter Anvin, Josh Triplett,
	Andrew Morton, Ard Biesheuvel, Junjie Mao, Baoquan He,
	Matt Redfearn, x86, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Dave Young, kernel-hardening,
	LKML

From: Yinghai Lu <yinghai@kernel.org>

Since run_size is now calculated in misc.c, the old script and associated
argument passing is no longer needed. This patch removes them.
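
One consequence worth noting: decompress_kernel() drops from seven to six
arguments, so the 32-bit caller now pops 6 * 4 = 24 bytes instead of 28
(hence the $28 -> $24 change below). The resulting prototype, as shown in
the misc.c hunk (memptr is the boot code's pointer-sized integer type):

asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
					     unsigned char *input_data,
					     unsigned long input_len,
					     unsigned char *output,
					     unsigned long output_len);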

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[kees: rewrote changelog]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/Makefile  |  4 +---
 arch/x86/boot/compressed/head_32.S |  3 +--
 arch/x86/boot/compressed/head_64.S |  3 ---
 arch/x86/boot/compressed/misc.c    |  6 ++----
 arch/x86/boot/compressed/mkpiggy.c | 10 ++-------
 arch/x86/tools/calc_run_size.sh    | 42 --------------------------------------
 6 files changed, 6 insertions(+), 62 deletions(-)
 delete mode 100644 arch/x86/tools/calc_run_size.sh

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 323674a9d6db..81a8d0fd34eb 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -109,10 +109,8 @@ suffix-$(CONFIG_KERNEL_XZ)	:= xz
 suffix-$(CONFIG_KERNEL_LZO) 	:= lzo
 suffix-$(CONFIG_KERNEL_LZ4) 	:= lz4
 
-RUN_SIZE = $(shell $(OBJDUMP) -h vmlinux | \
-	     $(CONFIG_SHELL) $(srctree)/arch/x86/tools/calc_run_size.sh)
 quiet_cmd_mkpiggy = MKPIGGY $@
-      cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )
+      cmd_mkpiggy = $(obj)/mkpiggy $< > $@ || ( rm -f $@ ; false )
 
 targets += piggy.S
 $(obj)/piggy.S: $(obj)/vmlinux.bin.$(suffix-y) $(obj)/mkpiggy FORCE
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 0c140f99c602..122b32faf3ef 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -210,7 +210,6 @@ relocated:
  * Do the decompression, and jump to the new kernel..
  */
 				/* push arguments for decompress_kernel: */
-	pushl	$z_run_size	/* size of kernel with .bss and .brk */
 	pushl	$z_output_len	/* decompressed length, end of relocs */
 
 	movl    BP_init_size(%esi), %eax
@@ -226,7 +225,7 @@ relocated:
 	pushl	%eax		/* heap area */
 	pushl	%esi		/* real mode pointer */
 	call	decompress_kernel /* returns kernel location in %eax */
-	addl	$28, %esp
+	addl	$24, %esp
 
 /*
  * Jump to the decompressed kernel.
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 67dd8d300c61..36914515dd76 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -407,8 +407,6 @@ relocated:
  * Do the decompression, and jump to the new kernel..
  */
 	pushq	%rsi			/* Save the real mode argument */
-	movq	$z_run_size, %r9	/* size of kernel with .bss and .brk */
-	pushq	%r9
 	movq	%rsi, %rdi		/* real mode address */
 	leaq	boot_heap(%rip), %rsi	/* malloc area for uncompression */
 	leaq	input_data(%rip), %rdx  /* input_data */
@@ -416,7 +414,6 @@ relocated:
 	movq	%rbp, %r8		/* output target address */
 	movq	$z_output_len, %r9	/* decompressed length, end of relocs */
 	call	decompress_kernel	/* returns kernel location in %rax */
-	popq	%r9
 	popq	%rsi
 
 /*
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 4d878a7b9fab..6adb5b827aca 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -393,9 +393,9 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 				  unsigned char *input_data,
 				  unsigned long input_len,
 				  unsigned char *output,
-				  unsigned long output_len,
-				  unsigned long run_size)
+				  unsigned long output_len)
 {
+	const unsigned long run_size = VO__end - VO__text;
 	unsigned char *output_orig = output;
 
 	real_mode = rmode;
@@ -416,8 +416,6 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	lines = real_mode->screen_info.orig_video_lines;
 	cols = real_mode->screen_info.orig_video_cols;
 
-	run_size = VO__end - VO__text;
-
 	console_init();
 	debug_putstr("early console in decompress_kernel\n");
 
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index a613c84d9b88..c51486429bdc 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -34,13 +34,11 @@ int main(int argc, char *argv[])
 {
 	uint32_t olen;
 	long ilen;
-	unsigned long run_size;
 	FILE *f = NULL;
 	int retval = 1;
 
-	if (argc < 3) {
-		fprintf(stderr, "Usage: %s compressed_file run_size\n",
-				argv[0]);
+	if (argc < 2) {
+		fprintf(stderr, "Usage: %s compressed_file\n", argv[0]);
 		goto bail;
 	}
 
@@ -65,15 +63,11 @@ int main(int argc, char *argv[])
 	ilen = ftell(f);
 	olen = get_unaligned_le32(&olen);
 
-	run_size = atoi(argv[2]);
-
 	printf(".section \".rodata..compressed\",\"a\",@progbits\n");
 	printf(".globl z_input_len\n");
 	printf("z_input_len = %lu\n", ilen);
 	printf(".globl z_output_len\n");
 	printf("z_output_len = %lu\n", (unsigned long)olen);
-	printf(".globl z_run_size\n");
-	printf("z_run_size = %lu\n", run_size);
 
 	printf(".globl input_data, input_data_end\n");
 	printf("input_data:\n");
diff --git a/arch/x86/tools/calc_run_size.sh b/arch/x86/tools/calc_run_size.sh
deleted file mode 100644
index 1a4c17bb3910..000000000000
--- a/arch/x86/tools/calc_run_size.sh
+++ /dev/null
@@ -1,42 +0,0 @@
-#!/bin/sh
-#
-# Calculate the amount of space needed to run the kernel, including room for
-# the .bss and .brk sections.
-#
-# Usage:
-# objdump -h a.out | sh calc_run_size.sh
-
-NUM='\([0-9a-fA-F]*[ \t]*\)'
-OUT=$(sed -n 's/^[ \t0-9]*.b[sr][sk][ \t]*'"$NUM$NUM$NUM$NUM"'.*/\1\4/p')
-if [ -z "$OUT" ] ; then
-	echo "Never found .bss or .brk file offset" >&2
-	exit 1
-fi
-
-OUT=$(echo ${OUT# })
-sizeA=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-offsetA=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-sizeB=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-offsetB=$(printf "%d" 0x${OUT%% *})
-
-run_size=$(( $offsetA + $sizeA + $sizeB ))
-
-# BFD linker shows the same file offset in ELF.
-if [ "$offsetA" -ne "$offsetB" ] ; then
-	# Gold linker shows them as consecutive.
-	endB=$(( $offsetB + $sizeB ))
-	if [ "$endB" != "$run_size" ] ; then
-		printf "sizeA: 0x%x\n" $sizeA >&2
-		printf "offsetA: 0x%x\n" $offsetA >&2
-		printf "sizeB: 0x%x\n" $sizeB >&2
-		printf "offsetB: 0x%x\n" $offsetB >&2
-		echo ".bss and .brk are non-contiguous" >&2
-		exit 1
-	fi
-fi
-
-printf "%d\n" $run_size
-exit 0
-- 
2.6.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 09/21] x86, KASLR: Correctly bounds-check relocations
  2016-04-14 22:28 ` [kernel-hardening] " Kees Cook
@ 2016-04-14 22:29   ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-14 22:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Yinghai Lu, Baoquan He, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

From: Yinghai Lu <yinghai@kernel.org>

Relocation handling performs bounds checking on the resulting calculated
addresses. The existing code uses output_len (VO size plus relocs size) as
the max address. This is not right, since the max_addr check should stop
at the end of VO and exclude the .bss, .brk, etc. that follow. The valid
range should be VO's [_text, __bss_start] in the loaded physical address
space.

This patch adds an export for __bss_start in voffset.h and uses it to
get the correct max_addr.
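
A minimal sketch of the corrected check (a simplified fragment of
handle_relocations(); the surrounding relocation loop is omitted and the
error message is illustrative):

unsigned long min_addr = (unsigned long)output;
unsigned long max_addr = min_addr + (VO___bss_start - VO__text);

if (ptr < min_addr || ptr > max_addr)
	error("32-bit relocation outside of kernel!\n");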

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
[kees: rewrote changelog]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/Makefile | 2 +-
 arch/x86/boot/compressed/misc.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 81a8d0fd34eb..aa421126556d 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -45,7 +45,7 @@ LDFLAGS_vmlinux := -T
 hostprogs-y	:= mkpiggy
 HOST_EXTRACFLAGS += -I$(srctree)/tools/include
 
-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
 
 quiet_cmd_voffset = VOFFSET $@
       cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 6adb5b827aca..562d647289ac 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -259,7 +259,7 @@ static void handle_relocations(void *output, unsigned long output_len)
 	int *reloc;
 	unsigned long delta, map, ptr;
 	unsigned long min_addr = (unsigned long)output;
-	unsigned long max_addr = min_addr + output_len;
+	unsigned long max_addr = min_addr + (VO___bss_start - VO__text);
 
 	/*
 	 * Calculate the delta between where vmlinux was linked to load
-- 
2.6.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 10/21] x86, KASLR: Consolidate mem_avoid entries
  2016-04-14 22:28 ` [kernel-hardening] " Kees Cook
@ 2016-04-14 22:29   ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-14 22:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Yinghai Lu, Baoquan He, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

From: Yinghai Lu <yinghai@kernel.org>

The mem_avoid array is used to track positions that should be avoided
(like the compressed kernel, decompression code, etc) when selecting a
memory position for the relocated kernel. Since ZO is now at the end of
the decompression buffer and the decompression code (and its heap and
stack) are at the front, we can safely consolidate the decompression
entry, the heap entry, and the stack entry. The boot_params memory,
however, could be elsewhere, so it should be explicitly included.
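
A minimal sketch, under this series' naming, of how the consolidated
entries are then consulted (modeled on the helper that follows the
"Does this memory vector overlap a known avoided area?" comment in
aslr.c; details simplified):

static bool mem_avoid_overlap(struct mem_vector *img)
{
	int i;

	/* Reject any candidate that touches one of the avoided ranges. */
	for (i = 0; i < MEM_AVOID_MAX; i++) {
		if (mem_overlaps(img, &mem_avoid[i]))
			return true;
	}

	return false;
}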

Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[kees: rewrote changelog, cleaned up code comment]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/aslr.c | 72 ++++++++++++++++++++++++++++++++---------
 1 file changed, 56 insertions(+), 16 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 462654097d63..277c7a4ec3a3 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -109,7 +109,7 @@ struct mem_vector {
 	unsigned long size;
 };
 
-#define MEM_AVOID_MAX 5
+#define MEM_AVOID_MAX 4
 static struct mem_vector mem_avoid[MEM_AVOID_MAX];
 
 static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
@@ -134,22 +134,66 @@ static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
 	return true;
 }
 
+/*
+ * In theory, KASLR can put the kernel anywhere in the area [16M, 64T). The
+ * mem_avoid array is used to store the ranges that need to be avoided when
+ * KASLR searches for the relocated address. We must avoid any regions that
+ * are unsafe to overlap with during decompression, and other things like
+ * the initrd, cmdline and boot_params.
+ *
+ * How to calculate the unsafe area used by decompression is detailed here.
+ * The compressed vmlinux (ZO) plus relocs and the run space of ZO can't be
+ * overwritten by decompression output.
+ *
+ * ZO sits against the end of the decompression buffer, so we can calculate
+ * where the text, data, bss, etc. of ZO are positioned.
+ *
+ * The following are already enforced by the code:
+ *  - init_size >= run_size,
+ *  - input+input_len >= output+output_len,
+ *  - run_size could be >= or < output_len
+ *
+ * For simplicity, the diagram below illustrates the case where:
+ *  - init_size > run_size
+ *  - input+input_len > output+output_len
+ *  - run_size > output_len
+ *
+ * 0   output            input            input+input_len    output+init_size
+ * |     |                 |                       |                       |
+ * |-----|--------|--------|------------------|----|------------|----------|
+ *                |                           |                 |
+ *   output+init_size-ZO_INIT_SIZE    output+output_len    output+run_size
+ *
+ * [output, output+init_size) is for the buffer for decompressing the
+ * compressed kernel (ZO).
+ *
+ * [output, output+run_size) is for the uncompressed kernel (VO) run size.
+ * [output, output+output_len) is VO plus relocs
+ *
+ * [output+init_size-ZO_INIT_SIZE, output+init_size) is copied ZO.
+ * [input, input+input_len) is the copied compressed payload (VO, i.e.
+ * vmlinux after objcopy, plus relocs), not the whole ZO.
+ *
+ * [input+input_len, output+init_size) is [_text, _end) for ZO. That was the
+ * first range in mem_avoid, which included ZO's heap and stack. Also,
+ * [input, input+input_size) needs to be put in the mem_avoid array, but
+ * since it is adjacent to the first entry, the two can be merged. This
+ * is how we get the first entry in mem_avoid[].
+ */
 static void mem_avoid_init(unsigned long input, unsigned long input_size,
-			   unsigned long output, unsigned long output_size)
+			   unsigned long output)
 {
+	unsigned long init_size = real_mode->hdr.init_size;
 	u64 initrd_start, initrd_size;
 	u64 cmd_line, cmd_line_size;
-	unsigned long unsafe, unsafe_len;
 	char *ptr;
 
 	/*
 	 * Avoid the region that is unsafe to overlap during
-	 * decompression (see calculations at top of misc.c).
+	 * decompression.
 	 */
-	unsafe_len = (output_size >> 12) + 32768 + 18;
-	unsafe = (unsigned long)input + input_size - unsafe_len;
-	mem_avoid[0].start = unsafe;
-	mem_avoid[0].size = unsafe_len;
+	mem_avoid[0].start = input;
+	mem_avoid[0].size = (output + init_size) - input;
 
 	/* Avoid initrd. */
 	initrd_start  = (u64)real_mode->ext_ramdisk_image << 32;
@@ -169,13 +213,9 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	mem_avoid[2].start = cmd_line;
 	mem_avoid[2].size = cmd_line_size;
 
-	/* Avoid heap memory. */
-	mem_avoid[3].start = (unsigned long)free_mem_ptr;
-	mem_avoid[3].size = BOOT_HEAP_SIZE;
-
-	/* Avoid stack memory. */
-	mem_avoid[4].start = (unsigned long)free_mem_end_ptr;
-	mem_avoid[4].size = BOOT_STACK_SIZE;
+	/* Avoid params */
+	mem_avoid[3].start = (unsigned long)real_mode;
+	mem_avoid[3].size = sizeof(*real_mode);
 }
 
 /* Does this memory vector overlap a known avoided area? */
@@ -317,7 +357,7 @@ unsigned char *choose_kernel_location(unsigned char *input,
 
 	/* Record the various known unsafe memory ranges. */
 	mem_avoid_init((unsigned long)input, input_size,
-		       (unsigned long)output, output_size);
+		       (unsigned long)output);
 
 	/* Walk e820 and find a random address. */
 	random = find_random_addr(choice, output_size);
-- 
2.6.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread
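
To make the consolidated mem_avoid[0] entry above concrete, here is a
standalone sketch with made-up addresses (illustration only, not boot
code; in the real code init_size comes from real_mode->hdr.init_size):

	#include <stdio.h>

	struct mem_vector { unsigned long start, size; };

	int main(void)
	{
		/* Hypothetical layout: output < input < output+init_size. */
		unsigned long input     = 0x02000000;
		unsigned long output    = 0x01000000;
		unsigned long init_size = 0x01a00000;
		struct mem_vector avoid;

		/* One entry covers [input, output+init_size): the copied
		 * compressed payload plus ZO's text/data/bss, heap and
		 * stack, exactly as mem_avoid[0] is set in the patch. */
		avoid.start = input;
		avoid.size  = (output + init_size) - input;

		printf("avoid [%#lx, %#lx)\n", avoid.start,
		       avoid.start + avoid.size); /* [0x2000000, 0x2a00000) */
		return 0;
	}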

* [PATCH v5 11/21] x86, boot: Split out kernel_ident_mapping_init
  2016-04-14 22:28 ` [kernel-hardening] " Kees Cook
@ 2016-04-14 22:29   ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-14 22:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Yinghai Lu, Baoquan He, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

From: Yinghai Lu <yinghai@kernel.org>

In order to support on-demand page table creation when moving the
kernel for KASLR, we need to use kernel_ident_mapping_init in the
decompression code. This splits it out into its own file for use outside
of init_64.c. Additionally, #ifndef guards are added around the __pa/__va
defines, since the decompression code needs to override them.
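
The #ifndef guards allow a file that includes ident_map.c to provide its
own flat definitions first. A minimal sketch of the intended use (this is
what a later patch in this series does in the decompressor, where virtual
and physical addresses are identical):

	/* Identity-mapped environment: virt == phys. */
	#define __pa(x)  ((unsigned long)(x))
	#define __va(x)  ((void *)((unsigned long)(x)))

	#include "../../mm/ident_map.c"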

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[kees: rewrote changelog]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/include/asm/page.h |  5 +++
 arch/x86/mm/ident_map.c     | 74 +++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/init_64.c       | 74 +--------------------------------------------
 3 files changed, 80 insertions(+), 73 deletions(-)
 create mode 100644 arch/x86/mm/ident_map.c

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 802dde30c928..cf8f619b305f 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -37,7 +37,10 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
+#ifndef __pa
 #define __pa(x)		__phys_addr((unsigned long)(x))
+#endif
+
 #define __pa_nodebug(x)	__phys_addr_nodebug((unsigned long)(x))
 /* __pa_symbol should be used for C visible symbols.
    This seems to be the official gcc blessed way to do such arithmetic. */
@@ -51,7 +54,9 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 #define __pa_symbol(x) \
 	__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
 
+#ifndef __va
 #define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
+#endif
 
 #define __boot_va(x)		__va(x)
 #define __boot_pa(x)		__pa(x)
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
new file mode 100644
index 000000000000..751ca920773a
--- /dev/null
+++ b/arch/x86/mm/ident_map.c
@@ -0,0 +1,74 @@
+
+static void ident_pmd_init(unsigned long pmd_flag, pmd_t *pmd_page,
+			   unsigned long addr, unsigned long end)
+{
+	addr &= PMD_MASK;
+	for (; addr < end; addr += PMD_SIZE) {
+		pmd_t *pmd = pmd_page + pmd_index(addr);
+
+		if (!pmd_present(*pmd))
+			set_pmd(pmd, __pmd(addr | pmd_flag));
+	}
+}
+static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
+			  unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+
+	for (; addr < end; addr = next) {
+		pud_t *pud = pud_page + pud_index(addr);
+		pmd_t *pmd;
+
+		next = (addr & PUD_MASK) + PUD_SIZE;
+		if (next > end)
+			next = end;
+
+		if (pud_present(*pud)) {
+			pmd = pmd_offset(pud, 0);
+			ident_pmd_init(info->pmd_flag, pmd, addr, next);
+			continue;
+		}
+		pmd = (pmd_t *)info->alloc_pgt_page(info->context);
+		if (!pmd)
+			return -ENOMEM;
+		ident_pmd_init(info->pmd_flag, pmd, addr, next);
+		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+	}
+
+	return 0;
+}
+
+int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
+			      unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	int result;
+	int off = info->kernel_mapping ? pgd_index(__PAGE_OFFSET) : 0;
+
+	for (; addr < end; addr = next) {
+		pgd_t *pgd = pgd_page + pgd_index(addr) + off;
+		pud_t *pud;
+
+		next = (addr & PGDIR_MASK) + PGDIR_SIZE;
+		if (next > end)
+			next = end;
+
+		if (pgd_present(*pgd)) {
+			pud = pud_offset(pgd, 0);
+			result = ident_pud_init(info, pud, addr, next);
+			if (result)
+				return result;
+			continue;
+		}
+
+		pud = (pud_t *)info->alloc_pgt_page(info->context);
+		if (!pud)
+			return -ENOMEM;
+		result = ident_pud_init(info, pud, addr, next);
+		if (result)
+			return result;
+		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+	}
+
+	return 0;
+}
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 214afda97911..65cfbeefbec4 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -58,79 +58,7 @@
 
 #include "mm_internal.h"
 
-static void ident_pmd_init(unsigned long pmd_flag, pmd_t *pmd_page,
-			   unsigned long addr, unsigned long end)
-{
-	addr &= PMD_MASK;
-	for (; addr < end; addr += PMD_SIZE) {
-		pmd_t *pmd = pmd_page + pmd_index(addr);
-
-		if (!pmd_present(*pmd))
-			set_pmd(pmd, __pmd(addr | pmd_flag));
-	}
-}
-static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
-			  unsigned long addr, unsigned long end)
-{
-	unsigned long next;
-
-	for (; addr < end; addr = next) {
-		pud_t *pud = pud_page + pud_index(addr);
-		pmd_t *pmd;
-
-		next = (addr & PUD_MASK) + PUD_SIZE;
-		if (next > end)
-			next = end;
-
-		if (pud_present(*pud)) {
-			pmd = pmd_offset(pud, 0);
-			ident_pmd_init(info->pmd_flag, pmd, addr, next);
-			continue;
-		}
-		pmd = (pmd_t *)info->alloc_pgt_page(info->context);
-		if (!pmd)
-			return -ENOMEM;
-		ident_pmd_init(info->pmd_flag, pmd, addr, next);
-		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
-	}
-
-	return 0;
-}
-
-int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
-			      unsigned long addr, unsigned long end)
-{
-	unsigned long next;
-	int result;
-	int off = info->kernel_mapping ? pgd_index(__PAGE_OFFSET) : 0;
-
-	for (; addr < end; addr = next) {
-		pgd_t *pgd = pgd_page + pgd_index(addr) + off;
-		pud_t *pud;
-
-		next = (addr & PGDIR_MASK) + PGDIR_SIZE;
-		if (next > end)
-			next = end;
-
-		if (pgd_present(*pgd)) {
-			pud = pud_offset(pgd, 0);
-			result = ident_pud_init(info, pud, addr, next);
-			if (result)
-				return result;
-			continue;
-		}
-
-		pud = (pud_t *)info->alloc_pgt_page(info->context);
-		if (!pud)
-			return -ENOMEM;
-		result = ident_pud_init(info, pud, addr, next);
-		if (result)
-			return result;
-		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
-	}
-
-	return 0;
-}
+#include "ident_map.c"
 
 /*
  * NOTE: pagetable_init alloc all the fixmap pagetables contiguous on the
-- 
2.6.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 12/21] x86, 64bit: Set ident_mapping for KASLR
  2016-04-14 22:28 ` [kernel-hardening] " Kees Cook
@ 2016-04-14 22:29   ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-14 22:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Yinghai Lu, Jiri Kosina, Borislav Petkov, Baoquan He,
	Ard Biesheuvel, Matt Redfearn, x86, H. Peter Anvin, Ingo Molnar,
	Borislav Petkov, Vivek Goyal, Andy Lutomirski, lasse.collin,
	Andrew Morton, Dave Young, kernel-hardening, LKML

From: Yinghai Lu <yinghai@kernel.org>

Currently KASLR only supports relocation in a small physical range (from
16M to 1G), due to using the initial kernel page table identity mapping.
To support ranges above this, we need to have an identity mapping for
the desired memory range before we can decompress (and later run) the
kernel.

This patch adds an identity mapping for the needed memory range. This happens
in two possible boot paths:

If loaded via startup_32, we need to set up the identity mapping we need.

If loaded from a 64-bit bootloader, the bootloader will have
already set up an identity mapping, and we'll start via ZO's
(arch/x86/boot/compressed/vmlinux) startup_64. In this case, the
bootloader's page tables need to be avoided when we select the new VO
(vmlinux) location. If not, the decompressor could overwrite them during
decompression.

To accomplish avoiding the bootloader's page tables, we could walk the
pagetable and find every page that is used, and add them to mem_avoid,
but this needs extra code and will require increasing the size of the
mem_avoid array.

Instead, we can create a new identity mapping, with the pages for the
page tables coming from the _pgtable section of ZO, which means they
are already in the mem_avoid array. To do this, we reuse the code for
the kernel's identity mapping.

The _pgtable area will be shared by the 32-bit and 64-bit paths in order
to reduce init_size, since ZO's [_rodata, _end) now contributes to
init_size.

To handle the possible mappings, we need to increase the pgt buffer size:

When booting via startup_64, we need to cover the old VO, params,
cmdline and new VO. In an extreme case they could all be beyond the
512G boundary, which needs (2+2)*4 pages with 2M mappings. We'll also
need 2 pages for the first 2M (VGA RAM), and one more for the level4
table. This gets us to 19 pages total.

When booting via startup_32, KASLR could move the new VO above 4G, so
we need to set up extra identity mappings for the new VO, which should
need at most (2+2) pages when it is beyond the 512G boundary. So 19
pages are sufficient for this case as well.
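
One way to read the 19-page figure (my own accounting, taking "(2+2)"
to mean 2 PUD pages plus 2 PMD pages per region; a compile-time
restatement, not kernel code):

	/* Pages per 2M-mapped region that straddles a 512G boundary. */
	#define PGS_PER_REGION	(2 + 2)

	/* old VO, params, cmdline, new VO */
	#define NR_REGIONS	4

	/* first 2M for VGA RAM: 1 PUD page + 1 PMD page */
	#define PGS_VGA		2

	/* the level4 (top-level) page */
	#define PGS_LEVEL4	1

	/* PGS_PER_REGION * NR_REGIONS + PGS_VGA + PGS_LEVEL4 == 19 */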

Cc: Kees Cook <keescook@chromium.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[kees: rewrote changelog]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/Makefile   |  3 ++
 arch/x86/boot/compressed/aslr.c     | 14 ++++++
 arch/x86/boot/compressed/head_64.S  |  4 +-
 arch/x86/boot/compressed/misc.h     | 11 +++++
 arch/x86/boot/compressed/misc_pgt.c | 93 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/boot.h         | 19 ++++++++
 6 files changed, 142 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/boot/compressed/misc_pgt.c

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index aa421126556d..af9ea6f0baa2 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -63,6 +63,9 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
 
 vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/aslr.o
+ifdef CONFIG_X86_64
+	vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/misc_pgt.o
+endif
 
 $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
 
diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 277c7a4ec3a3..cd261debead9 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -194,6 +194,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	 */
 	mem_avoid[0].start = input;
 	mem_avoid[0].size = (output + init_size) - input;
+	fill_pagetable(mem_avoid[0].start, mem_avoid[0].size);
 
 	/* Avoid initrd. */
 	initrd_start  = (u64)real_mode->ext_ramdisk_image << 32;
@@ -202,6 +203,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	initrd_size |= real_mode->hdr.ramdisk_size;
 	mem_avoid[1].start = initrd_start;
 	mem_avoid[1].size = initrd_size;
+	/* No need to set mapping for initrd */
 
 	/* Avoid kernel command line. */
 	cmd_line  = (u64)real_mode->ext_cmd_line_ptr << 32;
@@ -212,10 +214,19 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 		;
 	mem_avoid[2].start = cmd_line;
 	mem_avoid[2].size = cmd_line_size;
+	fill_pagetable(mem_avoid[2].start, mem_avoid[2].size);
 
 	/* Avoid params */
 	mem_avoid[3].start = (unsigned long)real_mode;
 	mem_avoid[3].size = sizeof(*real_mode);
+	fill_pagetable(mem_avoid[3].start, mem_avoid[3].size);
+
+	/* don't need to set mapping for setup_data */
+
+#ifdef CONFIG_X86_VERBOSE_BOOTUP
+	/* for video ram */
+	fill_pagetable(0, PMD_SIZE);
+#endif
 }
 
 /* Does this memory vector overlap a known avoided area? */
@@ -371,6 +382,9 @@ unsigned char *choose_kernel_location(unsigned char *input,
 		goto out;
 
 	choice = random;
+
+	fill_pagetable(choice, output_size);
+	switch_pagetable();
 out:
 	return (unsigned char *)choice;
 }
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 36914515dd76..075bb15f9494 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -126,7 +126,7 @@ ENTRY(startup_32)
 	/* Initialize Page tables to 0 */
 	leal	pgtable(%ebx), %edi
 	xorl	%eax, %eax
-	movl	$((4096*6)/4), %ecx
+	movl	$(BOOT_INIT_PGT_SIZE/4), %ecx
 	rep	stosl
 
 	/* Build Level 4 */
@@ -478,4 +478,4 @@ boot_stack_end:
 	.section ".pgtable","a",@nobits
 	.balign 4096
 pgtable:
-	.fill 6*4096, 1, 0
+	.fill BOOT_PGT_SIZE, 1, 0
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index dcf01c22400e..11736a6a6670 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -84,6 +84,17 @@ unsigned char *choose_kernel_location(unsigned char *input,
 }
 #endif
 
+#ifdef CONFIG_X86_64
+void fill_pagetable(unsigned long start, unsigned long size);
+void switch_pagetable(void);
+extern unsigned char _pgtable[];
+#else
+static inline void fill_pagetable(unsigned long start, unsigned long size)
+{ }
+static inline void switch_pagetable(void)
+{ }
+#endif
+
 #ifdef CONFIG_EARLY_PRINTK
 /* early_serial_console.c */
 extern int early_serial_base;
diff --git a/arch/x86/boot/compressed/misc_pgt.c b/arch/x86/boot/compressed/misc_pgt.c
new file mode 100644
index 000000000000..816551d0327c
--- /dev/null
+++ b/arch/x86/boot/compressed/misc_pgt.c
@@ -0,0 +1,93 @@
+#define __pa(x)  ((unsigned long)(x))
+#define __va(x)  ((void *)((unsigned long)(x)))
+
+#include "misc.h"
+
+#include <asm/init.h>
+#include <asm/pgtable.h>
+
+#include "../../mm/ident_map.c"
+#include "../string.h"
+
+struct alloc_pgt_data {
+	unsigned char *pgt_buf;
+	unsigned long pgt_buf_size;
+	unsigned long pgt_buf_offset;
+};
+
+static void *alloc_pgt_page(void *context)
+{
+	struct alloc_pgt_data *d = (struct alloc_pgt_data *)context;
+	unsigned char *p = (unsigned char *)d->pgt_buf;
+
+	if (d->pgt_buf_offset >= d->pgt_buf_size) {
+		debug_putstr("out of pgt_buf in misc.c\n");
+		debug_putaddr(d->pgt_buf_offset);
+		debug_putaddr(d->pgt_buf_size);
+		return NULL;
+	}
+
+	p += d->pgt_buf_offset;
+	d->pgt_buf_offset += PAGE_SIZE;
+
+	return p;
+}
+
+/*
+ * Use a normal definition of memset() from string.c. Already-included
+ * header files expect a definition of memset(), and by the time we
+ * define the memset macro, it is too late.
+ */
+#undef memset
+
+unsigned long __force_order;
+static struct alloc_pgt_data pgt_data;
+static struct x86_mapping_info mapping_info;
+static pgd_t *level4p;
+
+void fill_pagetable(unsigned long start, unsigned long size)
+{
+	unsigned long end = start + size;
+
+	if (!level4p) {
+		pgt_data.pgt_buf_offset = 0;
+		mapping_info.alloc_pgt_page = alloc_pgt_page;
+		mapping_info.context = &pgt_data;
+		mapping_info.pmd_flag = __PAGE_KERNEL_LARGE_EXEC;
+
+		/*
+		 * If we entered via startup_32, CR3 points at _pgtable
+		 * and we can reuse those initial page tables.
+		 */
+		level4p = (pgd_t *)read_cr3();
+		if ((unsigned long)level4p == (unsigned long)_pgtable) {
+			pgt_data.pgt_buf = (unsigned char *)_pgtable +
+						 BOOT_INIT_PGT_SIZE;
+			pgt_data.pgt_buf_size = BOOT_PGT_SIZE -
+						 BOOT_INIT_PGT_SIZE;
+			memset((unsigned char *)pgt_data.pgt_buf, 0,
+				pgt_data.pgt_buf_size);
+			debug_putstr("boot via startup_32\n");
+		} else {
+			pgt_data.pgt_buf = (unsigned char *)_pgtable;
+			pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
+			memset((unsigned char *)pgt_data.pgt_buf, 0,
+				pgt_data.pgt_buf_size);
+			debug_putstr("boot via startup_64\n");
+			level4p = (pgd_t *)alloc_pgt_page(&pgt_data);
+		}
+	}
+
+	/* align boundary to 2M */
+	start = round_down(start, PMD_SIZE);
+	end = round_up(end, PMD_SIZE);
+	if (start >= end)
+		return;
+
+	kernel_ident_mapping_init(&mapping_info, level4p, start, end);
+}
+
+void switch_pagetable(void)
+{
+	write_cr3((unsigned long)level4p);
+}
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 6b8d6e8cd449..52a9cbc1419f 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -32,7 +32,26 @@
 #endif /* !CONFIG_KERNEL_BZIP2 */
 
 #ifdef CONFIG_X86_64
+
 #define BOOT_STACK_SIZE	0x4000
+
+#define BOOT_INIT_PGT_SIZE (6*4096)
+#ifdef CONFIG_RANDOMIZE_BASE
+/*
+ * 1 page for the level4 table, 2 pages for the first 2M.
+ * (2+2)*4 pages for the kernel, params, cmd_line and the randomized
+ * kernel, if they all cross a 512G boundary.
+ * So the total will be 19 pages.
+ */
+#ifdef CONFIG_X86_VERBOSE_BOOTUP
+#define BOOT_PGT_SIZE (19*4096)
+#else
+#define BOOT_PGT_SIZE (17*4096)
+#endif
+#else
+#define BOOT_PGT_SIZE BOOT_INIT_PGT_SIZE
+#endif
+
 #else
 #define BOOT_STACK_SIZE	0x1000
 #endif
-- 
2.6.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 13/21] x86, boot: Report overlap failures in memcpy
  2016-04-14 22:28 ` [kernel-hardening] " Kees Cook
@ 2016-04-14 22:29   ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-14 22:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Yinghai Lu, Baoquan He, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

From: Yinghai Lu <yinghai@kernel.org>

parse_elf() uses a local memcpy() to move sections to their final running
position. However, this memcpy() only supports non-overlapping arguments
(or dest < src).

To avoid future hard-to-debug surprises, this adds a check to memcpy() to
detect the unhandled condition (which should not happen currently).
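
The added check is an interval-intersection test. A standalone userspace
sketch of the same logic (illustrative values only):

	#include <stdio.h>

	/* Do [dest, dest+n) and [src, src+n) intersect? */
	static int ranges_overlap(unsigned long dest, unsigned long src,
				  unsigned long n)
	{
		unsigned long max_start = dest > src ? dest : src;
		unsigned long min_end   = (dest < src ? dest : src) + n;

		return max_start < min_end;
	}

	int main(void)
	{
		/* dest > src and overlapping: the case plain memcpy breaks */
		printf("%d\n", ranges_overlap(0x1100, 0x1000, 0x200)); /* 1 */
		/* disjoint ranges: safe to copy forward */
		printf("%d\n", ranges_overlap(0x3000, 0x1000, 0x200)); /* 0 */
		return 0;
	}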

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[kees: rewrote changelog]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/misc.c   |  9 ++-------
 arch/x86/boot/compressed/misc.h   |  2 ++
 arch/x86/boot/compressed/string.c | 29 +++++++++++++++++++++++++++--
 3 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 562d647289ac..c47ac162d3bd 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -114,9 +114,6 @@
 #undef memset
 #define memzero(s, n)	memset((s), 0, (n))
 
-
-static void error(char *m);
-
 /*
  * This is set up by the setup-routine at boot-time
  */
@@ -243,7 +240,7 @@ void __puthex(unsigned long value)
 	}
 }
 
-static void error(char *x)
+void error(char *x)
 {
 	error_putstr("\n\n");
 	error_putstr(x);
@@ -378,9 +375,7 @@ static void parse_elf(void *output)
 #else
 			dest = (void *)(phdr->p_paddr);
 #endif
-			memcpy(dest,
-			       output + phdr->p_offset,
-			       phdr->p_filesz);
+			memcpy(dest, output + phdr->p_offset, phdr->p_filesz);
 			break;
 		default: /* Ignore other PT_* */ break;
 		}
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 11736a6a6670..39d0e9a53736 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -38,6 +38,8 @@ void __puthex(unsigned long value);
 #define error_putstr(__x)  __putstr(__x)
 #define error_puthex(__x)  __puthex(__x)
 
+void error(char *x);
+
 #ifdef CONFIG_X86_VERBOSE_BOOTUP
 
 #define debug_putstr(__x)  __putstr(__x)
diff --git a/arch/x86/boot/compressed/string.c b/arch/x86/boot/compressed/string.c
index 00e788be1db9..3a935d0c82a8 100644
--- a/arch/x86/boot/compressed/string.c
+++ b/arch/x86/boot/compressed/string.c
@@ -1,7 +1,7 @@
 #include "../string.c"
 
 #ifdef CONFIG_X86_32
-void *memcpy(void *dest, const void *src, size_t n)
+void *__memcpy(void *dest, const void *src, size_t n)
 {
 	int d0, d1, d2;
 	asm volatile(
@@ -15,7 +15,7 @@ void *memcpy(void *dest, const void *src, size_t n)
 	return dest;
 }
 #else
-void *memcpy(void *dest, const void *src, size_t n)
+void *__memcpy(void *dest, const void *src, size_t n)
 {
 	long d0, d1, d2;
 	asm volatile(
@@ -30,6 +30,31 @@ void *memcpy(void *dest, const void *src, size_t n)
 }
 #endif
 
+extern void error(char *x);
+void *memcpy(void *dest, const void *src, size_t n)
+{
+	unsigned long start_dest, end_dest;
+	unsigned long start_src, end_src;
+	unsigned long max_start, min_end;
+
+	if (dest < src)
+		return __memcpy(dest, src, n);
+
+	start_dest = (unsigned long)dest;
+	end_dest = (unsigned long)dest + n;
+	start_src = (unsigned long)src;
+	end_src = (unsigned long)src + n;
+	max_start = (start_dest > start_src) ?  start_dest : start_src;
+	min_end = (end_dest < end_src) ? end_dest : end_src;
+
+	if (max_start >= min_end)
+		return __memcpy(dest, src, n);
+
+	error("memcpy does not support overlapping with dest > src!\n");
+
+	return dest;
+}
+
 void *memset(void *s, int c, size_t n)
 {
 	int i;
-- 
2.6.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 14/21] x86, KASLR: Add slot_area to manage random slots
  2016-04-14 22:28 ` [kernel-hardening] " Kees Cook
@ 2016-04-14 22:29   ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-14 22:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Baoquan He, Yinghai Lu, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

From: Baoquan He <bhe@redhat.com>

In order to support KASLR moving the kernel anywhere in the whole
of physical memory (could be 64T), we need to handle counting the
potential randomization locations in a more efficient manner.

In the worst case with 64T, there could be roughly 32 * 1024 * 1024
randomization slots if CONFIG_PHYSICAL_ALIGN is 0x200000 (2M). Currently
the starting address of each candidate position is stored into the
slots[] array, one at a time. This method costs too much memory, and it
is also very inefficient to collect and save the slot information one
entry at a time.

This patch introduces struct slot_area to manage each contiguous region
of randomization slots. Each slot_area will contain the starting address
and how many available slots are in this area. As with the original code,
the slot_areas will avoid the mem_avoid regions, since those areas must
not be used for the relocated position.

Since setup_data is a linked list, it could contain an unknown number
of memory regions to be avoided, which could fragment the contiguous
memory that the slot_area array is tracking. In normal operation such
fragmentation will be extremely unlikely, but we choose a suitably
large value (100) for the array. If more than 100 slot areas are found,
the remainder will be ignored for KASLR selection.
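
For a rough sense of the memory saved (my arithmetic, assuming the 2M
alignment and 64T worst case above, 8-byte longs, and ~16 bytes per
padded slot_area entry):

	#include <stdio.h>

	int main(void)
	{
		unsigned long long total_mem = 64ULL << 40;	/* 64T */
		unsigned long long align     = 2ULL << 20;	/* 2M  */

		/* one unsigned long per slot */
		printf("%llu MiB\n", total_mem / align * 8 >> 20); /* 256 */
		/* vs. at most 100 slot_area entries */
		printf("%d bytes\n", 100 * 16);			   /* 1600 */
		return 0;
	}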

Signed-off-by: Baoquan He <bhe@redhat.com>
[kees: rewrote changelog]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/aslr.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index cd261debead9..abe618d489ea 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -258,8 +258,20 @@ static bool mem_avoid_overlap(struct mem_vector *img)
 }
 
 static unsigned long slots[KERNEL_IMAGE_SIZE / CONFIG_PHYSICAL_ALIGN];
+
+struct slot_area {
+	unsigned long addr;
+	int num;
+};
+
+#define MAX_SLOT_AREA 100
+
+static struct slot_area slot_areas[MAX_SLOT_AREA];
+
 static unsigned long slot_max;
 
+static unsigned long slot_area_index;
+
 static void slots_append(unsigned long addr)
 {
 	/* Overflowing the slots list should be impossible. */
-- 
2.6.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 15/21] x86, KASLR: Add slot_area support functions
  2016-04-14 22:28 ` [kernel-hardening] " Kees Cook
@ 2016-04-14 22:29   ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-14 22:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Baoquan He, Yinghai Lu, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

From: Baoquan He <bhe@redhat.com>

The function store_slot_info() calculates the slot information for the
passed-in memory region and stores it into slot_areas[] after adjusting
for alignment and size requirements.

The function mem_min_overlap() is used to iterate over all mem_avoid
regions to find the earliest mem_avoid address that conflicts with the
given memory region. (For example, with the region [1024M, 2048M), if
there is a mem_avoid of [1536M, 1664M), this returns 1536M.) This can
be used to split memory regions when building the slot_area array.
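
A worked example of the slot-count formula used by store_slot_info()
(standalone, with made-up numbers; the 16M alignment is just an example
value for CONFIG_PHYSICAL_ALIGN):

	#include <stdio.h>

	int main(void)
	{
		/* region [1024M, 2048M), image larger than the alignment */
		unsigned long region_size = 1024UL << 20;
		unsigned long image_size  = 20UL << 20;
		unsigned long align       = 16UL << 20;
		unsigned long num;

		if (image_size <= align)
			num = region_size / align;
		else
			num = (region_size - image_size) / align + 1;

		/* last usable start is 1024M + 62*16M = 2016M */
		printf("%lu slots\n", num);	/* prints "63 slots" */
		return 0;
	}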

Signed-off-by: Baoquan He <bhe@redhat.com>
[kees: rewrote changelog]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/aslr.c | 51 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index abe618d489ea..b06618000732 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -257,6 +257,40 @@ static bool mem_avoid_overlap(struct mem_vector *img)
 	return false;
 }
 
+static unsigned long
+mem_min_overlap(struct mem_vector *img, struct mem_vector *out)
+{
+	int i;
+	struct setup_data *ptr;
+	unsigned long min = img->start + img->size;
+
+	for (i = 0; i < MEM_AVOID_MAX; i++) {
+		if (mem_overlaps(img, &mem_avoid[i]) &&
+			(mem_avoid[i].start < min)) {
+			*out = mem_avoid[i];
+			min = mem_avoid[i].start;
+		}
+	}
+
+	/* Check all entries in the setup_data linked list. */
+	ptr = (struct setup_data *)(unsigned long)real_mode->hdr.setup_data;
+	while (ptr) {
+		struct mem_vector avoid;
+
+		avoid.start = (unsigned long)ptr;
+		avoid.size = sizeof(*ptr) + ptr->len;
+
+		if (mem_overlaps(img, &avoid) && (avoid.start < min)) {
+			*out = avoid;
+			min = avoid.start;
+		}
+
+		ptr = (struct setup_data *)(unsigned long)ptr->next;
+	}
+
+	return min;
+}
+
 static unsigned long slots[KERNEL_IMAGE_SIZE / CONFIG_PHYSICAL_ALIGN];
 
 struct slot_area {
@@ -272,6 +306,23 @@ static unsigned long slot_max;
 
 static unsigned long slot_area_index;
 
+static void store_slot_info(struct mem_vector *region, unsigned long image_size)
+{
+	struct slot_area slot_area;
+
+	slot_area.addr = region->start;
+	if (image_size <= CONFIG_PHYSICAL_ALIGN)
+		slot_area.num = region->size / CONFIG_PHYSICAL_ALIGN;
+	else
+		slot_area.num = (region->size - image_size) /
+				CONFIG_PHYSICAL_ALIGN + 1;
+
+	if (slot_area.num > 0) {
+		slot_areas[slot_area_index++] = slot_area;
+		slot_max += slot_area.num;
+	}
+}
+
 static void slots_append(unsigned long addr)
 {
 	/* Overflowing the slots list should be impossible. */
-- 
2.6.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 16/21] x86, KASLR: Add virtual address choosing function
  2016-04-14 22:28 ` [kernel-hardening] " Kees Cook
@ 2016-04-14 22:29   ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-14 22:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Baoquan He, Yinghai Lu, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

From: Baoquan He <bhe@redhat.com>

To support randomizing the kernel virtual address separately from the
physical address, this patch adds find_random_virt_offset() to choose
a slot anywhere between LOAD_PHYSICAL_ADDR and KERNEL_IMAGE_SIZE.
Since this address is virtual, not physical, we can place the kernel
anywhere in this region, as long as it is aligned and (when the kernel
is larger than the slot size) placed with enough room to load the
entire kernel image.
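
For illustration (not part of the patch), a stand-alone sketch of the
slot arithmetic; the constants below are illustrative assumptions, not
the kernel's configured values:

  #include <stdio.h>

  /* Illustrative values only; the real ones come from the kernel config. */
  #define PHYSICAL_ALIGN 0x200000UL             /* assume 2MiB alignment */
  #define IMAGE_WINDOW   (1024UL << 20)         /* assume a 1GiB window */
  #define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

  int main(void)
  {
          unsigned long minimum = 16UL << 20;    /* e.g. a 16MiB floor */
          unsigned long image_size = 30UL << 20; /* e.g. a 30MiB image */
          unsigned long slots;

          minimum = ALIGN_UP(minimum, PHYSICAL_ALIGN);
          image_size = ALIGN_UP(image_size, PHYSICAL_ALIGN);

          /* Count the aligned starting offsets that still fit image_size. */
          slots = (IMAGE_WINDOW - minimum - image_size) / PHYSICAL_ALIGN + 1;
          printf("%lu candidate slots\n", slots); /* prints 490 */
          return 0;
  }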

Signed-off-by: Baoquan He <bhe@redhat.com>
[kees: rewrote changelog, refactor slot calculation for readability]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/aslr.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index b06618000732..b527ff10372a 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -407,6 +407,29 @@ static unsigned long find_random_addr(unsigned long minimum,
 	return slots_fetch_random();
 }
 
+static unsigned long find_random_virt_offset(unsigned long minimum,
+				  unsigned long image_size)
+{
+	unsigned long slots, random;
+
+	/* Make sure minimum is aligned. */
+	minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);
+	/* Align image_size for easy slot calculations. */
+	image_size = ALIGN(image_size, CONFIG_PHYSICAL_ALIGN);
+
+	/*
+	 * How many CONFIG_PHYSICAL_ALIGN-sized slots can hold
+	 * image_size within the range from minimum to
+	 * KERNEL_IMAGE_SIZE?
+	 */
+	slots = (KERNEL_IMAGE_SIZE - minimum - image_size) /
+		 CONFIG_PHYSICAL_ALIGN + 1;
+
+	random = get_random_long() % slots;
+
+	return random * CONFIG_PHYSICAL_ALIGN + minimum;
+}
+
 unsigned char *choose_kernel_location(unsigned char *input,
 				      unsigned long input_size,
 				      unsigned char *output,
-- 
2.6.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 17/21] x86, KASLR: Clarify purpose of each get_random_long
  2016-04-14 22:28 ` [kernel-hardening] " Kees Cook
@ 2016-04-14 22:29   ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-14 22:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Baoquan He, Yinghai Lu, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

KASLR will be calling get_random_long() twice, but the debug output
won't distinguish between the two calls. This patch reports whether
the physical or the virtual address offset is being fetched. With
this, once the virtual offset is randomized separately, the report
changes:

Before:
KASLR using RDTSC...
KASLR using RDTSC...

After:
Physical KASLR using RDTSC...
Virtual KASLR using RDTSC...
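
For illustration (not part of the patch), a minimal user-space sketch
of the tagging idea, with debug_print() as a hypothetical stand-in for
the boot stub's debug_putstr():

  #include <stdio.h>

  /* Hypothetical stand-in for the boot stub's debug_putstr(). */
  static void debug_print(const char *s)
  {
          fputs(s, stdout);
  }

  static unsigned long get_random_long(const char *purpose)
  {
          /* Tag the shared helper's output with the caller's purpose. */
          debug_print(purpose);
          debug_print(" KASLR using RDTSC...\n");
          return 42; /* placeholder for real entropy gathering */
  }

  int main(void)
  {
          get_random_long("Physical");
          get_random_long("Virtual");
          return 0;
  }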

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/aslr.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index b527ff10372a..c58215d5a80d 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -60,7 +60,7 @@ static unsigned long get_random_boot(void)
 	return hash;
 }
 
-static unsigned long get_random_long(void)
+static unsigned long get_random_long(const char *purpose)
 {
 #ifdef CONFIG_X86_64
 	const unsigned long mix_const = 0x5d6008cbf3848dd3UL;
@@ -70,7 +70,8 @@ static unsigned long get_random_long(void)
 	unsigned long raw, random = get_random_boot();
 	bool use_i8254 = true;
 
-	debug_putstr("KASLR using");
+	debug_putstr(purpose);
+	debug_putstr(" KASLR using");
 
 	if (has_cpuflag(X86_FEATURE_RDRAND)) {
 		debug_putstr(" RDRAND");
@@ -338,7 +339,7 @@ static unsigned long slots_fetch_random(void)
 	if (slot_max == 0)
 		return 0;
 
-	return slots[get_random_long() % slot_max];
+	return slots[get_random_long("Physical") % slot_max];
 }
 
 static void process_e820_entry(struct e820entry *entry,
@@ -425,7 +426,7 @@ static unsigned long find_random_virt_offset(unsigned long minimum,
 	slots = (KERNEL_IMAGE_SIZE - minimum - image_size) /
 		 CONFIG_PHYSICAL_ALIGN + 1;
 
-	random = get_random_long() % slots;
+	random = get_random_long("Virtual") % slots;
 
 	return random * CONFIG_PHYSICAL_ALIGN + minimum;
 }
-- 
2.6.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 18/21] x86, KASLR: Randomize virtual address separately
  2016-04-14 22:28 ` [kernel-hardening] " Kees Cook
@ 2016-04-14 22:29   ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-14 22:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Baoquan He, Yinghai Lu, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

From: Baoquan He <bhe@redhat.com>

The current KASLR implementation randomizes the physical and virtual
addresses of the kernel together (both are offset by the same amount).
It calculates the delta between the physical address vmlinux was
linked to load at and the address where it is finally loaded. If the
delta is not 0 (i.e. the kernel was relocated), relocation handling
needs to be done.

On 64-bit, this patch randomizes both the physical address where the
kernel is decompressed and the virtual address where the kernel text
is mapped and will execute from. Since two values are now being
chosen, the function arguments are reorganized to pass by pointer so
they can be directly updated. Since relocation handling only depends
on the virtual address, we must check the virtual delta, not the
physical delta, when processing kernel relocations. This also
populates the page table for the new virtual address range. 32-bit
does not support a separate virtual address, so it continues to use
the physical offset for its virtual offset.
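
For illustration (not part of the patch), a minimal sketch of the new
relocation decision, assuming a 64-bit build; the addresses below are
made up:

  #include <stdio.h>

  #define LOAD_PHYSICAL_ADDR 0x1000000UL /* assume a 16MiB link address */

  int main(void)
  {
          unsigned long phys_addr   = 0x200000000UL; /* e.g. decompressed at 8GiB */
          unsigned long virt_offset = 0x3000000UL;   /* e.g. randomized virtual slot */
          unsigned long delta;

          /* On 64-bit the relocation delta depends only on the virtual choice. */
          delta = virt_offset - LOAD_PHYSICAL_ADDR;
          if (!delta)
                  printf("no relocation needed (phys %#lx)\n", phys_addr);
          else
                  printf("relocate by %#lx (phys %#lx)\n", delta, phys_addr);
          return 0;
  }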

Signed-off-by: Baoquan He <bhe@redhat.com>
[kees: rewrote changelog, limited virtual split to 64-bit only]
[kees: fix CONFIG_RANDOMIZE_BASE=n boot failure]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/aslr.c | 45 +++++++++++++++++++++--------------------
 arch/x86/boot/compressed/misc.c | 39 ++++++++++++++++++++---------------
 arch/x86/boot/compressed/misc.h | 20 +++++++++---------
 3 files changed, 57 insertions(+), 47 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index c58215d5a80d..53ceaa0a08b9 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -391,7 +391,7 @@ static void process_e820_entry(struct e820entry *entry,
 	}
 }
 
-static unsigned long find_random_addr(unsigned long minimum,
+static unsigned long find_random_phy_addr(unsigned long minimum,
 				      unsigned long size)
 {
 	int i;
@@ -431,23 +431,24 @@ static unsigned long find_random_virt_offset(unsigned long minimum,
 	return random * CONFIG_PHYSICAL_ALIGN + minimum;
 }
 
-unsigned char *choose_kernel_location(unsigned char *input,
-				      unsigned long input_size,
-				      unsigned char *output,
-				      unsigned long output_size)
+void choose_kernel_location(unsigned char *input,
+				unsigned long input_size,
+				unsigned char **output,
+				unsigned long output_size,
+				unsigned char **virt_offset)
 {
-	unsigned long choice = (unsigned long)output;
 	unsigned long random;
+	*virt_offset = (unsigned char *)LOAD_PHYSICAL_ADDR;
 
 #ifdef CONFIG_HIBERNATION
 	if (!cmdline_find_option_bool("kaslr")) {
 		debug_putstr("KASLR disabled by default...\n");
-		goto out;
+		return;
 	}
 #else
 	if (cmdline_find_option_bool("nokaslr")) {
 		debug_putstr("KASLR disabled by cmdline...\n");
-		goto out;
+		return;
 	}
 #endif
 
@@ -455,23 +456,23 @@ unsigned char *choose_kernel_location(unsigned char *input,
 
 	/* Record the various known unsafe memory ranges. */
 	mem_avoid_init((unsigned long)input, input_size,
-		       (unsigned long)output);
+		       (unsigned long)*output);
 
 	/* Walk e820 and find a random address. */
-	random = find_random_addr(choice, output_size);
-	if (!random) {
+	random = find_random_phy_addr((unsigned long)*output, output_size);
+	if (!random)
 		debug_putstr("KASLR could not find suitable E820 region...\n");
-		goto out;
+	else {
+		if ((unsigned long)*output != random) {
+			fill_pagetable(random, output_size);
+			switch_pagetable();
+			*output = (unsigned char *)random;
+		}
 	}
 
-	/* Always enforce the minimum. */
-	if (random < choice)
-		goto out;
-
-	choice = random;
-
-	fill_pagetable(choice, output_size);
-	switch_pagetable();
-out:
-	return (unsigned char *)choice;
+	/* Pick random virtual address starting from LOAD_PHYSICAL_ADDR. */
+	if (IS_ENABLED(CONFIG_X86_64))
+		random = find_random_virt_offset(LOAD_PHYSICAL_ADDR,
+						 output_size);
+	*virt_offset = (unsigned char *)random;
 }
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index c47ac162d3bd..3d5b0eda0711 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -251,7 +251,8 @@ void error(char *x)
 }
 
 #if CONFIG_X86_NEED_RELOCS
-static void handle_relocations(void *output, unsigned long output_len)
+static void handle_relocations(void *output, unsigned long output_len,
+			       void *virt_offset)
 {
 	int *reloc;
 	unsigned long delta, map, ptr;
@@ -263,11 +264,6 @@ static void handle_relocations(void *output, unsigned long output_len)
 	 * and where it was actually loaded.
 	 */
 	delta = min_addr - LOAD_PHYSICAL_ADDR;
-	if (!delta) {
-		debug_putstr("No relocation needed... ");
-		return;
-	}
-	debug_putstr("Performing relocations... ");
 
 	/*
 	 * The kernel contains a table of relocation addresses. Those
@@ -279,6 +275,20 @@ static void handle_relocations(void *output, unsigned long output_len)
 	map = delta - __START_KERNEL_map;
 
 	/*
+	 * 32-bit always performs relocations. 64-bit relocations are only
+	 * needed if KASLR has chosen a different starting address offset
+	 * from __START_KERNEL_map.
+	 */
+	if (IS_ENABLED(CONFIG_X86_64))
+		delta = (unsigned long)virt_offset - LOAD_PHYSICAL_ADDR;
+
+	if (!delta) {
+		debug_putstr("No relocation needed... ");
+		return;
+	}
+	debug_putstr("Performing relocations... ");
+
+	/*
 	 * Process relocations: 32 bit relocations first then 64 bit after.
 	 * Three sets of binary relocations are added to the end of the kernel
 	 * before compression. Each relocation table entry is the kernel
@@ -331,7 +341,8 @@ static void handle_relocations(void *output, unsigned long output_len)
 #endif
 }
 #else
-static inline void handle_relocations(void *output, unsigned long output_len)
+static inline void handle_relocations(void *output, unsigned long output_len,
+				      void *virt_offset)
 { }
 #endif
 
@@ -391,7 +402,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 				  unsigned long output_len)
 {
 	const unsigned long run_size = VO__end - VO__text;
-	unsigned char *output_orig = output;
+	unsigned char *virt_offset;
 
 	real_mode = rmode;
 
@@ -429,9 +440,10 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	 * the entire decompressed kernel plus relocation table, or the
 	 * entire decompressed kernel plus .bss and .brk sections.
 	 */
-	output = choose_kernel_location(input_data, input_len, output,
+	choose_kernel_location(input_data, input_len, &output,
 					output_len > run_size ? output_len
-							      : run_size);
+							      : run_size,
+					&virt_offset);
 
 	/* Validate memory location choices. */
 	if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
@@ -452,12 +464,7 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
 	__decompress(input_data, input_len, NULL, NULL, output, output_len,
 			NULL, error);
 	parse_elf(output);
-	/*
-	 * 32-bit always performs relocations. 64-bit relocations are only
-	 * needed if kASLR has chosen a different load address.
-	 */
-	if (!IS_ENABLED(CONFIG_X86_64) || output != output_orig)
-		handle_relocations(output, output_len);
+	handle_relocations(output, output_len, virt_offset);
 	debug_putstr("done.\nBooting the kernel.\n");
 	return output;
 }
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 39d0e9a53736..0b455406a850 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -69,20 +69,22 @@ int cmdline_find_option_bool(const char *option);
 
 #if CONFIG_RANDOMIZE_BASE
 /* aslr.c */
-unsigned char *choose_kernel_location(unsigned char *input,
-				      unsigned long input_size,
-				      unsigned char *output,
-				      unsigned long output_size);
+void choose_kernel_location(unsigned char *input,
+				unsigned long input_size,
+				unsigned char **output,
+				unsigned long output_size,
+				unsigned char **virt_offset);
 /* cpuflags.c */
 bool has_cpuflag(int flag);
 #else
 static inline
-unsigned char *choose_kernel_location(unsigned char *input,
-				      unsigned long input_size,
-				      unsigned char *output,
-				      unsigned long output_size)
+void choose_kernel_location(unsigned char *input,
+				unsigned long input_size,
+				unsigned char **output,
+				unsigned long output_size,
+				unsigned char **virt_offset)
 {
-	return output;
+	*virt_offset = *output;
 }
 #endif
 
-- 
2.6.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 19/21] x86, KASLR: Add physical address randomization >4G
  2016-04-14 22:28 ` [kernel-hardening] " Kees Cook
@ 2016-04-14 22:29   ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-14 22:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Baoquan He, Yinghai Lu, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

From: Baoquan He <bhe@redhat.com>

This patch exchanges the prior slots[] array for the new slot_areas[]
array, and lifts the limitation of KERNEL_IMAGE_SIZE on the physical
address offset for 64-bit. As before, process_e820_entry walks memory and
populates slot_areas[], splitting on any detected mem_avoid collisions.
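
For illustration (not part of the patch), a stand-alone sketch of how
a random slot is drawn from slot_areas[] under this scheme, assuming a
64-bit build; the areas and counts below are made up:

  #include <stdio.h>

  #define PHYSICAL_ALIGN 0x200000UL /* assume 2MiB slot granularity */

  struct slot_area {
          unsigned long addr;
          int num;
  };

  int main(void)
  {
          /* Two made-up areas: 100 slots at 512M and 50 slots at 8G. */
          struct slot_area slot_areas[] = {
                  {  512UL << 20, 100 },
                  { 8192UL << 20,  50 },
          };
          unsigned long random = 120; /* pretend get_random_long() % slot_max */
          int i;

          /* Walk the areas, consuming slot counts until random fits. */
          for (i = 0; i < 2; i++) {
                  if (random >= (unsigned long)slot_areas[i].num) {
                          random -= slot_areas[i].num;
                          continue;
                  }
                  printf("chose %#lx\n",
                         slot_areas[i].addr + random * PHYSICAL_ALIGN);
                  break;
          }
          return 0;
  }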

Signed-off-by: Baoquan He <bhe@redhat.com>
[kees: rewrote changelog, refactored goto into while, limit 32-bit to 1G]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/aslr.c | 92 ++++++++++++++++++++++++++++++-----------
 1 file changed, 68 insertions(+), 24 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 53ceaa0a08b9..0587eac3e05d 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -335,25 +335,42 @@ static void slots_append(unsigned long addr)
 
 static unsigned long slots_fetch_random(void)
 {
+	unsigned long random;
+	int i;
+
 	/* Handle case of no slots stored. */
 	if (slot_max == 0)
 		return 0;
 
-	return slots[get_random_long("Physical") % slot_max];
+	random = get_random_long("Physical") % slot_max;
+
+	for (i = 0; i < slot_area_index; i++) {
+		if (random >= slot_areas[i].num) {
+			random -= slot_areas[i].num;
+			continue;
+		}
+		return slot_areas[i].addr + random * CONFIG_PHYSICAL_ALIGN;
+	}
+
+	if (i == slot_area_index)
+		debug_putstr("slots_fetch_random() failed!?\n");
+	return 0;
 }
 
 static void process_e820_entry(struct e820entry *entry,
 			       unsigned long minimum,
 			       unsigned long image_size)
 {
-	struct mem_vector region, img;
+	struct mem_vector region, out;
+	struct slot_area slot_area;
+	unsigned long min, start_orig;
 
 	/* Skip non-RAM entries. */
 	if (entry->type != E820_RAM)
 		return;
 
-	/* Ignore entries entirely above our maximum. */
-	if (entry->addr >= KERNEL_IMAGE_SIZE)
+	/* On 32-bit, ignore entries entirely above our maximum. */
+	if (IS_ENABLED(CONFIG_X86_32) && entry->addr >= KERNEL_IMAGE_SIZE)
 		return;
 
 	/* Ignore entries entirely below our minimum. */
@@ -363,31 +380,54 @@ static void process_e820_entry(struct e820entry *entry,
 	region.start = entry->addr;
 	region.size = entry->size;
 
-	/* Potentially raise address to minimum location. */
-	if (region.start < minimum)
-		region.start = minimum;
+	/* Give up if slot area array is full. */
+	while (slot_area_index < MAX_SLOT_AREA) {
+		start_orig = region.start;
 
-	/* Potentially raise address to meet alignment requirements. */
-	region.start = ALIGN(region.start, CONFIG_PHYSICAL_ALIGN);
+		/* Potentially raise address to minimum location. */
+		if (region.start < minimum)
+			region.start = minimum;
 
-	/* Did we raise the address above the bounds of this e820 region? */
-	if (region.start > entry->addr + entry->size)
-		return;
+		/* Potentially raise address to meet alignment needs. */
+		region.start = ALIGN(region.start, CONFIG_PHYSICAL_ALIGN);
 
-	/* Reduce size by any delta from the original address. */
-	region.size -= region.start - entry->addr;
+		/* Did we raise the address above this e820 region? */
+		if (region.start > entry->addr + entry->size)
+			return;
 
-	/* Reduce maximum size to fit end of image within maximum limit. */
-	if (region.start + region.size > KERNEL_IMAGE_SIZE)
-		region.size = KERNEL_IMAGE_SIZE - region.start;
+		/* Reduce size by any delta from the original address. */
+		region.size -= region.start - start_orig;
 
-	/* Walk each aligned slot and check for avoided areas. */
-	for (img.start = region.start, img.size = image_size ;
-	     mem_contains(&region, &img) ;
-	     img.start += CONFIG_PHYSICAL_ALIGN) {
-		if (mem_avoid_overlap(&img))
-			continue;
-		slots_append(img.start);
+		/* On 32-bit, reduce region size to fit within max size. */
+		if (IS_ENABLED(CONFIG_X86_32) &&
+		    region.start + region.size > KERNEL_IMAGE_SIZE)
+			region.size = KERNEL_IMAGE_SIZE - region.start;
+
+		/* Return if the region can't hold the decompressed kernel. */
+		if (region.size < image_size)
+			return;
+
+		/* If nothing overlaps, store the region and return. */
+		if (!mem_avoid_overlap(&region)) {
+			store_slot_info(&region, image_size);
+			return;
+		}
+
+		/* Otherwise, find the lowest overlap. */
+		min = mem_min_overlap(&region, &out);
+
+		/* Store the region if it can hold at least image_size. */
+		if (min > region.start + image_size) {
+			struct mem_vector tmp;
+
+			tmp.start = region.start;
+			tmp.size = min - region.start;
+			store_slot_info(&tmp, image_size);
+		}
+
+		/* Clip off the overlapping region and start over. */
+		region.size -= out.start - region.start + out.size;
+		region.start = out.start + out.size;
 	}
 }
 
@@ -403,6 +443,10 @@ static unsigned long find_random_phy_addr(unsigned long minimum,
 	/* Verify potential e820 positions, appending to slots list. */
 	for (i = 0; i < real_mode->e820_entries; i++) {
 		process_e820_entry(&real_mode->e820_map[i], minimum, size);
+		if (slot_area_index == MAX_SLOT_AREA) {
+			debug_putstr("Aborted e820 scan (slot_areas full)!\n");
+			break;
+		}
 	}
 
 	return slots_fetch_random();
-- 
2.6.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 20/21] x86, KASLR: Remove unused slot tracking code
  2016-04-14 22:28 ` [kernel-hardening] " Kees Cook
@ 2016-04-14 22:29   ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-14 22:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Baoquan He, Yinghai Lu, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

From: Baoquan He <bhe@redhat.com>

Since struct slot_area and its new algorithm are used to track and select
memory ranges, the slots[] array and its associated functions are not
needed any more. This patch removes them.

Signed-off-by: Baoquan He <bhe@redhat.com>
[kees: rewrote changelog]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/aslr.c | 22 ----------------------
 1 file changed, 22 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 0587eac3e05d..e83d3bb3808b 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -113,17 +113,6 @@ struct mem_vector {
 #define MEM_AVOID_MAX 4
 static struct mem_vector mem_avoid[MEM_AVOID_MAX];
 
-static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
-{
-	/* Item at least partially before region. */
-	if (item->start < region->start)
-		return false;
-	/* Item at least partially after region. */
-	if (item->start + item->size > region->start + region->size)
-		return false;
-	return true;
-}
-
 static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
 {
 	/* Item one is entirely before item two. */
@@ -292,8 +281,6 @@ mem_min_overlap(struct mem_vector *img, struct mem_vector *out)
 	return min;
 }
 
-static unsigned long slots[KERNEL_IMAGE_SIZE / CONFIG_PHYSICAL_ALIGN];
-
 struct slot_area {
 	unsigned long addr;
 	int num;
@@ -324,15 +311,6 @@ static void store_slot_info(struct mem_vector *region, unsigned long image_size)
 	}
 }
 
-static void slots_append(unsigned long addr)
-{
-	/* Overflowing the slots list should be impossible. */
-	if (slot_max >= KERNEL_IMAGE_SIZE / CONFIG_PHYSICAL_ALIGN)
-		return;
-
-	slots[slot_max++] = addr;
-}
-
 static unsigned long slots_fetch_random(void)
 {
 	unsigned long random;
-- 
2.6.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 21/21] x86, KASLR: Allow randomization below load address
  2016-04-14 22:28 ` [kernel-hardening] " Kees Cook
@ 2016-04-14 22:29   ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-14 22:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Yinghai Lu, Baoquan He, Ard Biesheuvel, Matt Redfearn,
	x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

From: Yinghai Lu <yinghai@kernel.org>

Currently the physical randomization's lower boundary is the load
address. For bootloaders that load kernels into very high memory
(e.g. kexec), this means randomization takes place in a very small window
at the top of memory, ignoring the large region of physical memory below
the load address.

Since mem_avoid is already correctly tracking the regions that must be
avoided, this patch changes the minimum address to whichever is less:
512M (to conservatively avoid unknown things in lower memory) or the
load address. Now, for example, if the kernel is loaded at 8G, the
range [512M, 8G) is added to the possible physical memory positions.
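
For illustration (not part of the patch), a tiny sketch of the
lower-bound choice, assuming a 64-bit build; the load address below is
made up:

  #include <stdio.h>

  #define MIN(a, b) ((a) < (b) ? (a) : (b))

  int main(void)
  {
          unsigned long output   = 8UL << 30; /* e.g. loaded at 8GiB */
          unsigned long min_addr = MIN(output, 512UL << 20);

          /* With a high load address, [512M, load addr) opens up as well. */
          printf("randomization floor: %#lx\n", min_addr);
          return 0;
  }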

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[kees: rewrote changelog, refactor to use min()]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/aslr.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index e83d3bb3808b..864a51863b45 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -459,7 +459,8 @@ void choose_kernel_location(unsigned char *input,
 				unsigned long output_size,
 				unsigned char **virt_offset)
 {
-	unsigned long random;
+	unsigned long random, min_addr;
+
 	*virt_offset = (unsigned char *)LOAD_PHYSICAL_ADDR;
 
 #ifdef CONFIG_HIBERNATION
@@ -480,8 +481,11 @@ void choose_kernel_location(unsigned char *input,
 	mem_avoid_init((unsigned long)input, input_size,
 		       (unsigned long)*output);
 
+	/* Low end should be the smaller of 512M or initial location. */
+	min_addr = min((unsigned long)*output, 512UL << 20);
+
 	/* Walk e820 and find a random address. */
-	random = find_random_phy_addr((unsigned long)*output, output_size);
+	random = find_random_phy_addr(min_addr, output_size);
 	if (!random)
 		debug_putstr("KASLR could not find suitable E820 region...\n");
 	else {
-- 
2.6.3

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 01/21] x86, KASLR: Remove unneeded boot_params argument
  2016-04-14 22:28   ` [kernel-hardening] " Kees Cook
@ 2016-04-15  7:29     ` Ingo Molnar
  -1 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2016-04-15  7:29 UTC (permalink / raw)
  To: Kees Cook
  Cc: Yinghai Lu, Baoquan He, Ard Biesheuvel, Matt Redfearn, x86,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML, Linus Torvalds, Thomas Gleixner


* Kees Cook <keescook@chromium.org> wrote:

> From: Yinghai Lu <yinghai@kernel.org>
> 
> Since the boot_params can be found using the real_mode global variable, there is 
> no need to pass around a pointer to it. This slightly simplifies the 
> choose_kernel_location function and its callers.

Yeah, so I really wanted to apply this series because the changelogs are now very 
nice, but got held up by the very first patch again ...

Guys, 'real_mode' totally sucks as a variable name!

By removing a seemingly unnecessary parameter, you made the code actively worse to 
read...

So please make a second patch on top of this one that renames 'real_mode' to 
something that both displays what it is about (it sure isn't mainly about real 
mode only!!), and also expresses that it's a global variable, not some local 
function parameter.

In arch/x86/ we usually achieve that via prefixing it with x86_ or so. So 
something like 'x86_boot_params' would work for me. I realize that this is 
somewhat special, because this is pre-relocation code so we really have to be 
careful about pointers - so maybe name it x86_boot_params_rm or so.

Also, I had a look at the whole file you are patching, 
arch/x86/boot/compressed/misc.c, and for heaven's sake, please first improve the 
comments on top of that file before further complicating that code with KASLR 
details! Right now it consists of various low-level ramblings chock-full of typos. 
No mention of where KASLR fits into the picture at all!

Right now it says:

 * misc.c
 *
 * This is a collection of several routines from gzip-1.0.3
 * adapted for Linux.

Which was perhaps true 15 years ago, but it's surely not true today! We do 
relocation and KASLR processing in this path as well, which is just as important 
as running the decompressor. Also, gzip is not the only decompressor we support.

The high level purpose of the code in this file should be explained first, and the 
file should probably be renamed to something like extract.c - to signal that this 
file is about extracting a kernel image and preparing it for execution (i.e. 
relinking it, etc.). It's (much!) more than just pure decompression.

Likewise, decompress_kernel() should be renamed to extract_kernel().

The low level explanations in extract.c should be moved to the function that is 
affected by those details. The top of the file should only contain general 
explanations, a high level description of in what state the kernel is when we call 
this code, and the sort of stuff we do in that code.

The credits should be collected into a single, coherent block. Unnecessary fluff 
should be cut.

Furthermore, I see similar problems with arch/x86/boot/compressed/aslr.c: for 
example it has no high level description at the top of the file _at all_. WTF is 
this with writing security-sensitive code that has no high level design 
description whatsoever?? Also, why is it named aslr.c, why not kaslr.c? We do have 
both KASLR and ASLR code in the kernel, confusing naming on the source code level 
sure does not help.

Also, what's this thing about choose_kernel_location()? That name is actively 
hiding the fact that the main purpose of that function is to randomize things ... 

It should be named x86_randomize_kernel_address() or so.

We need to stop making a mess of the x86 boot code! This is a highly critical 
piece of kernel code that every single x86 Linux user will execute, we should 
improve its visual presentation and general readability to the level of being 
proud of it ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 02/21] x86, KASLR: Handle kernel relocation above 2G
  2016-04-14 22:28   ` [kernel-hardening] " Kees Cook
@ 2016-04-15  7:47     ` Ingo Molnar
  -1 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2016-04-15  7:47 UTC (permalink / raw)
  To: Kees Cook
  Cc: Baoquan He, Yinghai Lu, Ard Biesheuvel, Matt Redfearn, x86,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML, Linus Torvalds, Thomas Gleixner


* Kees Cook <keescook@chromium.org> wrote:

> From: Baoquan He <bhe@redhat.com>
> 
> When processing the relocation table, the offset used to calculate the
> relocation is an int. This is sufficient for calculating the physical
> address of the relocs entry on 32-bit systems and on 64-bit systems when
> the relocation is under 2G. To handle relocations above 2G (seen in
> situations like kexec, netboot, etc), this offset needs to be calculated
> using a long to avoid wrapping and miscalculating the relocation.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> [kees: rewrote changelog]
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  arch/x86/boot/compressed/misc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index f35ad9eb1bf1..c4477d5f3fff 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -295,7 +295,7 @@ static void handle_relocations(void *output, unsigned long output_len)
>  	 * So we work backwards from the end of the decompressed image.
>  	 */
>  	for (reloc = output + output_len - sizeof(*reloc); *reloc; reloc--) {
> -		int extended = *reloc;
> +		long extended = *reloc;
>  		extended += map;
>  
>  		ptr = (unsigned long)extended;

This patch and the code it patches are just plain sloppy. See this cast? This is an 
object lesson in why type casts in C are actively dangerous: they have hidden the 
32-bit truncation bug you've fixed with this patch.

But the primary bug, the cast, should be fixed! Together with all the other casts 
of 'extended'.
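
To make the failure mode concrete, a minimal illustration (hypothetical values; 
'ptr' and 'extended' as in the loop above):

	unsigned long ptr;
	unsigned long map = 0x100000000UL;	/* relocation delta above 4G */
	int extended = 0x01000000;		/* 32-bit entry from the relocs table */

	extended += map;			/* 'map' is truncated to 32 bits here */
	ptr = (unsigned long)extended;		/* 0x01000000, not 0x101000000 */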

Also, the boot code should be reviewed for unnecessary casts; it seems to be a 
disease:

 triton:~/tip> git grep -cE '\(unsigned.*;$' arch/x86/boot/compressed/
 arch/x86/boot/compressed/aslr.c:14
 arch/x86/boot/compressed/eboot.c:41
 arch/x86/boot/compressed/misc.c:5
 arch/x86/boot/compressed/misc.h:2
 arch/x86/boot/compressed/mkpiggy.c:1
 arch/x86/boot/compressed/string.c:4

For example, the type dance and overloaded usage that choose_kernel_location() does 
with the 'random' local variable in aslr.c is disgusting:

void choose_kernel_location(unsigned char *input,
				unsigned long input_size,
				unsigned char **output,
				unsigned long output_size,
				unsigned char **virt_offset)
{
	unsigned long random, min_addr;

	*virt_offset = (unsigned char *)LOAD_PHYSICAL_ADDR;

#ifdef CONFIG_HIBERNATION
	if (!cmdline_find_option_bool("kaslr")) {
		debug_putstr("KASLR disabled by default...\n");
		return;
	}
#else
	if (cmdline_find_option_bool("nokaslr")) {
		debug_putstr("KASLR disabled by cmdline...\n");
		return;
	}
#endif

	real_mode->hdr.loadflags |= KASLR_FLAG;

	/* Record the various known unsafe memory ranges. */
	mem_avoid_init((unsigned long)input, input_size,
		       (unsigned long)*output);

	/* Low end should be the smaller of 512M or initial location. */
	min_addr = min((unsigned long)*output, 512UL << 20);

	/* Walk e820 and find a random address. */
	random = find_random_phy_addr(min_addr, output_size);
	if (!random)
		debug_putstr("KASLR could not find suitable E820 region...\n");
	else {
		if ((unsigned long)*output != random) {
			fill_pagetable(random, output_size);
			switch_pagetable();
			*output = (unsigned char *)random;
		}
	}

	/* Pick random virtual address starting from LOAD_PHYSICAL_ADDR. */
	if (IS_ENABLED(CONFIG_X86_64))
		random = find_random_virt_offset(LOAD_PHYSICAL_ADDR,
						 output_size);
	*virt_offset = (unsigned char *)random;
}

Firstly, 'random' is a libc function name. We generally don't overload those.

Secondly, it's a random what? Variable names should make it plenty obvious. So it 
should probably be named 'random_addr'.

Third:

	/* Walk e820 and find a random address. */
	random = find_random_phy_addr(min_addr, output_size);

yeah, so what that comment tells us, we knew already, due to the function name! 
What the comment should _really_ talk about is the high level purpose. Something 
like: 'Walk the e820 map and find a random free RAM address to which we can still 
decompress the whole kernel' would work so much better ...

Fourth, this function has seven (!!) type casts. We can sure do better.

Fifth:

        /* Pick random virtual address starting from LOAD_PHYSICAL_ADDR. */
        if (IS_ENABLED(CONFIG_X86_64))
                random = find_random_virt_offset(LOAD_PHYSICAL_ADDR,
                                                 output_size);
        *virt_offset = (unsigned char *)random;

So the purpose of this whole function is to pick _two_ random addresses: the 
random physical address to place the kernel at, and on x86_64, to also randomize 
the kernel virtual address, right? So exactly which comment tells us that it's 
about this? Names like 'choose_kernel_location' are singular and are actively 
misleading about this ...

... and then I haven't even mentioned small details like the imbalanced curly 
braces.
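
A sketch of the direction for that tail, untested, just to make the naming and 
bracing points concrete:

	random_addr = find_random_phy_addr(min_addr, output_size);
	if (!random_addr) {
		debug_putstr("KASLR could not find suitable E820 region...\n");
	} else if ((unsigned long)*output != random_addr) {
		fill_pagetable(random_addr, output_size);
		switch_pagetable();
		*output = (unsigned char *)random_addr;
	}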

This code sucks, and I'm not surprised at all that it was broken. It should be 
improved before we can feature-extend it.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 03/21] x86, KASLR: Drop CONFIG_RANDOMIZE_BASE_MAX_OFFSET
  2016-04-14 22:28   ` [kernel-hardening] " Kees Cook
@ 2016-04-15  8:07     ` Ingo Molnar
  -1 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2016-04-15  8:07 UTC (permalink / raw)
  To: Kees Cook
  Cc: Baoquan He, Yinghai Lu, Ard Biesheuvel, Matt Redfearn, x86,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML, Linus Torvalds, Thomas Gleixner


* Kees Cook <keescook@chromium.org> wrote:

> From: Baoquan He <bhe@redhat.com>
> 
> Currently CONFIG_RANDOMIZE_BASE_MAX_OFFSET is used to limit the maximum
> offset for kernel randomization. This limit doesn't need to be a CONFIG
> since it is tied completely to KERNEL_IMAGE_SIZE, and will make no sense
> once physical and virtual offsets are randomized separately. This patch
> removes CONFIG_RANDOMIZE_BASE_MAX_OFFSET and consolidates the Kconfig
> help text.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> [kees: rewrote changelog, dropped KERNEL_IMAGE_SIZE_DEFAULT, moved earlier]
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  arch/x86/Kconfig                     | 57 +++++++++++++-----------------------
>  arch/x86/boot/compressed/aslr.c      | 12 ++++----
>  arch/x86/include/asm/page_64_types.h |  8 ++---
>  arch/x86/mm/init_32.c                |  3 --
>  4 files changed, 29 insertions(+), 51 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 2dc18605831f..fd9ac711ada8 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1936,51 +1936,36 @@ config RANDOMIZE_BASE
>  	depends on RELOCATABLE
>  	default n
>  	---help---
> -	   Randomizes the physical and virtual address at which the
> -	   kernel image is decompressed, as a security feature that
> -	   deters exploit attempts relying on knowledge of the location
> -	   of kernel internals.
> +	   Randomizes the physical address at which the kernel image
> +	   is decompressed and the virtual address where the kernel
> +	   image is mapped, as a secrurity feature that deters exploit

Guys, please _read_ what you write: s/secrurity/security

> +	   attempts relying on knowledge of the location of kernel
> +	   internals.
> +
> +	   The kernel physical address can be randomized from 16M to
> +	   64T at most.

The 64TB value sure reads weird if you are configuring a 32-bit system ...

A much better approach would be to split the help text into 32-bit and 64-bit 
portions:

  On 64-bit systems the kernel physical address will be randomized from 16M to the
  top of available physical memory. (With a maximum of 64TB.)

  On 32-bit systems the kernel physical address will be randomized from 16MB to
  1GB.

Also note the assertive tone: if this Kconfig feature is enabled, we say that the 
kernel address _will_ be randomized, and we should make sure it is. (If for some 
weird reason randomization fails we should warn prominently during bootup.)
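
Something in this direction, say (a hypothetical sketch; warn() stands in for 
whatever reporting helper the decompressor stub has by then):

	if (!random_addr)	/* the address search found nothing usable */
		warn("KASLR: physical address randomization failed!");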


>                           The kernel virtual address will be offset
> +	   by up to KERNEL_IMAGE_SIZE. On 32-bit KERNEL_IMAGE_SIZE is
> +	   512MiB. while on 64-bit this is limited by how the kernel
> +	   fixmap page table is positioned, so this cannot be larger
> +	   than 1GiB currently. Without RANDOMIZE_BASE there is a 512MiB
> +	   to 1.5GiB split between kernel and modules. When RANDOMIZE_BASE
> +	   is enabled, the modules area will shrink to compensate, up
> +	   to a 1GiB to 1GiB split, KERNEL_IMAGE_SIZE changes from 512MiB
> +	   to 1GiB.

Beyond the broken capitalization, I'll show you what 99.999% of users who are not 
kernel hackers will understand from this paragraph, approximately:

                             To dream: ay, and them? To bear to sling afterprises 
            coil, and scover'd cowards of resolence dream: ay, the us for no mome 
            wish'd. Thus and sweary life, or nobles cast and makes, whips and that 
            is sicklied of resolence of so long afterprises us more; for whips 
            all; and name whething after bear to sleep; to beart-ache shocks the 
            undiscover'd consummative have, but that pith a sleep of somethe under 
            'tis the spurns of troud makes off thance doth make whose bourns of 
            dispriz'd consient and arms more.

So this is really deep kernel internals; I get a headache trying to interpret it, 
and it's my job to interpret this! Please try to formulate key pieces of 
information in Kconfig help texts in a more ... approachable fashion, and move the 
jargon to .c source code files.

>  	   Entropy is generated using the RDRAND instruction if it is
>  	   supported. If RDTSC is supported, it is used as well. If
>  	   neither RDRAND nor RDTSC are supported, then randomness is
>  	   read from the i8254 timer.

Also, instead of 'used as well' I'd say "is mixed into the entropy pool as well" 
or so, to make sure it's clear that we don't exclusively rely on RDRAND or RDTSC.

Also, could we always mix the i8254 timer into this as well, not just when RDTSC 
is unavailable?
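
I.e. roughly this, as a sketch assuming the helpers aslr.c already has 
(has_cpuflag(), rdrand_long(), rdtsc(), i8254()):

	unsigned long entropy = 0, raw;

	if (has_cpuflag(X86_FEATURE_RDRAND) && rdrand_long(&raw))
		entropy ^= raw;
	if (has_cpuflag(X86_FEATURE_TSC))
		entropy ^= rdtsc();
	entropy ^= i8254();	/* mix in the PIT unconditionally, not as a fallback */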


> -	   The kernel will be offset by up to RANDOMIZE_BASE_MAX_OFFSET,
> -	   and aligned according to PHYSICAL_ALIGN. Since the kernel is
> -	   built using 2GiB addressing, and PHYSICAL_ALGIN must be at a
> -	   minimum of 2MiB, only 10 bits of entropy is theoretically
> -	   possible. At best, due to page table layouts, 64-bit can use
> -	   9 bits of entropy and 32-bit uses 8 bits.
> +	   Since the kernel is built using 2GiB addressing, and
> +	   PHYSICAL_ALGIN must be at a minimum of 2MiB, only 10 bits of
> +	   entropy is theoretically possible. At best, due to page table
> +	   layouts, 64-bit can use 9 bits of entropy and 32-bit uses 8
> +	   bits.

Please read what you write, there's a typo in this section.

Another request: please stop the MiB/GiB nonsense and call it MB/GB. This isn't 
storage code that has to fight marketing lies. Only the KASLR section in 
arch/x86/Kconfig* is using MiB/GiB, everything else uses MB/GB naming, we should 
stick with that.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 04/21] x86, boot: Move compressed kernel to end of decompression buffer
  2016-04-14 22:28   ` [kernel-hardening] " Kees Cook
@ 2016-04-15  8:09     ` Ingo Molnar
  -1 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2016-04-15  8:09 UTC (permalink / raw)
  To: Kees Cook
  Cc: Yinghai Lu, Baoquan He, Ard Biesheuvel, Matt Redfearn, x86,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML


* Kees Cook <keescook@chromium.org> wrote:

> From: Yinghai Lu <yinghai@kernel.org>
> 
> This change makes later calculations about where the kernel is located
> easier to reason about. To better understand this change, we must first
> clarify what VO and ZO are. They were introduced in commits by hpa:
> 
> 77d1a49 x86, boot: make symbols from the main vmlinux available
> 37ba7ab x86, boot: make kernel_alignment adjustable; new bzImage fields
> 
> Specifically:
> 
> VO:
> - uncompressed kernel image
> - size: VO__end - VO__text ("VO_INIT_SIZE" define)
> 
> ZO:
> - bootable compressed kernel image (boot/compressed/vmlinux)
> - head text + compressed kernel (VO and relocs table) + decompressor code
> - size: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
> 
> The INIT_SIZE definition is used to find the larger of the two image sizes:
> 
>  #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
>  #define VO_INIT_SIZE    (VO__end - VO__text)
>  #if ZO_INIT_SIZE > VO_INIT_SIZE
>  #define INIT_SIZE ZO_INIT_SIZE
>  #else
>  #define INIT_SIZE VO_INIT_SIZE
>  #endif

Please also harmonize all the prefixes, i.e. use VO__/ZO__ everywhere (rename 
things where necessary), instead of this mixed up VO_/ZO_/VO__/ZO__ mess.
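
I.e. the block quoted above would become something like (illustrative only):

	#define ZO__INIT_SIZE	(ZO__end - ZO__startup_32 + ZO__z_extract_offset)
	#define VO__INIT_SIZE	(VO__end - VO__text)
	#if ZO__INIT_SIZE > VO__INIT_SIZE
	# define INIT_SIZE	ZO__INIT_SIZE
	#else
	# define INIT_SIZE	VO__INIT_SIZE
	#endif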

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 05/21] x86, boot: Calculate decompression size during boot not build
  2016-04-14 22:28   ` [kernel-hardening] " Kees Cook
@ 2016-04-15  8:12     ` Ingo Molnar
  -1 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2016-04-15  8:12 UTC (permalink / raw)
  To: Kees Cook
  Cc: Yinghai Lu, Baoquan He, Ard Biesheuvel, Matt Redfearn, x86,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML


* Kees Cook <keescook@chromium.org> wrote:

> From: Yinghai Lu <yinghai@kernel.org>
> 
> Currently z_extract_offset is calculated in boot/compressed/mkpiggy.c.

What is the high level meaning of z_extract_offset? I cannot tell without reading 
the code - and the point of changelogs is for them to be readable without having 
to read the code.
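
For the record, my understanding: it is the offset into the decompression 
buffer at which the compressed image is placed, so that in-place extraction 
never overwrites compressed bytes that have not been consumed yet. mkpiggy.c 
computes it along these lines (paraphrased; see the source for the exact 
constants):

	offs = (olen > ilen) ? olen - ilen : 0;	/* output can outgrow input */
	offs += olen >> 12;			/* decompressor bookkeeping */
	offs += 64 * 1024 + 128;		/* worst-case safety margin */
	offs = (offs + 4095) & ~4095;		/* page align: z_extract_offset */

That is exactly the kind of explanation the changelog should carry.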

The rest of the changelog is pretty good.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 07/21] x86, boot: Fix run_size calculation
  2016-04-14 22:29   ` [kernel-hardening] " Kees Cook
@ 2016-04-15  8:31     ` Ingo Molnar
  -1 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2016-04-15  8:31 UTC (permalink / raw)
  To: Kees Cook
  Cc: Yinghai Lu, Junjie Mao, Josh Triplett, Andrew Morton, Baoquan He,
	Ard Biesheuvel, Matt Redfearn, x86, H. Peter Anvin, Ingo Molnar,
	Borislav Petkov, Vivek Goyal, Andy Lutomirski, lasse.collin,
	Dave Young, kernel-hardening, LKML


* Kees Cook <keescook@chromium.org> wrote:

> From: Yinghai Lu <yinghai@kernel.org>
> 
> Currently, the kernel run_size (size of code plus brk and bss) is
> calculated via the shell script arch/x86/tools/calc_run_size.sh.
> It gets the file offset and mem size of for the .bss and .brk sections in 
> vmlinux, and adds them as follows:

'of for'?

> So, from this we can see that the existing run_size calculation is
> 0x400000 too high. 

Btw., why is it too high? Were some sections discarded?

> [...] And, as it turns out, the correct run_size is
> actually equal to VO_end - VO_text, which is certainly easier to calculate.
> _end: 0xffffffff8205f000
> _text:0xffffffff81000000
> 
> 0xffffffff8205f000 - 0xffffffff81000000 = 0x105f000
> 
> As a result, run_size is a simple constant, so we don't need to pass it
> around; we already have voffset.h such things.

(Typo.)

> [...] We can share voffset.h
> between misc.c and header.S instead of getting run_size in other ways.
> This patch moves voffset.h creation code to boot/compressed/Makefile,
> and switches misc.c to use the VO_end - VO_text calculation.

Btw., 'run_size' is a pretty meaningless name; it doesn't really tell us the most 
important thing: that this is the complete size of the uncompressed kernel, the 
total physically contiguous size we need to allocate to it so it can execute.

So can we rename it to something more expressive, such as kernel_total_size or so?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 04/21] x86, boot: Move compressed kernel to end of decompression buffer
  2016-04-14 22:28   ` [kernel-hardening] " Kees Cook
@ 2016-04-15  9:05     ` Ingo Molnar
  -1 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2016-04-15  9:05 UTC (permalink / raw)
  To: Kees Cook
  Cc: Yinghai Lu, Baoquan He, Ard Biesheuvel, Matt Redfearn, x86,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML


* Kees Cook <keescook@chromium.org> wrote:

> When INIT_SIZE is bigger than VO_INIT_SIZE (uncommon but possible),
> the copied ZO occupies the memory from extract_offset to the end of
> decompression buffer. It overlaps with the soon-to-be-uncompressed kernel
> like this:
> 
>                             |-----compressed kernel image------|
>                             V                                  V
> 0                       extract_offset                      +INIT_SIZE
> |-----------|---------------|-------------------------|--------|
>             |               |                         |        |
>           VO__text      startup_32 of ZO          VO__end    ZO__end
>             ^                                         ^
>             |-------uncompressed kernel image---------|
> 
> When INIT_SIZE is equal to VO_INIT_SIZE (likely) there's still space
> left from end of ZO to the end of decompressing buffer, like below.
> 
>                             |-compressed kernel image-|
>                             V                         V
> 0                       extract_offset                      +INIT_SIZE
> |-----------|---------------|-------------------------|--------|
>             |               |                         |        |
>           VO__text      startup_32 of ZO          ZO__end    VO__end
>             ^                                                  ^
>             |------------uncompressed kernel image-------------|
> 
> To simplify calculations and avoid special cases, it is cleaner to
> always place the compressed kernel image in memory so that ZO__end
> is at the end of the decompression buffer, instead of placing its
> start at extract_offset as is currently done.

Btw., it would be nice to also put such a visualization (of the current layout of 
these values) into the code itself.
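
E.g. a condensed block comment above the buffer setup, something like this 
sketch:

	/*
	 * Decompression buffer (INIT_SIZE == VO_INIT_SIZE case):
	 *
	 * 0     extract_offset                      +INIT_SIZE
	 * |--------|--------------------------|---------|
	 *   VO__text  startup_32 of ZO     ZO__end   VO__end
	 */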

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 13/21] x86, boot: Report overlap failures in memcpy
  2016-04-14 22:29   ` [kernel-hardening] " Kees Cook
@ 2016-04-15 14:42     ` Lasse Collin
  -1 siblings, 0 replies; 80+ messages in thread
From: Lasse Collin @ 2016-04-15 14:42 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ingo Molnar, Yinghai Lu, Baoquan He, Ard Biesheuvel,
	Matt Redfearn, x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Vivek Goyal, Andy Lutomirski, Andrew Morton, Dave Young,
	kernel-hardening, LKML

On 2016-04-14 Kees Cook wrote:
> From: Yinghai Lu <yinghai@kernel.org>
> 
> parse_elf is using a local memcpy to move sections to their final
> running position. However, this memcpy only supports non-overlapping
> arguments (or dest < src).

The same copy of memcpy is used by the decompressors too.

> To avoid future hard-to-debug surprises, this adds checking in memcpy
> to detect the unhandled condition (which should not be happening
> currently).

It's already a minor surprise that memcpy is expected to work for
overlapping buffers at all. It could be good to have a comment about
it because "scroll" and parse_elf seem to rely on it.

On the other hand, the new code and error message take quite a few bytes
of space, so a complete memmove can be smaller:

void *memmove(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	if (d <= s || d - s >= n)
		return __memcpy(dest, src, n);

	while (n-- > 0)
		d[n] = s[n];

	return dest;
}

Note that memmove is needed by lib/decompress_unxz.c. It contains its
own small version inside a "#ifndef memmove" block. That #ifndef should
be taken into account when adding a memmove symbol. Changing
decompress_unxz.c is fine but then one needs to think about other
archs too.
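
The pattern there is, in effect:

#ifndef memmove
void *memmove(void *dest, const void *src, size_t size)
{
	/* small byte-by-byte fallback */
}
#endif

so an arch that starts providing a real memmove presumably also wants a
matching "#define memmove memmove" visible to that file, to suppress the
fallback.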

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 06/21] x86, KASLR: Update description for decompressor worst case size
  2016-04-14 22:28   ` [kernel-hardening] " Kees Cook
@ 2016-04-15 16:17     ` Lasse Collin
  -1 siblings, 0 replies; 80+ messages in thread
From: Lasse Collin @ 2016-04-15 16:17 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ingo Molnar, Baoquan He, Yinghai Lu, Ard Biesheuvel,
	Matt Redfearn, x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Vivek Goyal, Andy Lutomirski, Andrew Morton, Dave Young,
	kernel-hardening, LKML

On 2016-04-14 Kees Cook wrote:
> + * Above analysis is for decompressing gzip compressed kernel only. Up to
> + * now 6 different decompressors are supported all together.

There are six decompressors in Linux now, but the number can change and
the comment become outdated, so I suggest omitting the exact number here.

> And among them
> + * xz stores data in chunks and has maximum chunk of 64K. Hence safety
> + * margin should be updated to cover all decompressors so that we don't
> + * need to deal with each of them separately. Please check
> + * the description in lib/decompressor_xxx.c for specific information.
> + *
> + * extra_bytes = (uncompressed_size >> 12) + 65536 + 128.

lib/decompress_xxx.c, not lib/decompressor_xxx.c.

Referring to those files is problematic because only decompress_unxz.c
talks about the safety margin. The explanation of the safety margin for
gzip is still in misc.c instead of decompress_inflate.c.

I suspect that safety margin calculations haven't been made for other
compressors in Linux, so there is nothing that people could read for
more information. At least such information isn't present in the
comments or commit messages.

For example (and a bit off-topic), there is a possible sign of a too-small
safety margin in decompress_unlzo.c, where a memcpy call can get
overlapping dest and src buffers:

    /* When the input data is not compressed at all,
     * lzo1x_decompress_safe will fail, so call memcpy()
     * instead */
    if (unlikely(dst_len == src_len))
            memcpy(out_buf, in_buf, src_len);

The overlap can only happen if there's enough incompressible data near
the end of the kernel image. It still works in practice as long as
memcpy works with overlapping buffers for dest < src.
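
That property holds because such a memcpy is in essence a forward copy, along
the lines of this sketch:

void *memcpy(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;	/* dest < src: each source byte is read
				   before its slot can be clobbered */
	return dest;
}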

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 01/21] x86, KASLR: Remove unneeded boot_params argument
  2016-04-15  7:29     ` [kernel-hardening] " Ingo Molnar
@ 2016-04-15 18:55       ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-15 18:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Yinghai Lu, Baoquan He, Ard Biesheuvel, Matt Redfearn, x86,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML, Linus Torvalds, Thomas Gleixner

On Fri, Apr 15, 2016 at 12:29 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Kees Cook <keescook@chromium.org> wrote:
>
>> From: Yinghai Lu <yinghai@kernel.org>
>>
>> Since the boot_params can be found using the real_mode global variable, there is
>> no need to pass around a pointer to it. This slightly simplifies the
>> choose_kernel_location function and its callers.
>
> Yeah, so I really wanted to apply this series because the changelogs are now very
> nice, but got held up by the very first patch again ...

Thanks for reading through everything! I'll be going through it all
and fixing up what you mentioned (along with anything else that jumps
out at me).

> Guys, 'real_mode' totally sucks as a variable name!

In our defense, that variable name predates git history. ;)
Regardless, I'll get it renamed.

> By removing a seemingly unnecessary parameter, you made the code actively worse to
> read...
>
> So please make a second patch on top of this one that renames 'real_mode' to
> something that both displays what it is about (it sure isn't mainly about real
> mode only!!), and also expresses that it's a global variable, not some local
> function parameter.
>
> In arch/x86/ we usually achieve that via prefixing it with x86_ or so. So
> something like 'x86_boot_params' would work for me. I realize that this is
> somewhat special, because this is pre-relocation code so we really have to be
> careful about pointers - so maybe name it x86_boot_params_rm or so.
>
> Also, I had a look at the whole file you are patching,
> arch/x86/boot/compressed/misc.c, and for heaven's sake, please first improve the
> comments on top of that file before further complicating that code with KASLR
> details! Right now it consists of various low level ramblings chock-full of typos.
> No mention of where KASLR fits into the picture at all!
>
> Right now it says:
>
>  * misc.c
>  *
>  * This is a collection of several routines from gzip-1.0.3
>  * adapted for Linux.
>
> Which was perhaps true 15 years ago, but it sure is not true today! We do
> relocation and KASLR processing in this path as well, which is just as important
> as running the decompressor. Also, gzip is not the only decompressor we support.
>
> The high level purpose of the code in this file should be explained first, and the
> file should probably be renamed to something like extract.c - to signal that this
> file is about extracting a kernel image and preparing it for execution (i.e.
> relinking it, etc.). It's (much!) more than just pure decompression.
>
> Likewise, decompress_kernel() should be renamed to extract_kernel().

I'll get these all cleaned up. Yay comment-bit-rot. :)

> The low level explanations in extract.c should be moved to the function that is
> affected by those details. The top of the file should only contain general
> explanations, a high level description of in what state the kernel is when we call
> this code, and the sort of stuff we do in that code.
>
> The credits should be collected into a single, coherent block. Unnecessary fluff
> should be cut.
>
> Furthermore, I see similar problems with arch/x86/boot/compressed/aslr.c: for
> example it has no high level description at the top of the file _at all_. WTF is
> this with writing security-sensitive code that has no high level design
> description whatsoever?? Also, why is it named aslr.c, why not kaslr.c? We do have
> both KASLR and ASLR code in the kernel, confusing naming on the source code level
> sure does not help.
>
> Also, what's this thing about choose_kernel_location()? That name is actively
> hiding the fact that the main purpose of that function is to randomize things ...
>
> It should be named x86_randomize_kernel_address() or so.

When naming this I wanted to make it clear that the only thing it was
doing was picking a location. The move is done by the decompressor.
I'll try to clarify the name of the function.

> We need to stop making a mess of the x86 boot code! This is a highly critical
> piece of kernel code that every single x86 Linux user will execute; we should
> improve its visual presentation and general readability to the level of being
> proud of it ...

I totally agree. :) There is a bit of a culture of "don't change it
just to rename things" which I try to avoid colliding with. In this
case, though, they would all seriously improve readability.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 02/21] x86, KASLR: Handle kernel relocation above 2G
  2016-04-15  7:47     ` [kernel-hardening] " Ingo Molnar
@ 2016-04-15 19:01       ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-15 19:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Baoquan He, Yinghai Lu, Ard Biesheuvel, Matt Redfearn, x86,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML, Linus Torvalds, Thomas Gleixner

On Fri, Apr 15, 2016 at 12:47 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Kees Cook <keescook@chromium.org> wrote:
>
>> From: Baoquan He <bhe@redhat.com>
>>
>> When processing the relocation table, the offset used to calculate the
>> relocation is an int. This is sufficient for calculating the physical
>> address of the relocs entry on 32-bit systems and on 64-bit systems when
>> the relocation is under 2G. To handle relocations above 2G (seen in
>> situations like kexec, netboot, etc), this offset needs to be calculated
>> using a long to avoid wrapping and miscalculating the relocation.
>>
>> Signed-off-by: Baoquan He <bhe@redhat.com>
>> [kees: rewrote changelog]
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> ---
>>  arch/x86/boot/compressed/misc.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
>> index f35ad9eb1bf1..c4477d5f3fff 100644
>> --- a/arch/x86/boot/compressed/misc.c
>> +++ b/arch/x86/boot/compressed/misc.c
>> @@ -295,7 +295,7 @@ static void handle_relocations(void *output, unsigned long output_len)
>>        * So we work backwards from the end of the decompressed image.
>>        */
>>       for (reloc = output + output_len - sizeof(*reloc); *reloc; reloc--) {
>> -             int extended = *reloc;
>> +             long extended = *reloc;
>>               extended += map;
>>
>>               ptr = (unsigned long)extended;
>
> This patch and the code it patches is just plain sloppy. See this cast? This is an
> object lesson of why type casts in C are actively dangerous, they have hidden the
> 32-bit truncation bug you've fixed with this patch.
>
> But the primary bug, the cast, should be fixed! Together with all the other casts
> of 'extended'.

In my defense, there's a ton of mixing of pointers vs unsigned longs
all throughout the boot code. And relocations are special since
they're explicitly designed to be sign-extended, etc. I'll clean
things up as best as I can.
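
To make the truncation concrete, here's a stand-alone sketch (the
values are invented; this isn't the actual relocs handling):

    #include <stdio.h>

    int main(void)
    {
        unsigned long map = 0x80000000UL;  /* load delta of 2G+ */
        int  ext32 = 0x1000;               /* old 'extended' type */
        long ext64 = 0x1000;               /* fixed 'extended' type */

        ext32 += map;   /* wraps: only the low 32 bits survive */
        ext64 += map;   /* full 64-bit arithmetic */

        /* the (unsigned long) cast then sign-extends the wrapped int: */
        printf("int:  0x%lx\n", (unsigned long)ext32); /* 0xffffffff80001000 */
        printf("long: 0x%lx\n", (unsigned long)ext64); /* 0x80001000 */
        return 0;
    }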

>
> Also, the boot code should be reviewed for unnecessary casts, it seems to be a
> disease:
>
>  triton:~/tip> git grep -cE '\(unsigned.*;$' arch/x86/boot/compressed/
>  arch/x86/boot/compressed/aslr.c:14
>  arch/x86/boot/compressed/eboot.c:41
>  arch/x86/boot/compressed/misc.c:5
>  arch/x86/boot/compressed/misc.h:2
>  arch/x86/boot/compressed/mkpiggy.c:1
>  arch/x86/boot/compressed/string.c:4
>
> For example the type dance and overloaded usage that choose_kernel_location() does
> with the 'random' local variable in aslr.c is disgusting:
>
> void choose_kernel_location(unsigned char *input,
>                                 unsigned long input_size,
>                                 unsigned char **output,
>                                 unsigned long output_size,
>                                 unsigned char **virt_offset)
> {
>         unsigned long random, min_addr;
>
>         *virt_offset = (unsigned char *)LOAD_PHYSICAL_ADDR;
>
> #ifdef CONFIG_HIBERNATION
>         if (!cmdline_find_option_bool("kaslr")) {
>                 debug_putstr("KASLR disabled by default...\n");
>                 return;
>         }
> #else
>         if (cmdline_find_option_bool("nokaslr")) {
>                 debug_putstr("KASLR disabled by cmdline...\n");
>                 return;
>         }
> #endif
>
>         real_mode->hdr.loadflags |= KASLR_FLAG;
>
>         /* Record the various known unsafe memory ranges. */
>         mem_avoid_init((unsigned long)input, input_size,
>                        (unsigned long)*output);
>
>         /* Low end should be the smaller of 512M or initial location. */
>         min_addr = min((unsigned long)*output, 512UL << 20);
>
>         /* Walk e820 and find a random address. */
>         random = find_random_phy_addr(min_addr, output_size);
>         if (!random)
>                 debug_putstr("KASLR could not find suitable E820 region...\n");
>         else {
>                 if ((unsigned long)*output != random) {
>                         fill_pagetable(random, output_size);
>                         switch_pagetable();
>                         *output = (unsigned char *)random;
>                 }
>         }
>
>         /* Pick random virtual address starting from LOAD_PHYSICAL_ADDR. */
>         if (IS_ENABLED(CONFIG_X86_64))
>                 random = find_random_virt_offset(LOAD_PHYSICAL_ADDR,
>                                                  output_size);
>         *virt_offset = (unsigned char *)random;
> }
>
> Firstly, 'random' is a libc function name. We generally don't overload those.
>
> Secondly, it's a random what? Variable names should make it plenty obvious. So it
> should probably be named 'random_addr'.
>
> Third:
>
>         /* Walk e820 and find a random address. */
>         random = find_random_phy_addr(min_addr, output_size);
>
> yeah, so what that comment tells us we knew already, due to the function name!
> What the comment should _really_ talk about is the high level purpose. Something
> like: 'Walk the e820 map and find a random free RAM address to which we can still
> decompress the whole kernel' would work so much better ...
>
> Fourth, this function has seven (!!) type casts. We can sure do better.

Between the e820 values, the asm linkages, the relocations, etc,
there's a lot of mixing of types. As mentioned, I'll clean it up.

> Fifth:
>
>         /* Pick random virtual address starting from LOAD_PHYSICAL_ADDR. */
>         if (IS_ENABLED(CONFIG_X86_64))
>                 random = find_random_virt_offset(LOAD_PHYSICAL_ADDR,
>                                                  output_size);
>         *virt_offset = (unsigned char *)random;
>
> So the purpose of this whole function is to pick _two_ random addresses: the
> random physical address to place the kernel at, and on x86_64, to also randomize
> the kernel virtual address, right? So exactly which comment tells us that it's
> about this? Names like 'choose_kernel_location' are singular and are actively
> misleading about this ...
>
> ... and then I haven't even mentioned small details like the imbalanced curly
> braces.

I'll bite: which braces jumped out at you? I ran all this through
checkpatch.pl in the hopes of finding style mistakes... Is it the
mixing of single-line code with multi-line code in the if statements?
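
If so, kernel style wants braces on both branches once either branch
needs them, e.g. for the quoted if/else (just a sketch, using the
suggested random_addr rename):

        if (!random_addr) {
                debug_putstr("KASLR could not find suitable E820 region...\n");
        } else if ((unsigned long)*output != random_addr) {
                fill_pagetable(random_addr, output_size);
                switch_pagetable();
                *output = (unsigned char *)random_addr;
        }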

>
> This code sucks, and I'm not surprised at all that it was broken. It should be
> improved before we can feature-extend it.
>
> Thanks,
>
>         Ingo

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 03/21] x86, KASLR: Drop CONFIG_RANDOMIZE_BASE_MAX_OFFSET
  2016-04-15  8:07     ` [kernel-hardening] " Ingo Molnar
@ 2016-04-15 19:12       ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-15 19:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Baoquan He, Yinghai Lu, Ard Biesheuvel, Matt Redfearn, x86,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML, Linus Torvalds, Thomas Gleixner

On Fri, Apr 15, 2016 at 1:07 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Kees Cook <keescook@chromium.org> wrote:
>
>> From: Baoquan He <bhe@redhat.com>
>>
>> Currently CONFIG_RANDOMIZE_BASE_MAX_OFFSET is used to limit the maximum
>> offset for kernel randomization. This limit doesn't need to be a CONFIG
>> since it is tied completely to KERNEL_IMAGE_SIZE, and will make no sense
>> once physical and virtual offsets are randomized separately. This patch
>> removes CONFIG_RANDOMIZE_BASE_MAX_OFFSET and consolidates the Kconfig
>> help text.
>>
>> Signed-off-by: Baoquan He <bhe@redhat.com>
>> [kees: rewrote changelog, dropped KERNEL_IMAGE_SIZE_DEFAULT, moved earlier]
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> ---
>>  arch/x86/Kconfig                     | 57 +++++++++++++-----------------------
>>  arch/x86/boot/compressed/aslr.c      | 12 ++++----
>>  arch/x86/include/asm/page_64_types.h |  8 ++---
>>  arch/x86/mm/init_32.c                |  3 --
>>  4 files changed, 29 insertions(+), 51 deletions(-)
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 2dc18605831f..fd9ac711ada8 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1936,51 +1936,36 @@ config RANDOMIZE_BASE
>>       depends on RELOCATABLE
>>       default n
>>       ---help---
>> -        Randomizes the physical and virtual address at which the
>> -        kernel image is decompressed, as a security feature that
>> -        deters exploit attempts relying on knowledge of the location
>> -        of kernel internals.
>> +        Randomizes the physical address at which the kernel image
>> +        is decompressed and the virtual address where the kernel
>> +        image is mapped, as a secrurity feature that deters exploit
>
> Guys, please _read_ what you write: s/secrurity/security

Gah, sorry. I was reading these, but things slip by. I'll fix it. (And
add these to the common misspellings that checkpatch.pl looks for.)

>
>> +        attempts relying on knowledge of the location of kernel
>> +        internals.
>> +
>> +        The kernel physical address can be randomized from 16M to
>> +        64T at most.)
>
> The 64TB value sure reads weird if you are configuring a 32-bit system ...
>
> A much better approach would be to split the help text into 32-bit and 64-bit
> portions:
>
>   On 64-bit systems the kernel physical address will be randomized from 16M to the
>   top of available physical memory. (With a maximum of 64TB.)
>
>   On 32-bit systems the kernel physical address will be randomized from 16MB to
>   1GB.

Yup, good idea.

> Also note the assertive tone: if this Kconfig feature is enabled, we say that the
> kernel address _will_ be randomized, and we should make sure it is. (If for some
> weird reason randomization fails we should warn prominently during bootup.)

This will need some thought... is it better to fail to boot or to boot
without entropy? As-is, it's possible to randomly position the kernel
base address at exactly the location it was going to boot without
KASLR too, yet this is still considered random...
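
(For scale: with the 9 bits of entropy mentioned below, there are
2^9 = 512 equally likely slots, so any given boot has roughly a
1-in-512 (~0.2%) chance of landing on the non-KASLR default address.)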

>
>
>>                           The kernel virtual address will be offset
>> +        by up to KERNEL_IMAGE_SIZE. On 32-bit KERNEL_IMAGE_SIZE is
>> +        512MiB. while on 64-bit this is limited by how the kernel
>> +        fixmap page table is positioned, so this cannot be larger
>> +        than 1GiB currently. Without RANDOMIZE_BASE there is a 512MiB
>> +        to 1.5GiB split between kernel and modules. When RANDOMIZE_BASE
>> +        is enabled, the modules area will shrink to compensate, up
>> +        to a 1GiB to 1GiB split, KERNEL_IMAGE_SIZE changes from 512MiB
>> +        to 1GiB.
>
> Beyond the broken capitalization, I'll show you what 99.999% of users who are not
> kernel hackers will understand from this paragraph, approximately:
>
>                              To dream: ay, and them? To bear to sling afterprises
>             coil, and scover'd cowards of resolence dream: ay, the us for no mome
>             wish'd. Thus and sweary life, or nobles cast and makes, whips and that
>             is sicklied of resolence of so long afterprises us more; for whips
>             all; and name whething after bear to sleep; to beart-ache shocks the
>             undiscover'd consummative have, but that pith a sleep of somethe under
>             'tis the spurns of troud makes off thance doth make whose bourns of
>             dispriz'd consient and arms more.
>
> So this is really deep kernel internals, I get a headache trying to interpret it,
> and it's my job to interpret this! Please try to formulate key pieces of
> information in Kconfig help texts in a more ... approachable fashion, and move the
> jargon to .c source code files.

Trying to capture the effect of reducing the kernel/module split
without detailing the actual numbers may sound evasive, but I'll see
what I can do.

>
>>          Entropy is generated using the RDRAND instruction if it is
>>          supported. If RDTSC is supported, it is used as well. If
>>          neither RDRAND nor RDTSC are supported, then randomness is
>>          read from the i8254 timer.
>
> Also, instead of 'used as well' I'd say "is mixed into the entropy pool as well"
> or so, to make sure it's clear that we don't exclusively rely on RDRAND or RDTSC.
>
> Also, could we always mix the i8254 timer into this as well, not just when RDTSC
> is unavailable?

IIRC, hpa explicitly did not want to do this when he was making
suggestions in this area. I would need to dig out the thread -- I
can't find it now. I'd prefer to leave this as-is, since changing it
would add yet another variable to the behavioral changes of this
series.
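
As-is, the policy the help text describes boils down to roughly this
(sketch only; helper names are illustrative, not the actual aslr.c
functions):

        unsigned long random = 0;
        bool use_i8254 = true;

        if (has_cpuflag(X86_FEATURE_RDRAND))
                random ^= rdrand_long();        /* RDRAND when available */

        if (has_cpuflag(X86_FEATURE_TSC)) {
                random ^= rdtsc();              /* RDTSC mixed in as well */
                use_i8254 = false;
        }

        if (use_i8254)
                random ^= i8254();              /* fallback: PIT counter */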

>> -        The kernel will be offset by up to RANDOMIZE_BASE_MAX_OFFSET,
>> -        and aligned according to PHYSICAL_ALIGN. Since the kernel is
>> -        built using 2GiB addressing, and PHYSICAL_ALGIN must be at a
>> -        minimum of 2MiB, only 10 bits of entropy is theoretically
>> -        possible. At best, due to page table layouts, 64-bit can use
>> -        9 bits of entropy and 32-bit uses 8 bits.
>> +        Since the kernel is built using 2GiB addressing, and
>> +        PHYSICAL_ALGIN must be at a minimum of 2MiB, only 10 bits of
>> +        entropy is theoretically possible. At best, due to page table
>> +        layouts, 64-bit can use 9 bits of entropy and 32-bit uses 8
>> +        bits.
>
> Please read what you write, there's a typo in this section.

Gah, I need to teach my spell checker about #defines. ;) I read this
multiple times after you called it out and still couldn't see it
(http://www.mrc-cbu.cam.ac.uk/people/matt.davis/cmabridge/). Finally
dropped the _ and the spell checker flagged it. ;)

> Another request: please stop the MiB/GiB nonsense and call it MB/GB. This isn't
> storage code that has to fight marketing lies. Only the KASLR section in
> arch/x86/Kconfig* is using MiB/GiB, everything else uses MB/GB naming, we should
> stick with that.

Totally fine by me. I prefer MB/GB. I wonder if it is worth
documenting this preference somewhere in the style guide? It's
certainly rare in the kernel, but it's present (and there are even
#defines for it *sob*).

$ git grep '[KMGTP]iB' | wc -l
1239
$ git grep '[KMGTP]B' | wc -l
192251

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 05/21] x86, boot: Calculate decompression size during boot not build
  2016-04-15  8:12     ` [kernel-hardening] " Ingo Molnar
@ 2016-04-15 19:14       ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-15 19:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Yinghai Lu, Baoquan He, Ard Biesheuvel, Matt Redfearn, x86,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

On Fri, Apr 15, 2016 at 1:12 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Kees Cook <keescook@chromium.org> wrote:
>
>> From: Yinghai Lu <yinghai@kernel.org>
>>
>> Currently z_extract_offset is calculated in boot/compressed/mkpiggy.c.
>
> What is the high level meaning of z_extract_offset? I cannot tell without reading
> the code - and the point of changelogs is for them to be readable without having
> to read the code.

I'll clarify it in the changelog. (FWIW, it's the offset into the
extraction buffer at which it's safe to place the compressed data so
that decompression doesn't overwrite it before it has been read.)
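
Roughly, the layout that makes the offset safe (sizes not to scale):

  |<---------------- decompression buffer ---------------->|
  [ decompressed kernel grows forward -> ]  [ compressed ZO ]
  ^                                         ^
  output                                    output + z_extract_offset

Placing the compressed data that far in keeps the decompressor's write
pointer behind its read pointer for the whole run.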

> The rest of the changelog is pretty good.

Thanks!

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 07/21] x86, boot: Fix run_size calculation
  2016-04-15  8:31     ` [kernel-hardening] " Ingo Molnar
@ 2016-04-15 19:26       ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-15 19:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Yinghai Lu, Junjie Mao, Josh Triplett, Andrew Morton, Baoquan He,
	Ard Biesheuvel, Matt Redfearn, x86, H. Peter Anvin, Ingo Molnar,
	Borislav Petkov, Vivek Goyal, Andy Lutomirski, lasse.collin,
	Dave Young, kernel-hardening, LKML

On Fri, Apr 15, 2016 at 1:31 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Kees Cook <keescook@chromium.org> wrote:
>
>> From: Yinghai Lu <yinghai@kernel.org>
>>
>> Currently, the kernel run_size (size of code plus brk and bss) is
>> calculated via the shell script arch/x86/tools/calc_run_size.sh.
>> It gets the file offset and mem size of for the .bss and .brk sections in
>> vmlinux, and adds them as follows:
>
> 'of for'?
>
>> So, from this we can see that the existing run_size calculation is
>> 0x400000 too high.
>
> Btw., why is it too high? Were some sections discarded?

I'm not sure why it's that specific value, but the old method used the
section's ELF _file_ position, not the virtual address, and that
seemed to cause the problem.
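
(Taking the numbers from the changelog: the correct size is
0xffffffff8205f000 - 0xffffffff81000000 = 0x105f000, so the file-offset
arithmetic was yielding 0x105f000 + 0x400000 = 0x145f000.)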

>
>> [...] And, as it turns out, the correct run_size is
>> actually equal to VO_end - VO_text, which is certainly easier to calculate.
>> _end: 0xffffffff8205f000
>> _text:0xffffffff81000000
>>
>> 0xffffffff8205f000 - 0xffffffff81000000 = 0x105f000
>>
>> As a result, run_size is a simple constant, so we don't need to pass it
>> around; we already have voffset.h such things.
>
> (Typo.)
>
>> [...] We can share voffset.h
>> between misc.c and header.S instead of getting run_size in other ways.
>> This patch moves voffset.h creation code to boot/compressed/Makefile,
>> and switches misc.c to use the VO_end - VO_text calculation.
>
> Btw., 'run_size' is a pretty meaningless name, it doesn't really tell us the most
> important thing: that this is the complete size of the uncompressed kernel, the
> total physically continuous size we need to allocate to it so it can execute?
>
> So can we rename it to something more expressive, such as kernel_total_size or so?

You got it. Thanks again for digging through all this!

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 13/21] x86, boot: Report overlap failures in memcpy
  2016-04-15 14:42     ` [kernel-hardening] " Lasse Collin
@ 2016-04-15 19:28       ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-15 19:28 UTC (permalink / raw)
  To: Lasse Collin
  Cc: Ingo Molnar, Yinghai Lu, Baoquan He, Ard Biesheuvel,
	Matt Redfearn, x86, H. Peter Anvin, Ingo Molnar, Borislav Petkov,
	Vivek Goyal, Andy Lutomirski, Andrew Morton, Dave Young,
	kernel-hardening, LKML

On Fri, Apr 15, 2016 at 7:42 AM, Lasse Collin <lasse.collin@tukaani.org> wrote:
> On 2016-04-14 Kees Cook wrote:
>> From: Yinghai Lu <yinghai@kernel.org>
>>
>> parse_elf is using a local memcpy to move sections to their final
>> running position. However, this memcpy only supports non-overlapping
>> arguments (or dest < src).
>
> The same copy of memcpy is used by the decompressors too.
>
>> To avoid future hard-to-debug surprises, this adds checking in memcpy
>> to detect the unhandled condition (which should not be happening
>> currently).
>
> It's already a minor surprise that memcpy is expected to work for
> overlapping buffers at all. It could be good to have a comment about
> it because "scroll" and parse_elf seem to rely on it.
>
> On the other hand, the new code and error message take quite a few bytes
> of space, so a complete memmove can be smaller:
>
> void *memmove(void *dest, const void *src, size_t n)
> {
>         unsigned char *d = dest;
>         const unsigned char *s = src;
>
>         if (d <= s || d - s >= n)
>                 return __memcpy(dest, src, n);
>
>         while (n-- > 0)
>                 d[n] = s[n];
>
>         return dest;
> }
>
> Note that memmove is needed by lib/decompress_unxz.c. It contains its
> own small version inside a "#ifndef memmove" block. That #ifndef should
> be taken into account when adding a memmove symbol. Changing
> decompress_unxz.c is fine but then one needs to think about other
> archs too.

Awesome, thanks! I'd much prefer to fully fix this instead of just
throwing a warning. I'll get this added.
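
Probably something along these lines on top of your memmove, so the
overlap is still reported but handled (sketch; assumes the boot stub's
existing warn() helper):

        void *memcpy(void *dest, const void *src, size_t n)
        {
                unsigned char *d = dest;
                const unsigned char *s = src;

                /* Forward overlap __memcpy can't handle: warn, fall back. */
                if (d > s && (size_t)(d - s) < n) {
                        warn("Avoiding potentially unsafe overlapping memcpy()!");
                        return memmove(dest, src, n);
                }

                return __memcpy(dest, src, n);
        }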

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 03/21] x86, KASLR: Drop CONFIG_RANDOMIZE_BASE_MAX_OFFSET
  2016-04-15 19:12       ` [kernel-hardening] " Kees Cook
@ 2016-04-16  8:42         ` Ingo Molnar
  -1 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2016-04-16  8:42 UTC (permalink / raw)
  To: Kees Cook
  Cc: Baoquan He, Yinghai Lu, Ard Biesheuvel, Matt Redfearn, x86,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, lasse.collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML, Linus Torvalds, Thomas Gleixner


* Kees Cook <keescook@chromium.org> wrote:

> > Also note the assertive tone: if this Kconfig feature is enabled, we say that 
> > the kernel address _will_ be randomized, and we should make sure it is. (If 
> > for some weird reason randomization fails we should warn prominently during 
> > bootup.)
> 
> This will need some thought... is it better to fail to boot or to boot without 
> entropy? As-is, it's possible to randomly position the kernel base address at 
> exactly the location it was going to boot without KASLR too, yet this is still 
> considered random...

I think we should boot but print a prominent warning. On real systems it's not 
supposed to happen, right? So we want to know if it happens, but we don't want to 
punish the user by breaking the system and refusing to boot.
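
(Even something minimal at the end of the address selection would do --
a sketch, reusing the quoted function's locals:

        if (!random_addr)
                warn("KASLR: address randomization failed, using default location");

as long as it's loud enough to show up in bug reports.)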

> > Also, could we always mix the i8254 timer into this as well, not just when 
> > RDTSC is unavailable?
> 
> IIRC, hpa explicitly did not want to do this when he was making
> suggestions on this area. I would need to dig out the thread -- I
> can't find it now. I'd prefer to leave this as-is, since changing it
> would add yet another variable to the behavioral changes of this
> series.

Sure, can stay as-is for now.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 07/21] x86, boot: Fix run_size calculation
  2016-04-15 19:26       ` [kernel-hardening] " Kees Cook
@ 2016-04-16  9:00         ` Ingo Molnar
  -1 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2016-04-16  9:00 UTC (permalink / raw)
  To: Kees Cook
  Cc: Yinghai Lu, Junjie Mao, Josh Triplett, Andrew Morton, Baoquan He,
	Ard Biesheuvel, Matt Redfearn, x86, H. Peter Anvin, Ingo Molnar,
	Borislav Petkov, Vivek Goyal, Andy Lutomirski, lasse.collin,
	Dave Young, kernel-hardening, LKML


* Kees Cook <keescook@chromium.org> wrote:

> > So can we rename it to something more expressive, such as kernel_total_size or 
> > so?
> 
> You got it. Thanks again for digging through all this!

You are welcome! A couple of logistical suggestions:

Could you please split up the series a bit and limit the next series to, say, no 
more than around 5 patches? (It can be a little more when justified, to finish up 
a particular line of thought.) That way I can apply them in reviewable groups, 
without having to reject the whole series because some patch deep into the series 
has a problem.

I'd suggest starting with absolutely critical fixes (if any!) as-is, to make 
backporting easier. By the looks of it I don't think there's any such patch in 
this series, but just in case there are any, they can be at the front.

Then come the various cleanup patches and non-critical fixes - everything that is 
not supposed to change the behavior of the kernel. I'd suggest doing them in 
roughly this order:

 - file renames first - so that any later revert in a smaller patch does not have
   to go through a rename barrier.

 - then .o-invariant trivial cleanups, the fixing, harmonization (and creation ;-)
   of comments.

 - then come more involved cleanups like moving logic from build time to boot
   time, stricter bounds checks, non-essential fixes, etc.

It might be useful if you declared at this stage that you are mostly done with the 
preparatory work and that the code base is ready for heavier changes, so that 
people (and me) can review the whole source for anything missing. Often a car 
needs a good power wash before we can tell what body work is needed.

... and once we are happy and proud about the code base, then come the more 
exciting things: more fundamental changes, and new features - on top of a squeaky 
clean code base.

This all can happen pretty quickly, as long as the ordering is proper.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 04/21] x86, boot: Move compressed kernel to end of decompression buffer
  2016-04-15  8:09     ` [kernel-hardening] " Ingo Molnar
@ 2016-04-18 16:50       ` Kees Cook
  -1 siblings, 0 replies; 80+ messages in thread
From: Kees Cook @ 2016-04-18 16:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Yinghai Lu, Baoquan He, Ard Biesheuvel, Matt Redfearn, x86,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, Vivek Goyal,
	Andy Lutomirski, Lasse Collin, Andrew Morton, Dave Young,
	kernel-hardening, LKML

On Fri, Apr 15, 2016 at 1:09 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Kees Cook <keescook@chromium.org> wrote:
>
>> From: Yinghai Lu <yinghai@kernel.org>
>>
>> This change makes later calculations about where the kernel is located
>> easier to reason about. To better understand this change, we must first
>> clarify what VO and ZO are. They were introduced in commits by hpa:
>>
>> 77d1a49 x86, boot: make symbols from the main vmlinux available
>> 37ba7ab x86, boot: make kernel_alignment adjustable; new bzImage fields
>>
>> Specifically:
>>
>> VO:
>> - uncompressed kernel image
>> - size: VO__end - VO__text ("VO_INIT_SIZE" define)
>>
>> ZO:
>> - bootable compressed kernel image (boot/compressed/vmlinux)
>> - head text + compressed kernel (VO and relocs table) + decompressor code
>> - size: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
>>
>> The INIT_SIZE definition is used to find the larger of the two image sizes:
>>
>>  #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
>>  #define VO_INIT_SIZE    (VO__end - VO__text)
>>  #if ZO_INIT_SIZE > VO_INIT_SIZE
>>  #define INIT_SIZE ZO_INIT_SIZE
>>  #else
>>  #define INIT_SIZE VO_INIT_SIZE
>>  #endif
>
> Please also harmonize all the prefixes, i.e. use VO__/ZO__ everywhere (rename
> things where necessary), instead of this mixed up VO_/ZO_/VO__/ZO__ mess.

I'm going to leave these as they are: they're auto-generated based on
various symbols of interest:

boot/Makefile:sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|z_.*\)$$/\#define ZO_\2 0x\1/p'

i.e. ZO__end is _end's location. ZO_input_data is input_data's
position. I think it would further complicate things if we tried to
consolidate consecutive "_"s, or if we eliminated the leading "_"
(ZOinput_data and ZO_end).
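
For example, the generated zoffset.h ends up containing lines like
these (addresses invented here for illustration):

        #define ZO_startup_32 0x0000000000000000
        #define ZO_input_data 0x000000000000425c
        #define ZO_z_extract_offset 0x0000000000930000
        #define ZO__end 0x0000000000975000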

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 80+ messages in thread

Thread overview: 80+ messages
2016-04-14 22:28 [PATCH v5 00/21] x86, boot: KASLR cleanup and 64-bit improvements Kees Cook
2016-04-14 22:28 ` [PATCH v5 01/21] x86, KASLR: Remove unneeded boot_params argument Kees Cook
2016-04-15  7:29   ` Ingo Molnar
2016-04-15 18:55     ` Kees Cook
2016-04-14 22:28 ` [PATCH v5 02/21] x86, KASLR: Handle kernel relocation above 2G Kees Cook
2016-04-15  7:47   ` Ingo Molnar
2016-04-15 19:01     ` Kees Cook
2016-04-14 22:28 ` [PATCH v5 03/21] x86, KASLR: Drop CONFIG_RANDOMIZE_BASE_MAX_OFFSET Kees Cook
2016-04-15  8:07   ` Ingo Molnar
2016-04-15 19:12     ` Kees Cook
2016-04-16  8:42       ` Ingo Molnar
2016-04-14 22:28 ` [PATCH v5 04/21] x86, boot: Move compressed kernel to end of decompression buffer Kees Cook
2016-04-15  8:09   ` Ingo Molnar
2016-04-18 16:50     ` Kees Cook
2016-04-15  9:05   ` Ingo Molnar
2016-04-14 22:28 ` [PATCH v5 05/21] x86, boot: Calculate decompression size during boot not build Kees Cook
2016-04-15  8:12   ` Ingo Molnar
2016-04-15 19:14     ` Kees Cook
2016-04-14 22:28 ` [PATCH v5 06/21] x86, KASLR: Update description for decompressor worst case size Kees Cook
2016-04-15 16:17   ` Lasse Collin
2016-04-14 22:29 ` [PATCH v5 07/21] x86, boot: Fix run_size calculation Kees Cook
2016-04-15  8:31   ` Ingo Molnar
2016-04-15 19:26     ` Kees Cook
2016-04-16  9:00       ` Ingo Molnar
2016-04-14 22:29 ` [PATCH v5 08/21] x86, KASLR: Clean up unused code from old run_size Kees Cook
2016-04-14 22:29 ` [PATCH v5 09/21] x86, KASLR: Correctly bounds-check relocations Kees Cook
2016-04-14 22:29 ` [PATCH v5 10/21] x86, KASLR: Consolidate mem_avoid entries Kees Cook
2016-04-14 22:29 ` [PATCH v5 11/21] x86, boot: Split out kernel_ident_mapping_init Kees Cook
2016-04-14 22:29 ` [PATCH v5 12/21] x86, 64bit: Set ident_mapping for KASLR Kees Cook
2016-04-14 22:29 ` [PATCH v5 13/21] x86, boot: Report overlap failures in memcpy Kees Cook
2016-04-15 14:42   ` Lasse Collin
2016-04-15 19:28     ` Kees Cook
2016-04-14 22:29 ` [PATCH v5 14/21] x86, KASLR: Add slot_area to manage random slots Kees Cook
2016-04-14 22:29 ` [PATCH v5 15/21] x86, KASLR: Add slot_area support functions Kees Cook
2016-04-14 22:29 ` [PATCH v5 16/21] x86, KASLR: Add virtual address choosing function Kees Cook
2016-04-14 22:29 ` [PATCH v5 17/21] x86, KASLR: Clarify purpose of each get_random_long Kees Cook
2016-04-14 22:29 ` [PATCH v5 18/21] x86, KASLR: Randomize virtual address separately Kees Cook
2016-04-14 22:29 ` [PATCH v5 19/21] x86, KASLR: Add physical address randomization >4G Kees Cook
2016-04-14 22:29 ` [PATCH v5 20/21] x86, KASLR: Remove unused slot tracking code Kees Cook
2016-04-14 22:29 ` [PATCH v5 21/21] x86, KASLR: Allow randomization below load address Kees Cook
