* [PATCH v3 0/2] x86/boot/64: Avoid mapping reserved ranges in early page tables.
From: Steve Wahl @ 2019-09-24 21:03 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Juergen Gross, Kirill A. Shutemov, Brijesh Singh,
	Steve Wahl, Jordan Borgner, Feng Tang, linux-kernel,
	Zhenzhong Duan, Dave Hansen
  Cc: Baoquan He, russ.anderson, dimitri.sivanich, mike.travis

This patch set narrows the valid space addressed by the page table
level2_kernel_pgt to only contain ranges checked against the "usable
RAM" list provided by the BIOS.

Prior to this, some larger-than-needed mappings occasionally crossed
over into spaces marked reserved, allowing the processor to access
these reserved spaces; such accesses were caught by the hardware and
caused the BIOS to halt on our platform (UV).
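
[ Ed. note: for illustration only, not code from this series.  The
  "checked against usable RAM" test amounts to requiring that a
  candidate physical range be fully contained in one firmware-reported
  usable region.  A minimal sketch, with hypothetical names
  (struct mem_region, range_is_usable) and assuming a sorted,
  non-overlapping usable list:

	struct mem_region { unsigned long start, end; }; /* [start, end) */

	/* A range is safe to map only if it lies entirely inside
	 * one usable entry. */
	static int range_is_usable(const struct mem_region *usable, int n,
				   unsigned long start, unsigned long end)
	{
		int i;

		for (i = 0; i < n; i++)
			if (start >= usable[i].start && end <= usable[i].end)
				return 1;	/* fully contained */
		return 0;
	}
]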

Changes since v1:

* Cover letter added because there are now two patches.

* Patch 1: Added comment and re-worked changelog text.

* Patch 2: New change requested by Dave Hansen to handle the case that
  the mapping of the last PMD page for the kernel image could cross a
  reserved region boundary.

Changes since v2:

* Patch 1: Added further inline comments.
* Patch 2: None.

Steve Wahl (2):
  x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area.
  x86/boot/64: round memory hole size up to next PMD page.

 arch/x86/boot/compressed/misc.c | 25 +++++++++++++++++++------
 arch/x86/kernel/head64.c        | 22 ++++++++++++++++++++--
 2 files changed, 39 insertions(+), 8 deletions(-)

-- 
2.21.0


-- 
Steve Wahl, Hewlett Packard Enterprise


* [PATCH v3 1/2] x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area.
From: Steve Wahl @ 2019-09-24 21:03 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Juergen Gross, Kirill A. Shutemov, Brijesh Singh,
	Steve Wahl, Jordan Borgner, Feng Tang, linux-kernel,
	Zhenzhong Duan, Dave Hansen
  Cc: Baoquan He, russ.anderson, dimitri.sivanich, mike.travis

Our hardware (UV aka Superdome Flex) has address ranges marked
reserved by the BIOS. Access to these ranges is caught as an error,
causing the BIOS to halt the system.

Initial page tables mapped a large range of physical addresses that
were not checked against the list of BIOS reserved addresses, and
sometimes included reserved addresses in part of the mapped range.
Including the reserved range in the map allowed processor speculative
accesses to the reserved range, triggering a BIOS halt.

Used early in booting, the page table level2_kernel_pgt addresses 1
GiB divided into 2 MiB pages, and it was set up to linearly map a full
1 GiB of physical addresses that included the physical address range
of the kernel image, as chosen by KASLR.  But this also included a
large range of unused addresses on either side of the kernel image.
And unlike the kernel image's physical address range, this extra
mapped space was not checked against the BIOS tables of usable RAM
addresses.  So there were times when the addresses chosen by KASLR
would result in processor accessible mappings of BIOS reserved
physical addresses.
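
[ Ed. note: a sketch of the geometry using the standard x86_64
  constants; it mirrors, but is not copied from, the kernel's
  definitions.  512 PMD entries of 2 MiB each span exactly 1 GiB,
  and pmd_index() selects the entry covering a given address:

	#define PMD_SHIFT	21	/* log2(2 MiB) */
	#define PTRS_PER_PMD	512	/* entries per PMD table */
	/* table span = 512 * 2 MiB = 1 GiB */

	/* index of the PMD entry mapping 'addr' */
	static unsigned long pmd_index(unsigned long addr)
	{
		return (addr >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
	}

  Clearing _PAGE_PRESENT on entries below pmd_index(_text) and above
  pmd_index(_end) therefore unmaps everything in this 1 GiB window
  except the kernel image itself. ]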

The kernel code did not directly access any of this extra mapped
space, but having it mapped allowed the processor to issue speculative
accesses into reserved memory, causing system halts.

This was encountered somewhat rarely on a normal system boot, and much
more often when starting the crash kernel if "crashkernel=512M,high"
was specified on the command line (this heavily restricts the physical
address of the crash kernel, in our case usually within 1 GiB of
reserved space).

The solution is to invalidate the pages of this table outside the
kernel image's space before the page table is activated.  This patch
has been validated to fix this problem on our hardware.

Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Cc: stable@vger.kernel.org
---
Changes since v1:
  * Added comment.
  * Reworked changelog text.
Changes since v2:
  * Added further inline comments.

 arch/x86/kernel/head64.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 29ffa495bd1c..282054025dcf 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -222,13 +222,31 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * we might write invalid pmds, when the kernel is relocated
 	 * cleanup_highmap() fixes this up along with the mappings
 	 * beyond _end.
+	 *
+	 * Only the region occupied by the kernel image has so far
+	 * been checked against the table of usable memory regions
+	 * provided by the firmware, so invalidate pages outside that
+	 * region.  A page table entry that maps to a reserved area of
+	 * memory would allow processor speculation into that area,
+	 * and on some hardware (particularly the UV platform) even
+	 * speculative access to some reserved areas is caught as an
+	 * error, causing the BIOS to halt the system.
 	 */
 
 	pmd = fixup_pointer(level2_kernel_pgt, physaddr);
-	for (i = 0; i < PTRS_PER_PMD; i++) {
+
+	/* invalidate pages before the kernel image */
+	for (i = 0; i < pmd_index((unsigned long)_text); i++)
+		pmd[i] &= ~_PAGE_PRESENT;
+
+	/* fixup pages that are part of the kernel image */
+	for (; i <= pmd_index((unsigned long)_end); i++)
 		if (pmd[i] & _PAGE_PRESENT)
 			pmd[i] += load_delta;
-	}
+
+	/* invalidate pages after the kernel image */
+	for (; i < PTRS_PER_PMD; i++)
+		pmd[i] &= ~_PAGE_PRESENT;
 
 	/*
 	 * Fixup phys_base - remove the memory encryption mask to obtain
-- 
2.21.0


-- 
Steve Wahl, Hewlett Packard Enterprise


* [PATCH v3 2/2] x86/boot/64: round memory hole size up to next PMD page.
From: Steve Wahl @ 2019-09-24 21:04 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Juergen Gross, Kirill A. Shutemov, Brijesh Singh,
	Steve Wahl, Jordan Borgner, Feng Tang, linux-kernel,
	Zhenzhong Duan, Dave Hansen
  Cc: Baoquan He, russ.anderson, dimitri.sivanich, mike.travis

The kernel image map is created using PMD pages, which can include
some extra space beyond what's actually needed.  Round the size of the
memory hole we search for up to the next PMD boundary, to be certain
all of the space to be mapped is usable RAM and includes no reserved
areas.
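
[ Ed. note: a worked sketch of the rounding, assuming (as on x86_64)
  that MIN_KERNEL_ALIGN is 1 << PMD_SHIFT, i.e. 2 MiB.  For a
  power-of-two alignment the kernel's ALIGN() reduces to the usual
  mask form:

	#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))

	/* e.g. a 31 MiB hole rounds up to 32 MiB, a whole number
	 * of PMD pages:
	 * ALIGN_UP(0x1f00000, 0x200000) == 0x2000000 */

  so the final, partially used PMD page of the kernel mapping is also
  checked against the usable-memory table. ]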

Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Cc: stable@vger.kernel.org
---
Changes since v1:
  * This patch is completely new to this version.
Changes since v2:
  None.

 arch/x86/boot/compressed/misc.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 53ac0cb2396d..9652d5c2afda 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -345,6 +345,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 {
 	const unsigned long kernel_total_size = VO__end - VO__text;
 	unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
+	unsigned long needed_size;
 
 	/* Retain x86 boot parameters pointer passed from startup_32/64. */
 	boot_params = rmode;
@@ -379,26 +380,38 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 	free_mem_ptr     = heap;	/* Heap */
 	free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
 
+	/*
+	 * The memory hole needed for the kernel is the larger of either
+	 * the entire decompressed kernel plus relocation table, or the
+	 * entire decompressed kernel plus .bss and .brk sections.
+	 *
+	 * On X86_64, the memory is mapped with PMD pages. Round the
+	 * size up so that the full extent of PMD pages mapped is
+	 * included in the check against the valid memory table
+	 * entries. This ensures the full mapped area is usable RAM
+	 * and doesn't include any reserved areas.
+	 */
+	needed_size = max(output_len, kernel_total_size);
+#ifdef CONFIG_X86_64
+	needed_size = ALIGN(needed_size, MIN_KERNEL_ALIGN);
+#endif
+
 	/* Report initial kernel position details. */
 	debug_putaddr(input_data);
 	debug_putaddr(input_len);
 	debug_putaddr(output);
 	debug_putaddr(output_len);
 	debug_putaddr(kernel_total_size);
+	debug_putaddr(needed_size);
 
 #ifdef CONFIG_X86_64
 	/* Report address of 32-bit trampoline */
 	debug_putaddr(trampoline_32bit);
 #endif
 
-	/*
-	 * The memory hole needed for the kernel is the larger of either
-	 * the entire decompressed kernel plus relocation table, or the
-	 * entire decompressed kernel plus .bss and .brk sections.
-	 */
 	choose_random_location((unsigned long)input_data, input_len,
 				(unsigned long *)&output,
-				max(output_len, kernel_total_size),
+				needed_size,
 				&virt_addr);
 
 	/* Validate memory location choices. */
-- 
2.21.0


-- 
Steve Wahl, Hewlett Packard Enterprise


* Re: [PATCH v3 1/2] x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area.
From: Kirill A. Shutemov @ 2019-09-26 10:23 UTC
  To: Steve Wahl
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Juergen Gross, Brijesh Singh, Jordan Borgner, Feng Tang,
	linux-kernel, Zhenzhong Duan, Dave Hansen, Baoquan He,
	russ.anderson, dimitri.sivanich, mike.travis

On Tue, Sep 24, 2019 at 04:03:55PM -0500, Steve Wahl wrote:
> Our hardware (UV aka Superdome Flex) has address ranges marked
> reserved by the BIOS. Access to these ranges is caught as an error,
> causing the BIOS to halt the system.
> 
> Initial page tables mapped a large range of physical addresses that
> were not checked against the list of BIOS reserved addresses, and
> sometimes included reserved addresses in part of the mapped range.
> Including the reserved range in the map allowed processor speculative
> accesses to the reserved range, triggering a BIOS halt.
> 
> Used early in booting, the page table level2_kernel_pgt addresses 1
> GiB divided into 2 MiB pages, and it was set up to linearly map a full
> 1 GiB of physical addresses that included the physical address range
> of the kernel image, as chosen by KASLR.  But this also included a
> large range of unused addresses on either side of the kernel image.
> And unlike the kernel image's physical address range, this extra
> mapped space was not checked against the BIOS tables of usable RAM
> addresses.  So there were times when the addresses chosen by KASLR
> would result in processor accessible mappings of BIOS reserved
> physical addresses.
> 
> The kernel code did not directly access any of this extra mapped
> space, but having it mapped allowed the processor to issue speculative
> accesses into reserved memory, causing system halts.
> 
> This was encountered somewhat rarely on a normal system boot, and much
> more often when starting the crash kernel if "crashkernel=512M,high"
> was specified on the command line (this heavily restricts the physical
> address of the crash kernel, in our case usually within 1 GiB of
> reserved space).
> 
> The solution is to invalidate the pages of this table outside the
> kernel image's space before the page table is activated.  This patch
> has been validated to fix this problem on our hardware.
> 
> Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
> Cc: stable@vger.kernel.org

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


* Re: [PATCH v3 2/2] x86/boot/64: round memory hole size up to next PMD page.
From: Kirill A. Shutemov @ 2019-09-26 10:23 UTC
  To: Steve Wahl
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Juergen Gross, Brijesh Singh, Jordan Borgner, Feng Tang,
	linux-kernel, Zhenzhong Duan, Dave Hansen, Baoquan He,
	russ.anderson, dimitri.sivanich, mike.travis

On Tue, Sep 24, 2019 at 04:04:31PM -0500, Steve Wahl wrote:
> The kernel image map is created using PMD pages, which can include
> some extra space beyond what's actually needed.  Round the size of the
> memory hole we search for up to the next PMD boundary, to be certain
> all of the space to be mapped is usable RAM and includes no reserved
> areas.
> 
> Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
> Cc: stable@vger.kernel.org

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov


* Re: [PATCH v3 0/2] x86/boot/64: Avoid mapping reserved ranges in early page tables.
From: Steve Wahl @ 2019-10-10 19:27 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Juergen Gross, Kirill A. Shutemov, Brijesh Singh,
	Steve Wahl, Jordan Borgner, Feng Tang, linux-kernel,
	Zhenzhong Duan, Dave Hansen
  Cc: Baoquan He, russ.anderson, dimitri.sivanich, mike.travis

It's been a while on this patch set; two weeks ago Kirill added acks,
no movement since.  Is there anything I need to be doing on my part to
move this forward?

Thanks!

--> Steve Wahl 

On Tue, Sep 24, 2019 at 04:03:22PM -0500, Steve Wahl wrote:
> This patch set narrows the valid space addressed by the page table
> level2_kernel_pgt to only contain ranges checked against the "usable
> RAM" list provided by the BIOS.
> 
> Prior to this, some larger-than-needed mappings occasionally crossed
> over into spaces marked reserved, allowing the processor to access
> these reserved spaces; such accesses were caught by the hardware and
> caused the BIOS to halt on our platform (UV).
> 
> Changes since v1:
> 
> * Cover letter added because there are now two patches.
> 
> * Patch 1: Added comment and re-worked changelog text.
> 
> * Patch 2: New change requested by Dave Hansen to handle the case that
>   the mapping of the last PMD page for the kernel image could cross a
>   reserved region boundary.
> 
> Changes since v2:
> 
> * Patch 1: Added further inline comments.
> * Patch 2: None.
> 
> Steve Wahl (2):
>   x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area.
>   x86/boot/64: round memory hole size up to next PMD page.
> 
>  arch/x86/boot/compressed/misc.c | 25 +++++++++++++++++++------
>  arch/x86/kernel/head64.c        | 22 ++++++++++++++++++++--
>  2 files changed, 39 insertions(+), 8 deletions(-)
> 
> -- 
> 2.21.0
> 
> 
> -- 
> Steve Wahl, Hewlett Packard Enterprise

-- 
Steve Wahl, Hewlett Packard Enterprise


* Re: [PATCH v3 1/2] x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area.
From: Dave Hansen @ 2019-10-11 16:02 UTC
  To: Steve Wahl, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, Juergen Gross, Kirill A. Shutemov,
	Brijesh Singh, Jordan Borgner, Feng Tang, linux-kernel,
	Zhenzhong Duan
  Cc: Baoquan He, russ.anderson, dimitri.sivanich, mike.travis

On 9/24/19 2:03 PM, Steve Wahl wrote:
> The solution is to invalidate the pages of this table outside the
> kernel image's space before the page table is activated.  This patch
> has been validated to fix this problem on our hardware.

Looks good, thanks for the changes!

For both patches:

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>


* [tip: x86/urgent] x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area
From: tip-bot2 for Steve Wahl @ 2019-10-11 18:27 UTC
  To: linux-tip-commits
  Cc: Steve Wahl, Borislav Petkov, Dave Hansen, Kirill A. Shutemov,
	Baoquan He, Brijesh Singh, dimitri.sivanich, Feng Tang,
	H. Peter Anvin, Ingo Molnar, Jordan Borgner, Juergen Gross,
	mike.travis, russ.anderson, stable, Thomas Gleixner, x86-ml,
	Zhenzhong Duan, Ingo Molnar, Borislav Petkov, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     2aa85f246c181b1fa89f27e8e20c5636426be624
Gitweb:        https://git.kernel.org/tip/2aa85f246c181b1fa89f27e8e20c5636426be624
Author:        Steve Wahl <steve.wahl@hpe.com>
AuthorDate:    Tue, 24 Sep 2019 16:03:55 -05:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Fri, 11 Oct 2019 18:38:15 +02:00

x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area

Our hardware (UV aka Superdome Flex) has address ranges marked
reserved by the BIOS. Access to these ranges is caught as an error,
causing the BIOS to halt the system.

Initial page tables mapped a large range of physical addresses that
were not checked against the list of BIOS reserved addresses, and
sometimes included reserved addresses in part of the mapped range.
Including the reserved range in the map allowed processor speculative
accesses to the reserved range, triggering a BIOS halt.

Used early in booting, the page table level2_kernel_pgt addresses 1
GiB divided into 2 MiB pages, and it was set up to linearly map a full
1 GiB of physical addresses that included the physical address range
of the kernel image, as chosen by KASLR.  But this also included a
large range of unused addresses on either side of the kernel image.
And unlike the kernel image's physical address range, this extra
mapped space was not checked against the BIOS tables of usable RAM
addresses.  So there were times when the addresses chosen by KASLR
would result in processor accessible mappings of BIOS reserved
physical addresses.

The kernel code did not directly access any of this extra mapped
space, but having it mapped allowed the processor to issue speculative
accesses into reserved memory, causing system halts.

This was encountered somewhat rarely on a normal system boot, and much
more often when starting the crash kernel if "crashkernel=512M,high"
was specified on the command line (this heavily restricts the physical
address of the crash kernel, in our case usually within 1 GiB of
reserved space).

The solution is to invalidate the pages of this table outside the kernel
image's space before the page table is activated. It fixes this problem
on our hardware.

 [ bp: Touchups. ]

Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: dimitri.sivanich@hpe.com
Cc: Feng Tang <feng.tang@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: mike.travis@hpe.com
Cc: russ.anderson@hpe.com
Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Link: https://lkml.kernel.org/r/9c011ee51b081534a7a15065b1681d200298b530.1569358539.git.steve.wahl@hpe.com
---
 arch/x86/kernel/head64.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 29ffa49..206a4b6 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -222,13 +222,31 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * we might write invalid pmds, when the kernel is relocated
 	 * cleanup_highmap() fixes this up along with the mappings
 	 * beyond _end.
+	 *
+	 * Only the region occupied by the kernel image has so far
+	 * been checked against the table of usable memory regions
+	 * provided by the firmware, so invalidate pages outside that
+	 * region. A page table entry that maps to a reserved area of
+	 * memory would allow processor speculation into that area,
+	 * and on some hardware (particularly the UV platform) even
+	 * speculative access to some reserved areas is caught as an
+	 * error, causing the BIOS to halt the system.
 	 */
 
 	pmd = fixup_pointer(level2_kernel_pgt, physaddr);
-	for (i = 0; i < PTRS_PER_PMD; i++) {
+
+	/* invalidate pages before the kernel image */
+	for (i = 0; i < pmd_index((unsigned long)_text); i++)
+		pmd[i] &= ~_PAGE_PRESENT;
+
+	/* fixup pages that are part of the kernel image */
+	for (; i <= pmd_index((unsigned long)_end); i++)
 		if (pmd[i] & _PAGE_PRESENT)
 			pmd[i] += load_delta;
-	}
+
+	/* invalidate pages after the kernel image */
+	for (; i < PTRS_PER_PMD; i++)
+		pmd[i] &= ~_PAGE_PRESENT;
 
 	/*
 	 * Fixup phys_base - remove the memory encryption mask to obtain


* [tip: x86/urgent] x86/boot/64: Round memory hole size up to next PMD page
From: tip-bot2 for Steve Wahl @ 2019-10-11 18:27 UTC
  To: linux-tip-commits
  Cc: Steve Wahl, Borislav Petkov, Dave Hansen, Kirill A. Shutemov,
	Baoquan He, Brijesh Singh, dimitri.sivanich, Feng Tang,
	H. Peter Anvin, Ingo Molnar, Jordan Borgner, Juergen Gross,
	mike.travis, russ.anderson, Thomas Gleixner, x86-ml,
	Zhenzhong Duan, Ingo Molnar, Borislav Petkov, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     1869dbe87cb94dc9a218ae1d9301dea3678bd4ff
Gitweb:        https://git.kernel.org/tip/1869dbe87cb94dc9a218ae1d9301dea3678bd4ff
Author:        Steve Wahl <steve.wahl@hpe.com>
AuthorDate:    Tue, 24 Sep 2019 16:04:31 -05:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Fri, 11 Oct 2019 18:47:23 +02:00

x86/boot/64: Round memory hole size up to next PMD page

The kernel image map is created using PMD pages, which can include
some extra space beyond what's actually needed.  Round the size of the
memory hole we search for up to the next PMD boundary, to be certain
all of the space to be mapped is usable RAM and includes no reserved
areas.

Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: dimitri.sivanich@hpe.com
Cc: Feng Tang <feng.tang@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: mike.travis@hpe.com
Cc: russ.anderson@hpe.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Link: https://lkml.kernel.org/r/df4f49f05c0c27f108234eb93db5c613d09ea62e.1569358539.git.steve.wahl@hpe.com
---
 arch/x86/boot/compressed/misc.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 53ac0cb..9652d5c 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -345,6 +345,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 {
 	const unsigned long kernel_total_size = VO__end - VO__text;
 	unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
+	unsigned long needed_size;
 
 	/* Retain x86 boot parameters pointer passed from startup_32/64. */
 	boot_params = rmode;
@@ -379,26 +380,38 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 	free_mem_ptr     = heap;	/* Heap */
 	free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
 
+	/*
+	 * The memory hole needed for the kernel is the larger of either
+	 * the entire decompressed kernel plus relocation table, or the
+	 * entire decompressed kernel plus .bss and .brk sections.
+	 *
+	 * On X86_64, the memory is mapped with PMD pages. Round the
+	 * size up so that the full extent of PMD pages mapped is
+	 * included in the check against the valid memory table
+	 * entries. This ensures the full mapped area is usable RAM
+	 * and doesn't include any reserved areas.
+	 */
+	needed_size = max(output_len, kernel_total_size);
+#ifdef CONFIG_X86_64
+	needed_size = ALIGN(needed_size, MIN_KERNEL_ALIGN);
+#endif
+
 	/* Report initial kernel position details. */
 	debug_putaddr(input_data);
 	debug_putaddr(input_len);
 	debug_putaddr(output);
 	debug_putaddr(output_len);
 	debug_putaddr(kernel_total_size);
+	debug_putaddr(needed_size);
 
 #ifdef CONFIG_X86_64
 	/* Report address of 32-bit trampoline */
 	debug_putaddr(trampoline_32bit);
 #endif
 
-	/*
-	 * The memory hole needed for the kernel is the larger of either
-	 * the entire decompressed kernel plus relocation table, or the
-	 * entire decompressed kernel plus .bss and .brk sections.
-	 */
 	choose_random_location((unsigned long)input_data, input_len,
 				(unsigned long *)&output,
-				max(output_len, kernel_total_size),
+				needed_size,
 				&virt_addr);
 
 	/* Validate memory location choices. */
