linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv9 0/4] x86: 5-level related changes into decompression code
@ 2018-02-09 14:22 Kirill A. Shutemov
  2018-02-09 14:22 ` [PATCHv9 1/4] x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c Kirill A. Shutemov
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Kirill A. Shutemov @ 2018-02-09 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Linus Torvalds, Andy Lutomirski, Cyrill Gorcunov,
	Borislav Petkov, Andi Kleen, Matthew Wilcox, linux-mm,
	linux-kernel, Kirill A. Shutemov

These patcheset is a preparation for boot-time switching between paging
modes. Please apply.

The first patch is pure cosmetic change: it gives file with KASLR helpers
a proper name.

The last three patches bring support of booting into 5-level paging mode if
a bootloader put the kernel above 4G.

Patch 2/4 Renames l5_paging_required() into paging_prepare() and change
interface of the function.
Patch 3/4 Handles allocation of space for trampoline and gets it prepared.
Patch 4/4 Gets trampoline used.

v9:
 - Patch 3 now saves and restores lowmem used for trampoline.

   There was report the patch causes issue on a machine. I suspect it's
   BIOS issue that doesn't report proper bounds of usable lowmem.

   Restoring memory back to oringinal state makes problem go away.
v8:
 - Support switching from 5- to 4-level paging.
v7:
 - Fix booting when 5-level paging is enabled before handing off boot to
   the kernel, like in kexec() case.

Kirill A. Shutemov (4):
  x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c
  x86/boot/compressed/64: Introduce paging_prepare()
  x86/boot/compressed/64: Prepare trampoline memory
  x86/boot/compressed/64: Handle 5-level paging boot if kernel is above
    4G

 arch/x86/boot/compressed/Makefile                  |   2 +-
 arch/x86/boot/compressed/head_64.S                 | 178 ++++++++++++++-------
 .../boot/compressed/{pagetable.c => kaslr_64.c}    |   0
 arch/x86/boot/compressed/pgtable.h                 |  18 +++
 arch/x86/boot/compressed/pgtable_64.c              | 100 ++++++++++--
 5 files changed, 232 insertions(+), 66 deletions(-)
 rename arch/x86/boot/compressed/{pagetable.c => kaslr_64.c} (100%)
 create mode 100644 arch/x86/boot/compressed/pgtable.h

-- 
2.15.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCHv9 1/4] x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c
  2018-02-09 14:22 [PATCHv9 0/4] x86: 5-level related changes into decompression code Kirill A. Shutemov
@ 2018-02-09 14:22 ` Kirill A. Shutemov
  2018-02-09 14:22 ` [PATCHv9 2/4] x86/boot/compressed/64: Introduce paging_prepare() Kirill A. Shutemov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Kirill A. Shutemov @ 2018-02-09 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Linus Torvalds, Andy Lutomirski, Cyrill Gorcunov,
	Borislav Petkov, Andi Kleen, Matthew Wilcox, linux-mm,
	linux-kernel, Kirill A. Shutemov

The name of the file -- pagetable.c -- is misleading: it only contains
helpers used for KASLR in 64-bit mode.

Let's rename the file to reflect its content.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/boot/compressed/Makefile                    | 2 +-
 arch/x86/boot/compressed/{pagetable.c => kaslr_64.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename arch/x86/boot/compressed/{pagetable.c => kaslr_64.c} (100%)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index f25e1530e064..1f734cd98fd3 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -78,7 +78,7 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
 vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
 ifdef CONFIG_X86_64
-	vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/pagetable.o
+	vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr_64.o
 	vmlinux-objs-y += $(obj)/mem_encrypt.o
 	vmlinux-objs-y += $(obj)/pgtable_64.o
 endif
diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/kaslr_64.c
similarity index 100%
rename from arch/x86/boot/compressed/pagetable.c
rename to arch/x86/boot/compressed/kaslr_64.c
-- 
2.15.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCHv9 2/4] x86/boot/compressed/64: Introduce paging_prepare()
  2018-02-09 14:22 [PATCHv9 0/4] x86: 5-level related changes into decompression code Kirill A. Shutemov
  2018-02-09 14:22 ` [PATCHv9 1/4] x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c Kirill A. Shutemov
@ 2018-02-09 14:22 ` Kirill A. Shutemov
  2018-02-09 14:22 ` [PATCHv9 3/4] x86/boot/compressed/64: Prepare trampoline memory Kirill A. Shutemov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Kirill A. Shutemov @ 2018-02-09 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Linus Torvalds, Andy Lutomirski, Cyrill Gorcunov,
	Borislav Petkov, Andi Kleen, Matthew Wilcox, linux-mm,
	linux-kernel, Kirill A. Shutemov

This patch renames l5_paging_required() into paging_prepare() and
changes the interface of the function.

This is a preparation for the next patch, which would make the function
also allocate memory for the 32-bit trampoline.

The function now returns a 128-bit structure. RAX would return
trampoline memory address (zero for now) and RDX would indicate if we
need to enabled 5-level paging.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/boot/compressed/head_64.S    | 41 ++++++++++++++++-------------------
 arch/x86/boot/compressed/pgtable_64.c | 25 ++++++++++-----------
 2 files changed, 31 insertions(+), 35 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index fc313e29fe2c..10b4df46de84 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -304,20 +304,6 @@ ENTRY(startup_64)
 	/* Set up the stack */
 	leaq	boot_stack_end(%rbx), %rsp
 
-#ifdef CONFIG_X86_5LEVEL
-	/*
-	 * Check if we need to enable 5-level paging.
-	 * RSI holds real mode data and need to be preserved across
-	 * a function call.
-	 */
-	pushq	%rsi
-	call	l5_paging_required
-	popq	%rsi
-
-	/* If l5_paging_required() returned zero, we're done here. */
-	cmpq	$0, %rax
-	je	lvl5
-
 	/*
 	 * At this point we are in long mode with 4-level paging enabled,
 	 * but we want to enable 5-level paging.
@@ -325,12 +311,28 @@ ENTRY(startup_64)
 	 * The problem is that we cannot do it directly. Setting LA57 in
 	 * long mode would trigger #GP. So we need to switch off long mode
 	 * first.
+	 */
+
+	/*
+	 * paging_prepare() would set up the trampoline and check if we need to
+	 * enable 5-level paging.
 	 *
-	 * NOTE: This is not going to work if bootloader put us above 4G
-	 * limit.
+	 * Address of the trampoline is returned in RAX.
+	 * Non zero RDX on return means we need to enable 5-level paging.
 	 *
-	 * The first step is go into compatibility mode.
+	 * RSI holds real mode data and need to be preserved across
+	 * a function call.
 	 */
+	pushq	%rsi
+	call	paging_prepare
+	popq	%rsi
+
+	/* Save the trampoline address in RCX */
+	movq	%rax, %rcx
+
+	/* Check if we need to enable 5-level paging */
+	cmpq	$0, %rdx
+	jz	lvl5
 
 	/* Clear additional page table */
 	leaq	lvl5_pgtable(%rbx), %rdi
@@ -352,7 +354,6 @@ ENTRY(startup_64)
 	pushq	%rax
 	lretq
 lvl5:
-#endif
 
 	/* Zero EFLAGS */
 	pushq	$0
@@ -490,7 +491,6 @@ relocated:
 	jmp	*%rax
 
 	.code32
-#ifdef CONFIG_X86_5LEVEL
 compatible_mode:
 	/* Setup data and stack segments */
 	movl	$__KERNEL_DS, %eax
@@ -526,7 +526,6 @@ compatible_mode:
 	movl	%eax, %cr0
 
 	lret
-#endif
 
 no_longmode:
 	/* This isn't an x86-64 CPU so hang */
@@ -585,7 +584,5 @@ boot_stack_end:
 	.balign 4096
 pgtable:
 	.fill BOOT_PGT_SIZE, 1, 0
-#ifdef CONFIG_X86_5LEVEL
 lvl5_pgtable:
 	.fill PAGE_SIZE, 1, 0
-#endif
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index b4469a37e9a1..3f1697fcc7a8 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -9,20 +9,19 @@
  */
 unsigned long __force_order;
 
-int l5_paging_required(void)
-{
-	/* Check if leaf 7 is supported. */
-
-	if (native_cpuid_eax(0) < 7)
-		return 0;
+struct paging_config {
+	unsigned long trampoline_start;
+	unsigned long l5_required;
+};
 
-	/* Check if la57 is supported. */
-	if (!(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
-		return 0;
+struct paging_config paging_prepare(void)
+{
+	struct paging_config paging_config = {};
 
-	/* Check if 5-level paging has already been enabled. */
-	if (native_read_cr4() & X86_CR4_LA57)
-		return 0;
+	/* Check if LA57 is desired and supported */
+	if (IS_ENABLED(CONFIG_X86_5LEVEL) && native_cpuid_eax(0) >= 7 &&
+			(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
+		paging_config.l5_required = 1;
 
-	return 1;
+	return paging_config;
 }
-- 
2.15.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCHv9 3/4] x86/boot/compressed/64: Prepare trampoline memory
  2018-02-09 14:22 [PATCHv9 0/4] x86: 5-level related changes into decompression code Kirill A. Shutemov
  2018-02-09 14:22 ` [PATCHv9 1/4] x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c Kirill A. Shutemov
  2018-02-09 14:22 ` [PATCHv9 2/4] x86/boot/compressed/64: Introduce paging_prepare() Kirill A. Shutemov
@ 2018-02-09 14:22 ` Kirill A. Shutemov
  2018-02-09 14:22 ` [PATCHv9 4/4] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G Kirill A. Shutemov
  2018-02-11 11:37 ` [PATCHv9 0/4] x86: 5-level related changes into decompression code Ingo Molnar
  4 siblings, 0 replies; 6+ messages in thread
From: Kirill A. Shutemov @ 2018-02-09 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Linus Torvalds, Andy Lutomirski, Cyrill Gorcunov,
	Borislav Petkov, Andi Kleen, Matthew Wilcox, linux-mm,
	linux-kernel, Kirill A. Shutemov

If a bootloader enables 64-bit mode with 4-level paging, we might need to
switch over to 5-level paging. The switching requires the disabling
paging. It works fine if kernel itself is loaded below 4G.

But if the bootloader put the kernel above 4G (not sure if anybody does
this), we would lose control as soon as paging is disabled, because the
code becomes unreachable to the CPU.

To handle the situation, we need a trampoline in lower memory that would
take care of switching on 5-level paging.

Apart from the trampoline code itself we also need a place to store
top-level page table in lower memory as we don't have a way to load
64-bit values into CR3 in 32-bit mode. We only really need 8 bytes there
as we only use the very first entry of the page table. But we allocate a
whole page anyway.

We cannot have the code in the same page as the page table because there's
a risk that a CPU would read the page table speculatively and get confused
by seeing garbage. It's never a good idea to have junk in PTE entries
visible to the CPU.

We also need a small stack in the trampoline to re-enable long mode via
long return. But stack and code can share the page just fine.

The same trampoline can be used to switch from 5- to 4-level paging
mode, like when starting 4-level paging kernel via kexec() when original
kernel worked in 5-level paging mode.

This patch changes paging_prepare() to find a right spot in lower memory
for the trampoline. Then it copies the trampoline code there and sets up
the new top-level page table for 5-level paging.

We also add cleanup_trampoline() that restores the trampoline memory
back once we've done.

At this point we do all the preparation, but don't use trampoline yet.
It will be done in the following patch.

The trampoline will be used even on 4-level paging machines. This way we
will get better test coverage and the keep the trampoline code in shape.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/boot/compressed/head_64.S    | 24 ++++++++++-
 arch/x86/boot/compressed/pgtable.h    | 18 ++++++++
 arch/x86/boot/compressed/pgtable_64.c | 79 +++++++++++++++++++++++++++++++++++
 3 files changed, 120 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/boot/compressed/pgtable.h

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 10b4df46de84..74026c3da4c3 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -33,6 +33,7 @@
 #include <asm/processor-flags.h>
 #include <asm/asm-offsets.h>
 #include <asm/bootparam.h>
+#include "pgtable.h"
 
 /*
  * Locally defined symbols should be marked hidden:
@@ -355,6 +356,17 @@ ENTRY(startup_64)
 	lretq
 lvl5:
 
+	/*
+	 * cleanup_trampoline() would restore trampoline memory.
+	 *
+	 * RSI holds real mode data and need to be preserved across
+	 * a function call.
+	 */
+	pushq	%rsi
+	movq	%rcx, %rdi
+	call	cleanup_trampoline
+	popq	%rsi
+
 	/* Zero EFLAGS */
 	pushq	$0
 	popfq
@@ -491,8 +503,9 @@ relocated:
 	jmp	*%rax
 
 	.code32
+ENTRY(trampoline_32bit_src)
 compatible_mode:
-	/* Setup data and stack segments */
+	/* Set up data and stack segments */
 	movl	$__KERNEL_DS, %eax
 	movl	%eax, %ds
 	movl	%eax, %ss
@@ -577,6 +590,11 @@ boot_stack:
 	.fill BOOT_STACK_SIZE, 1, 0
 boot_stack_end:
 
+/* Space to preserve trampoline memory */
+	.global trampoline_save
+trampoline_save:
+	.fill TRAMPOLINE_32BIT_SIZE, 1, 0
+
 /*
  * Space for page tables (not in .bss so not zeroed)
  */
@@ -586,3 +604,7 @@ pgtable:
 	.fill BOOT_PGT_SIZE, 1, 0
 lvl5_pgtable:
 	.fill PAGE_SIZE, 1, 0
+
+	.global pgtable_trampoline
+pgtable_trampoline:
+	.fill 4096, 1, 0
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
new file mode 100644
index 000000000000..6e0db2260147
--- /dev/null
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -0,0 +1,18 @@
+#ifndef BOOT_COMPRESSED_PAGETABLE_H
+#define BOOT_COMPRESSED_PAGETABLE_H
+
+#define TRAMPOLINE_32BIT_SIZE		(2 * PAGE_SIZE)
+
+#define TRAMPOLINE_32BIT_PGTABLE_OFFSET	0
+
+#define TRAMPOLINE_32BIT_CODE_OFFSET	PAGE_SIZE
+#define TRAMPOLINE_32BIT_CODE_SIZE	0x60
+
+#define TRAMPOLINE_32BIT_STACK_END	TRAMPOLINE_32BIT_SIZE
+
+#ifndef __ASSEMBLER__
+
+extern void (*trampoline_32bit_src)(void *return_ptr);
+
+#endif /* __ASSEMBLER__ */
+#endif /* BOOT_COMPRESSED_PAGETABLE_H */
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 3f1697fcc7a8..dad5da7b4c1a 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -1,4 +1,6 @@
 #include <asm/processor.h>
+#include "pgtable.h"
+#include "../string.h"
 
 /*
  * __force_order is used by special_insns.h asm code to force instruction
@@ -9,19 +11,96 @@
  */
 unsigned long __force_order;
 
+#define BIOS_START_MIN		0x20000U	/* 128K, less than this is insane */
+#define BIOS_START_MAX		0x9f000U	/* 640K, absolute maximum */
+
 struct paging_config {
 	unsigned long trampoline_start;
 	unsigned long l5_required;
 };
 
+extern void *trampoline_save;
+extern void *pgtable_trampoline;
+
 struct paging_config paging_prepare(void)
 {
 	struct paging_config paging_config = {};
+	unsigned long bios_start, ebda_start, *trampoline;
 
 	/* Check if LA57 is desired and supported */
 	if (IS_ENABLED(CONFIG_X86_5LEVEL) && native_cpuid_eax(0) >= 7 &&
 			(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
 		paging_config.l5_required = 1;
 
+	/*
+	 * Find a suitable spot for the trampoline.
+	 * This code is based on reserve_bios_regions().
+	 */
+
+	ebda_start = *(unsigned short *)0x40e << 4;
+	bios_start = *(unsigned short *)0x413 << 10;
+
+	if (bios_start < BIOS_START_MIN || bios_start > BIOS_START_MAX)
+		bios_start = BIOS_START_MAX;
+
+	if (ebda_start > BIOS_START_MIN && ebda_start < bios_start)
+		bios_start = ebda_start;
+
+	/* Place the trampoline just below the end of low memory, aligned to 4k */
+	paging_config.trampoline_start = bios_start - TRAMPOLINE_32BIT_SIZE;
+	paging_config.trampoline_start = round_down(paging_config.trampoline_start, PAGE_SIZE);
+
+	trampoline = (unsigned long *)paging_config.trampoline_start;
+
+	/* Preserve trampoline memory */
+	memcpy(trampoline_save, trampoline, TRAMPOLINE_32BIT_SIZE);
+
+	/* Clear trampoline memory first */
+	memset(trampoline, 0, TRAMPOLINE_32BIT_SIZE);
+
+	/* Copy trampoline code in place */
+	memcpy(trampoline + TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
+			&trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
+
+	/*
+	 * Set up a new page table that will be used for switching from 4-
+	 * to 5-level paging or vice versa. In other cases trampoline
+	 * wouldn't touch CR3.
+	 *
+	 * For 4- to 5-level paging transition, set up current CR3 as the
+	 * first and the only entry in a new top-level page table.
+	 *
+	 * For 5- to 4-level paging transition, copy page table pointed by
+	 * first entry in the current top-level page table as our new
+	 * top-level page table. We just cannot point to the page table
+	 * from trampoline as it may be above 4G.
+	 */
+	if (paging_config.l5_required) {
+		trampoline[TRAMPOLINE_32BIT_PGTABLE_OFFSET] = __native_read_cr3() + _PAGE_TABLE_NOENC;
+	} else if (native_read_cr4() & X86_CR4_LA57) {
+		unsigned long src;
+
+		src = *(unsigned long *)__native_read_cr3() & PAGE_MASK;
+		memcpy(trampoline + TRAMPOLINE_32BIT_PGTABLE_OFFSET / sizeof(unsigned long),
+		       (void *)src, PAGE_SIZE);
+	}
+
 	return paging_config;
 }
+
+void cleanup_trampoline(void *trampoline)
+{
+	void *cr3 = (void *)__native_read_cr3();
+
+	/*
+	 * Move the top level page table out of trampoline memory,
+	 * if it's there.
+	 */
+	if (cr3 == trampoline + TRAMPOLINE_32BIT_PGTABLE_OFFSET) {
+		memcpy(pgtable_trampoline, trampoline + TRAMPOLINE_32BIT_PGTABLE_OFFSET, PAGE_SIZE);
+		native_write_cr3((unsigned long)pgtable_trampoline);
+	}
+
+	/* Restore trampoline memory */
+	memcpy(trampoline, trampoline_save, TRAMPOLINE_32BIT_SIZE);
+}
-- 
2.15.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCHv9 4/4] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
  2018-02-09 14:22 [PATCHv9 0/4] x86: 5-level related changes into decompression code Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2018-02-09 14:22 ` [PATCHv9 3/4] x86/boot/compressed/64: Prepare trampoline memory Kirill A. Shutemov
@ 2018-02-09 14:22 ` Kirill A. Shutemov
  2018-02-11 11:37 ` [PATCHv9 0/4] x86: 5-level related changes into decompression code Ingo Molnar
  4 siblings, 0 replies; 6+ messages in thread
From: Kirill A. Shutemov @ 2018-02-09 14:22 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Linus Torvalds, Andy Lutomirski, Cyrill Gorcunov,
	Borislav Petkov, Andi Kleen, Matthew Wilcox, linux-mm,
	linux-kernel, Kirill A. Shutemov

This patch addresses a shortcoming in current boot process on machines
that supports 5-level paging.

If a bootloader enables 64-bit mode with 4-level paging, we might need to
switch over to 5-level paging. The switching requires the disabling
paging. It works fine if kernel itself is loaded below 4G.

But if the bootloader put the kernel above 4G (not sure if anybody does
this), we would lose control as soon as paging is disabled, because the
code becomes unreachable to the CPU.

This patch implements a trampoline in lower memory to handle this
situation.

We only need the memory for a very short time, until the main kernel
image sets up own page tables.

We go through the trampoline even if we don't have to: if we're already
in 5-level paging mode or if we don't need to switch to it. This way the
trampoline gets tested on every boot.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/boot/compressed/head_64.S | 127 ++++++++++++++++++++++++++-----------
 1 file changed, 89 insertions(+), 38 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 74026c3da4c3..10682ada293e 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -307,13 +307,34 @@ ENTRY(startup_64)
 
 	/*
 	 * At this point we are in long mode with 4-level paging enabled,
-	 * but we want to enable 5-level paging.
+	 * but we might want to enable 5-level paging or vice versa.
 	 *
-	 * The problem is that we cannot do it directly. Setting LA57 in
-	 * long mode would trigger #GP. So we need to switch off long mode
-	 * first.
+	 * The problem is that we cannot do it directly. Setting or clearing
+	 * CR4.LA57 in long mode would trigger #GP. So we need to switch off
+	 * long mode and paging first.
+	 *
+	 * We also need a trampoline in lower memory to switch over from
+	 * 4- to 5-level paging for cases when the bootloader puts the kernel
+	 * above 4G, but didn't enable 5-level paging for us.
+	 *
+	 * The same trampoline can be used to switch from 5- to 4-level paging
+	 * mode, like when starting 4-level paging kernel via kexec() when
+	 * original kernel worked in 5-level paging mode.
+	 *
+	 * For the trampoline, we need the top page table to reside in lower
+	 * memory as we don't have a way to load 64-bit values into CR3 in
+	 * 32-bit mode.
+	 *
+	 * We go though the trampoline even if we don't have to: if we're
+	 * already in a desired paging mode. This way the trampoline code gets
+	 * tested on every boot.
 	 */
 
+	/* Make sure we have GDT with 32-bit code segment */
+	leaq	gdt(%rip), %rax
+	movl	%eax, gdt64+2(%rip)
+	lgdt	gdt64(%rip)
+
 	/*
 	 * paging_prepare() would set up the trampoline and check if we need to
 	 * enable 5-level paging.
@@ -331,30 +352,20 @@ ENTRY(startup_64)
 	/* Save the trampoline address in RCX */
 	movq	%rax, %rcx
 
-	/* Check if we need to enable 5-level paging */
-	cmpq	$0, %rdx
-	jz	lvl5
-
-	/* Clear additional page table */
-	leaq	lvl5_pgtable(%rbx), %rdi
-	xorq	%rax, %rax
-	movq	$(PAGE_SIZE/8), %rcx
-	rep	stosq
-
 	/*
-	 * Setup current CR3 as the first and only entry in a new top level
-	 * page table.
+	 * Load the address of trampoline_return() into RDI.
+	 * It will be used by the trampoline to return to the main code.
 	 */
-	movq	%cr3, %rdi
-	leaq	0x7 (%rdi), %rax
-	movq	%rax, lvl5_pgtable(%rbx)
+	leaq	trampoline_return(%rip), %rdi
 
 	/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
 	pushq	$__KERNEL32_CS
-	leaq	compatible_mode(%rip), %rax
+	leaq	TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
 	pushq	%rax
 	lretq
-lvl5:
+trampoline_return:
+	/* Restore the stack, the 32-bit trampoline uses its own stack */
+	leaq	boot_stack_end(%rbx), %rsp
 
 	/*
 	 * cleanup_trampoline() would restore trampoline memory.
@@ -503,45 +514,82 @@ relocated:
 	jmp	*%rax
 
 	.code32
+/*
+ * This is the 32-bit trampoline that will be copied over to low memory.
+ *
+ * RDI contains the return address (might be above 4G).
+ * ECX contains the base address of the trampoline memory.
+ * Non zero RDX on return means we need to enable 5-level paging.
+ */
 ENTRY(trampoline_32bit_src)
-compatible_mode:
 	/* Set up data and stack segments */
 	movl	$__KERNEL_DS, %eax
 	movl	%eax, %ds
 	movl	%eax, %ss
 
+	/* Setup new stack */
+	leal	TRAMPOLINE_32BIT_STACK_END(%ecx), %esp
+
 	/* Disable paging */
 	movl	%cr0, %eax
 	btrl	$X86_CR0_PG_BIT, %eax
 	movl	%eax, %cr0
 
-	/* Point CR3 to 5-level paging */
-	leal	lvl5_pgtable(%ebx), %eax
-	movl	%eax, %cr3
+	/* Check what paging mode we want to be in after the trampoline */
+	cmpl	$0, %edx
+	jz	1f
 
-	/* Enable PAE and LA57 mode */
+	/* We want 5-level paging: don't touch CR3 if it already points to 5-level page tables */
 	movl	%cr4, %eax
-	orl	$(X86_CR4_PAE | X86_CR4_LA57), %eax
+	testl	$X86_CR4_LA57, %eax
+	jnz	3f
+	jmp	2f
+1:
+	/* We want 4-level paging: don't touch CR3 if it already points to 4-level page tables */
+	movl	%cr4, %eax
+	testl	$X86_CR4_LA57, %eax
+	jz	3f
+2:
+	/* Point CR3 to the trampoline's new top level page table */
+	leal	TRAMPOLINE_32BIT_PGTABLE_OFFSET(%ecx), %eax
+	movl	%eax, %cr3
+3:
+	/* Enable PAE and LA57 (if required) paging modes */
+	movl	$X86_CR4_PAE, %eax
+	cmpl	$0, %edx
+	jz	1f
+	orl	$X86_CR4_LA57, %eax
+1:
 	movl	%eax, %cr4
 
-	/* Calculate address we are running at */
-	call	1f
-1:	popl	%edi
-	subl	$1b, %edi
+	/* Calculate address of paging_enabled() once we are executing in the trampoline */
+	leal	paging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax
 
-	/* Prepare stack for far return to Long Mode */
+	/* Prepare the stack for far return to Long Mode */
 	pushl	$__KERNEL_CS
-	leal	lvl5(%edi), %eax
-	push	%eax
+	pushl	%eax
 
-	/* Enable paging back */
+	/* Enable paging again */
 	movl	$(X86_CR0_PG | X86_CR0_PE), %eax
 	movl	%eax, %cr0
 
 	lret
 
+	.code64
+paging_enabled:
+	/* Return from the trampoline */
+	jmp	*%rdi
+
+	/*
+         * The trampoline code has a size limit.
+         * Make sure we fail to compile if the trampoline code grows
+         * beyond TRAMPOLINE_32BIT_CODE_SIZE bytes.
+	 */
+	.org	trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_SIZE
+
+	.code32
 no_longmode:
-	/* This isn't an x86-64 CPU so hang */
+	/* This isn't an x86-64 CPU, so hang intentionally, we cannot continue */
 1:
 	hlt
 	jmp     1b
@@ -549,6 +597,11 @@ no_longmode:
 #include "../../kernel/verify_cpu.S"
 
 	.data
+gdt64:
+	.word	gdt_end - gdt
+	.long	0
+	.word	0
+	.quad   0
 gdt:
 	.word	gdt_end - gdt
 	.long	gdt
@@ -602,8 +655,6 @@ trampoline_save:
 	.balign 4096
 pgtable:
 	.fill BOOT_PGT_SIZE, 1, 0
-lvl5_pgtable:
-	.fill PAGE_SIZE, 1, 0
 
 	.global pgtable_trampoline
 pgtable_trampoline:
-- 
2.15.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCHv9 0/4] x86: 5-level related changes into decompression code
  2018-02-09 14:22 [PATCHv9 0/4] x86: 5-level related changes into decompression code Kirill A. Shutemov
                   ` (3 preceding siblings ...)
  2018-02-09 14:22 ` [PATCHv9 4/4] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G Kirill A. Shutemov
@ 2018-02-11 11:37 ` Ingo Molnar
  4 siblings, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2018-02-11 11:37 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin,
	Linus Torvalds, Andy Lutomirski, Cyrill Gorcunov,
	Borislav Petkov, Andi Kleen, Matthew Wilcox, linux-mm,
	linux-kernel


* Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:

> These patcheset is a preparation for boot-time switching between paging
> modes. Please apply.
> 
> The first patch is pure cosmetic change: it gives file with KASLR helpers
> a proper name.
> 
> The last three patches bring support of booting into 5-level paging mode if
> a bootloader put the kernel above 4G.
> 
> Patch 2/4 Renames l5_paging_required() into paging_prepare() and change
> interface of the function.
> Patch 3/4 Handles allocation of space for trampoline and gets it prepared.
> Patch 4/4 Gets trampoline used.
> 
> v9:
>  - Patch 3 now saves and restores lowmem used for trampoline.
> 
>    There was report the patch causes issue on a machine. I suspect it's
>    BIOS issue that doesn't report proper bounds of usable lowmem.
> 
>    Restoring memory back to oringinal state makes problem go away.
> v8:
>  - Support switching from 5- to 4-level paging.
> v7:
>  - Fix booting when 5-level paging is enabled before handing off boot to
>    the kernel, like in kexec() case.
> 
> Kirill A. Shutemov (4):
>   x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c
>   x86/boot/compressed/64: Introduce paging_prepare()
>   x86/boot/compressed/64: Prepare trampoline memory
>   x86/boot/compressed/64: Handle 5-level paging boot if kernel is above
>     4G
> 
>  arch/x86/boot/compressed/Makefile                  |   2 +-
>  arch/x86/boot/compressed/head_64.S                 | 178 ++++++++++++++-------
>  .../boot/compressed/{pagetable.c => kaslr_64.c}    |   0
>  arch/x86/boot/compressed/pgtable.h                 |  18 +++
>  arch/x86/boot/compressed/pgtable_64.c              | 100 ++++++++++--
>  5 files changed, 232 insertions(+), 66 deletions(-)
>  rename arch/x86/boot/compressed/{pagetable.c => kaslr_64.c} (100%)
>  create mode 100644 arch/x86/boot/compressed/pgtable.h

Ok, this series looks pretty good - I've applied it to tip:x86/boot for an 
eventual 4.17 merge and will push it out if it passes local testing.

Thanks,

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-02-11 11:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-09 14:22 [PATCHv9 0/4] x86: 5-level related changes into decompression code Kirill A. Shutemov
2018-02-09 14:22 ` [PATCHv9 1/4] x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c Kirill A. Shutemov
2018-02-09 14:22 ` [PATCHv9 2/4] x86/boot/compressed/64: Introduce paging_prepare() Kirill A. Shutemov
2018-02-09 14:22 ` [PATCHv9 3/4] x86/boot/compressed/64: Prepare trampoline memory Kirill A. Shutemov
2018-02-09 14:22 ` [PATCHv9 4/4] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G Kirill A. Shutemov
2018-02-11 11:37 ` [PATCHv9 0/4] x86: 5-level related changes into decompression code Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).