linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] x86: 5-level related changes into decompression code
@ 2017-11-01 11:54 Kirill A. Shutemov
  2017-11-01 11:55 ` [PATCH 1/4] x86/boot/compressed/64: Compile pagetable.c unconditionally Kirill A. Shutemov
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-11-01 11:54 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Andy Lutomirski, Cyrill Gorcunov, Borislav Petkov, Andi Kleen,
	linux-mm, linux-kernel, Kirill A. Shutemov

Hi Ingo,

While we haven't yet closed on how to handle MAX_PHYSMEM_BITS situation,
could you look on changes into kernel decompression code?

These changes prepare the code to boot-time switching between paging modes
and handle booting in 5-level mode when bootloader put kernel image above
4G, but haven't enabled 5-level paging for us.

Please review and consider applying.

Kirill A. Shutemov (4):
  x86/boot/compressed/64: Compile pagetable.c unconditionally
  x86/boot/compressed/64: Detect and handle 5-level paging at boot-time
  x86/boot/compressed/64: Introduce place_trampoline()
  x86/boot/compressed/64: Handle 5-level paging boot if kernel is above
    4G

 arch/x86/boot/compressed/Makefile    |  2 +-
 arch/x86/boot/compressed/head_64.S   | 99 +++++++++++++++++++++++++-----------
 arch/x86/boot/compressed/pagetable.c | 66 ++++++++++++++++++++++++
 arch/x86/boot/compressed/pagetable.h | 18 +++++++
 4 files changed, 154 insertions(+), 31 deletions(-)
 create mode 100644 arch/x86/boot/compressed/pagetable.h

-- 
2.14.2

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] x86/boot/compressed/64: Compile pagetable.c unconditionally
  2017-11-01 11:54 [PATCH 0/4] x86: 5-level related changes into decompression code Kirill A. Shutemov
@ 2017-11-01 11:55 ` Kirill A. Shutemov
  2017-11-10  9:12   ` Ingo Molnar
  2017-11-01 11:55 ` [PATCH 2/4] x86/boot/compressed/64: Detect and handle 5-level paging at boot-time Kirill A. Shutemov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-11-01 11:55 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Andy Lutomirski, Cyrill Gorcunov, Borislav Petkov, Andi Kleen,
	linux-mm, linux-kernel, Kirill A. Shutemov

We are going to put few helpers into pagetable.c that are not specific
to KASLR.

Let's make compilation of the file independent of KASLR and wrap
KASLR-depended code into ifdef.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/boot/compressed/Makefile    | 2 +-
 arch/x86/boot/compressed/pagetable.c | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 65a150a7f15c..f7b64ecd09b3 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -77,7 +77,7 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
 vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
 ifdef CONFIG_X86_64
-	vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/pagetable.o
+	vmlinux-objs-y += $(obj)/pagetable.o
 endif
 
 $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
index f1aa43854bed..a15bbfcb3413 100644
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -27,6 +27,9 @@
 /* These actually do the work of building the kernel identity maps. */
 #include <asm/init.h>
 #include <asm/pgtable.h>
+
+#ifdef CONFIG_RANDOMIZE_BASE
+
 /* Use the static base for this part of the boot process */
 #undef __PAGE_OFFSET
 #define __PAGE_OFFSET __PAGE_OFFSET_BASE
@@ -149,3 +152,5 @@ void finalize_identity_maps(void)
 {
 	write_cr3(top_level_pgt);
 }
+
+#endif /* CONFIG_RANDOMIZE_BASE */
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/4] x86/boot/compressed/64: Detect and handle 5-level paging at boot-time
  2017-11-01 11:54 [PATCH 0/4] x86: 5-level related changes into decompression code Kirill A. Shutemov
  2017-11-01 11:55 ` [PATCH 1/4] x86/boot/compressed/64: Compile pagetable.c unconditionally Kirill A. Shutemov
@ 2017-11-01 11:55 ` Kirill A. Shutemov
  2017-11-10  9:13   ` Ingo Molnar
  2017-11-01 11:55 ` [PATCH 3/4] x86/boot/compressed/64: Introduce place_trampoline() Kirill A. Shutemov
  2017-11-01 11:55 ` [PATCH 4/4] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G Kirill A. Shutemov
  3 siblings, 1 reply; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-11-01 11:55 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Andy Lutomirski, Cyrill Gorcunov, Borislav Petkov, Andi Kleen,
	linux-mm, linux-kernel, Kirill A. Shutemov

This patch prepare decompression code to boot-time switching between 4-
and 5-level paging.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/boot/compressed/head_64.S   | 16 ++++++++++++----
 arch/x86/boot/compressed/pagetable.c | 19 +++++++++++++++++++
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index b4a5d284391c..6ac8239af2b6 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -288,10 +288,18 @@ ENTRY(startup_64)
 	leaq	boot_stack_end(%rbx), %rsp
 
 #ifdef CONFIG_X86_5LEVEL
-	/* Check if 5-level paging has already enabled */
-	movq	%cr4, %rax
-	testl	$X86_CR4_LA57, %eax
-	jnz	lvl5
+	/*
+	 * Check if we need to enable 5-level paging.
+	 * RSI holds real mode data and need to be preserved across
+	 * a function call.
+	 */
+	pushq	%rsi
+	call	need_to_enabled_l5
+	popq	%rsi
+
+	/* If need_to_enabled_l5() returned zero, we're done here. */
+	cmpq	$0, %rax
+	je	lvl5
 
 	/*
 	 * At this point we are in long mode with 4-level paging enabled,
diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
index a15bbfcb3413..cd2dd49333cc 100644
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -154,3 +154,22 @@ void finalize_identity_maps(void)
 }
 
 #endif /* CONFIG_RANDOMIZE_BASE */
+
+#ifdef CONFIG_X86_5LEVEL
+int need_to_enabled_l5(void)
+{
+	/* Check i leaf 7 is supported. */
+	if (native_cpuid_eax(0) < 7)
+		return 0;
+
+	/* Check if la57 is supported. */
+	if (!(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
+		return 0;
+
+	/* Check if 5-level paging has already been enabled. */
+	if (native_read_cr4() & X86_CR4_LA57)
+		return 0;
+
+	return 1;
+}
+#endif
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/4] x86/boot/compressed/64: Introduce place_trampoline()
  2017-11-01 11:54 [PATCH 0/4] x86: 5-level related changes into decompression code Kirill A. Shutemov
  2017-11-01 11:55 ` [PATCH 1/4] x86/boot/compressed/64: Compile pagetable.c unconditionally Kirill A. Shutemov
  2017-11-01 11:55 ` [PATCH 2/4] x86/boot/compressed/64: Detect and handle 5-level paging at boot-time Kirill A. Shutemov
@ 2017-11-01 11:55 ` Kirill A. Shutemov
  2017-11-10  9:14   ` Ingo Molnar
                     ` (2 more replies)
  2017-11-01 11:55 ` [PATCH 4/4] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G Kirill A. Shutemov
  3 siblings, 3 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-11-01 11:55 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Andy Lutomirski, Cyrill Gorcunov, Borislav Petkov, Andi Kleen,
	linux-mm, linux-kernel, Kirill A. Shutemov

If bootloader enables 64-bit mode with 4-level paging, we need to
switch over to 5-level paging. The switching requires disabling paging.
It works fine if kernel itself is loaded below 4G.

If bootloader put the kernel above 4G (not sure if anybody does this),
we would loose control as soon as paging is disabled as code becomes
unreachable.

To handle the situation, we need a trampoline in lower memory that would
take care about switching on 5-level paging.

Apart from trampoline itself we also need place to store top level page
table in lower memory as we don't have a way to load 64-bit value into
CR3 from 32-bit mode. We only really need 8-bytes there as we only use
the very first entry of the page table. But we allocate whole page
anyway. We cannot have the code in the same because, there's hazard that
a CPU would read page table speculatively and get confused seeing
garbage.

This patch introduces place_trampoline() that finds right spot in lower
memory for trampoline, copies trampoline code there and setups new top
level page table for 5-level paging.

At this point we do all the preparation, but not yet use trampoline.
It will be done in following patch.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/boot/compressed/head_64.S   | 13 +++++++++++
 arch/x86/boot/compressed/pagetable.c | 42 ++++++++++++++++++++++++++++++++++++
 arch/x86/boot/compressed/pagetable.h | 18 ++++++++++++++++
 3 files changed, 73 insertions(+)
 create mode 100644 arch/x86/boot/compressed/pagetable.h

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 6ac8239af2b6..4d1555b39de0 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -315,6 +315,18 @@ ENTRY(startup_64)
 	 * The first step is go into compatibility mode.
 	 */
 
+	/*
+	 * Find suitable place for trampoline and populate it.
+	 * The address will be stored in RCX.
+	 *
+	 * RSI holds real mode data and need to be preserved across
+	 * a function call.
+	 */
+	pushq	%rsi
+	call	place_trampoline
+	popq	%rsi
+	movq	%rax, %rcx
+
 	/* Clear additional page table */
 	leaq	lvl5_pgtable(%rbx), %rdi
 	xorq	%rax, %rax
@@ -474,6 +486,7 @@ relocated:
 
 	.code32
 #ifdef CONFIG_X86_5LEVEL
+ENTRY(lvl5_trampoline_src)
 compatible_mode:
 	/* Setup data and stack segments */
 	movl	$__KERNEL_DS, %eax
diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
index cd2dd49333cc..74245e9e875f 100644
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -23,6 +23,8 @@
 #undef CONFIG_AMD_MEM_ENCRYPT
 
 #include "misc.h"
+#include "pagetable.h"
+#include "../string.h"
 
 /* These actually do the work of building the kernel identity maps. */
 #include <asm/init.h>
@@ -172,4 +174,44 @@ int need_to_enabled_l5(void)
 
 	return 1;
 }
+
+#define BIOS_START_MIN		0x20000U	/* 128K, less than this is insane */
+#define BIOS_START_MAX		0x9f000U	/* 640K, absolute maximum */
+
+unsigned long *place_trampoline(void)
+{
+	unsigned long bios_start, ebda_start, trampoline_start, *trampoline;
+
+	/* Based on reserve_bios_regions() */
+
+	ebda_start = *(unsigned short *)0x40e << 4;
+	bios_start = *(unsigned short *)0x413 << 10;
+
+	if (bios_start < BIOS_START_MIN || bios_start > BIOS_START_MAX)
+		bios_start = BIOS_START_MAX;
+
+	if (ebda_start > BIOS_START_MIN && ebda_start < bios_start)
+		bios_start = ebda_start;
+
+	/* Place trampoline below end of low memory, aligned to 4k */
+	trampoline_start = bios_start - LVL5_TRAMPOLINE_SIZE;
+	trampoline_start = round_down(trampoline_start, PAGE_SIZE);
+
+	trampoline = (unsigned long *)trampoline_start;
+
+	/* Clear trampoline memory first */
+	memset(trampoline, 0, LVL5_TRAMPOLINE_SIZE);
+
+	/* Copy trampoline code in place */
+	memcpy(trampoline + LVL5_TRAMPOLINE_CODE_OFF / sizeof(unsigned long),
+			&lvl5_trampoline_src, LVL5_TRAMPOLINE_CODE_SIZE);
+
+	/*
+	 * Setup current CR3 as the first and the only entry in a new top level
+	 * page table.
+	 */
+	trampoline[0] = __read_cr3() + _PAGE_TABLE_NOENC;
+
+	return trampoline;
+}
 #endif
diff --git a/arch/x86/boot/compressed/pagetable.h b/arch/x86/boot/compressed/pagetable.h
new file mode 100644
index 000000000000..906436cc1c02
--- /dev/null
+++ b/arch/x86/boot/compressed/pagetable.h
@@ -0,0 +1,18 @@
+#ifndef BOOT_COMPRESSED_PAGETABLE_H
+#define BOOT_COMPRESSED_PAGETABLE_H
+
+#define LVL5_TRAMPOLINE_SIZE		(2 * PAGE_SIZE)
+
+#define LVL5_TRAMPOLINE_PGTABLE_OFF	0
+
+#define LVL5_TRAMPOLINE_CODE_OFF	PAGE_SIZE
+#define LVL5_TRAMPOLINE_CODE_SIZE	0x40
+
+#define LVL5_TRAMPOLINE_STACK_END	LVL5_TRAMPOLINE_SIZE
+
+#ifndef __ASSEMBLER__
+
+extern void (*lvl5_trampoline_src)(void *return_ptr);
+
+#endif /* __ASSEMBLER__ */
+#endif /* BOOT_COMPRESSED_PAGETABLE_H */
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/4] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
  2017-11-01 11:54 [PATCH 0/4] x86: 5-level related changes into decompression code Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2017-11-01 11:55 ` [PATCH 3/4] x86/boot/compressed/64: Introduce place_trampoline() Kirill A. Shutemov
@ 2017-11-01 11:55 ` Kirill A. Shutemov
  3 siblings, 0 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-11-01 11:55 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Andy Lutomirski, Cyrill Gorcunov, Borislav Petkov, Andi Kleen,
	linux-mm, linux-kernel, Kirill A. Shutemov

This patch addresses shortcoming in current boot process on machines
that supports 5-level paging.

If bootloader enables 64-bit mode with 4-level paging, we need to
switch over to 5-level paging. The switching requires disabling paging.
It works fine if kernel itself is loaded below 4G.

If bootloader put the kernel above 4G (not sure if anybody does this),
we would loose control as soon as paging is disabled as code becomes
unreachable.

This patch implements trampoline in lower memory to handle this
situation.

We only need the memory for very short time, until main kernel image
setup its own page tables.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/boot/compressed/head_64.S | 72 ++++++++++++++++++++++++--------------
 1 file changed, 45 insertions(+), 27 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 4d1555b39de0..e8331f5a77f4 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -32,6 +32,7 @@
 #include <asm/processor-flags.h>
 #include <asm/asm-offsets.h>
 #include <asm/bootparam.h>
+#include "pagetable.h"
 
 /*
  * Locally defined symbols should be marked hidden:
@@ -288,6 +289,19 @@ ENTRY(startup_64)
 	leaq	boot_stack_end(%rbx), %rsp
 
 #ifdef CONFIG_X86_5LEVEL
+/*
+ * We need trampoline in lower memory switch from 4- to 5-level paging for
+ * cases when bootloader put kernel above 4G, but didn't enable 5-level paging
+ * for us.
+ *
+ * We also have to have top page table in lower memory as we don't have a way
+ * to load 64-bit value into CR3 from 32-bit mode. We only need 8-bytes there
+ * as we only use the very first entry of the page table, but we allocate whole
+ * page anyway. We cannot have the code in the same because, there's hazard
+ * that a CPU would read page table speculatively and get confused seeing
+ * garbage.
+ */
+
 	/*
 	 * Check if we need to enable 5-level paging.
 	 * RSI holds real mode data and need to be preserved across
@@ -309,8 +323,8 @@ ENTRY(startup_64)
 	 * long mode would trigger #GP. So we need to switch off long mode
 	 * first.
 	 *
-	 * NOTE: This is not going to work if bootloader put us above 4G
-	 * limit.
+	 * We use trampoline in lower memory to handle situation when
+	 * bootloader put the kernel image above 4G.
 	 *
 	 * The first step is go into compatibility mode.
 	 */
@@ -327,26 +341,20 @@ ENTRY(startup_64)
 	popq	%rsi
 	movq	%rax, %rcx
 
-	/* Clear additional page table */
-	leaq	lvl5_pgtable(%rbx), %rdi
-	xorq	%rax, %rax
-	movq	$(PAGE_SIZE/8), %rcx
-	rep	stosq
-
 	/*
-	 * Setup current CR3 as the first and only entry in a new top level
-	 * page table.
+	 * Load address of lvl5 into RDI.
+	 * It will be used to return address from trampoline.
 	 */
-	movq	%cr3, %rdi
-	leaq	0x7 (%rdi), %rax
-	movq	%rax, lvl5_pgtable(%rbx)
+	leaq	lvl5(%rip), %rdi
 
 	/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
 	pushq	$__KERNEL32_CS
-	leaq	compatible_mode(%rip), %rax
+	leaq	LVL5_TRAMPOLINE_CODE_OFF(%rcx), %rax
 	pushq	%rax
 	lretq
 lvl5:
+	/* Restore stack, 32-bit trampoline uses own stack */
+	leaq	boot_stack_end(%rbx), %rsp
 #endif
 
 	/* Zero EFLAGS */
@@ -484,22 +492,30 @@ relocated:
  */
 	jmp	*%rax
 
-	.code32
 #ifdef CONFIG_X86_5LEVEL
+	.code32
+/*
+ * This is 32-bit trampoline that will be copied over to low memory.
+ *
+ * RDI contains return address (might be above 4G).
+ * ECX contains the base address of trampoline memory.
+ */
 ENTRY(lvl5_trampoline_src)
-compatible_mode:
 	/* Setup data and stack segments */
 	movl	$__KERNEL_DS, %eax
 	movl	%eax, %ds
 	movl	%eax, %ss
 
+	/* Setup new stack at the end of trampoline memory */
+	leal	LVL5_TRAMPOLINE_STACK_END (%ecx), %esp
+
 	/* Disable paging */
 	movl	%cr0, %eax
 	btrl	$X86_CR0_PG_BIT, %eax
 	movl	%eax, %cr0
 
 	/* Point CR3 to 5-level paging */
-	leal	lvl5_pgtable(%ebx), %eax
+	leal	LVL5_TRAMPOLINE_PGTABLE_OFF (%ecx), %eax
 	movl	%eax, %cr3
 
 	/* Enable PAE and LA57 mode */
@@ -507,23 +523,29 @@ compatible_mode:
 	orl	$(X86_CR4_PAE | X86_CR4_LA57), %eax
 	movl	%eax, %cr4
 
-	/* Calculate address we are running at */
-	call	1f
-1:	popl	%edi
-	subl	$1b, %edi
+	/* Calculate address of lvl5_enabled once we are in trampoline */
+	leal	lvl5_enabled - lvl5_trampoline_src + LVL5_TRAMPOLINE_CODE_OFF (%ecx), %eax
 
 	/* Prepare stack for far return to Long Mode */
 	pushl	$__KERNEL_CS
-	leal	lvl5(%edi), %eax
-	push	%eax
+	pushl	%eax
 
 	/* Enable paging back */
 	movl	$(X86_CR0_PG | X86_CR0_PE), %eax
 	movl	%eax, %cr0
 
 	lret
+
+	.code64
+lvl5_enabled:
+	/* Return from trampoline */
+	jmp	*%rdi
+
+	/* Bound size of trampoline code */
+	.org	lvl5_trampoline_src + LVL5_TRAMPOLINE_CODE_SIZE
 #endif
 
+	.code32
 no_longmode:
 	/* This isn't an x86-64 CPU so hang */
 1:
@@ -581,7 +603,3 @@ boot_stack_end:
 	.balign 4096
 pgtable:
 	.fill BOOT_PGT_SIZE, 1, 0
-#ifdef CONFIG_X86_5LEVEL
-lvl5_pgtable:
-	.fill PAGE_SIZE, 1, 0
-#endif
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] x86/boot/compressed/64: Compile pagetable.c unconditionally
  2017-11-01 11:55 ` [PATCH 1/4] x86/boot/compressed/64: Compile pagetable.c unconditionally Kirill A. Shutemov
@ 2017-11-10  9:12   ` Ingo Molnar
  2017-11-10  9:25     ` Kirill A. Shutemov
  0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2017-11-10  9:12 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Cyrill Gorcunov,
	Borislav Petkov, Andi Kleen, linux-mm, linux-kernel


* Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:

> We are going to put few helpers into pagetable.c that are not specific
> to KASLR.
> 
> Let's make compilation of the file independent of KASLR and wrap
> KASLR-depended code into ifdef.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/boot/compressed/Makefile    | 2 +-
>  arch/x86/boot/compressed/pagetable.c | 5 +++++
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 65a150a7f15c..f7b64ecd09b3 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -77,7 +77,7 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
>  vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
>  vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
>  ifdef CONFIG_X86_64
> -	vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/pagetable.o
> +	vmlinux-objs-y += $(obj)/pagetable.o
>  endif
>  
>  $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
> diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
> index f1aa43854bed..a15bbfcb3413 100644
> --- a/arch/x86/boot/compressed/pagetable.c
> +++ b/arch/x86/boot/compressed/pagetable.c
> @@ -27,6 +27,9 @@
>  /* These actually do the work of building the kernel identity maps. */
>  #include <asm/init.h>
>  #include <asm/pgtable.h>
> +
> +#ifdef CONFIG_RANDOMIZE_BASE
> +
>  /* Use the static base for this part of the boot process */
>  #undef __PAGE_OFFSET
>  #define __PAGE_OFFSET __PAGE_OFFSET_BASE
> @@ -149,3 +152,5 @@ void finalize_identity_maps(void)
>  {
>  	write_cr3(top_level_pgt);
>  }
> +
> +#endif /* CONFIG_RANDOMIZE_BASE */

The #ifdeffery becomes really ugly in this file. I think we should split these 
into separate .c files:

  arch/x86/boot/compressed/kaslr.c
  arch/x86/boot/compressed/5-level-paging.c

With core data structures and code and a well defined interface:

  arch/x86/boot/compressed/pagetable.c
  arch/x86/boot/compressed/pagetable.h

or so.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] x86/boot/compressed/64: Detect and handle 5-level paging at boot-time
  2017-11-01 11:55 ` [PATCH 2/4] x86/boot/compressed/64: Detect and handle 5-level paging at boot-time Kirill A. Shutemov
@ 2017-11-10  9:13   ` Ingo Molnar
  0 siblings, 0 replies; 16+ messages in thread
From: Ingo Molnar @ 2017-11-10  9:13 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Cyrill Gorcunov,
	Borislav Petkov, Andi Kleen, linux-mm, linux-kernel


* Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:

> This patch prepare decompression code to boot-time switching between 4-
> and 5-level paging.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/boot/compressed/head_64.S   | 16 ++++++++++++----
>  arch/x86/boot/compressed/pagetable.c | 19 +++++++++++++++++++
>  2 files changed, 31 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index b4a5d284391c..6ac8239af2b6 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -288,10 +288,18 @@ ENTRY(startup_64)
>  	leaq	boot_stack_end(%rbx), %rsp
>  
>  #ifdef CONFIG_X86_5LEVEL
> -	/* Check if 5-level paging has already enabled */
> -	movq	%cr4, %rax
> -	testl	$X86_CR4_LA57, %eax
> -	jnz	lvl5
> +	/*
> +	 * Check if we need to enable 5-level paging.
> +	 * RSI holds real mode data and need to be preserved across
> +	 * a function call.
> +	 */
> +	pushq	%rsi
> +	call	need_to_enabled_l5
> +	popq	%rsi
> +
> +	/* If need_to_enabled_l5() returned zero, we're done here. */
> +	cmpq	$0, %rax
> +	je	lvl5
>  
>  	/*
>  	 * At this point we are in long mode with 4-level paging enabled,
> diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
> index a15bbfcb3413..cd2dd49333cc 100644
> --- a/arch/x86/boot/compressed/pagetable.c
> +++ b/arch/x86/boot/compressed/pagetable.c
> @@ -154,3 +154,22 @@ void finalize_identity_maps(void)
>  }
>  
>  #endif /* CONFIG_RANDOMIZE_BASE */
> +
> +#ifdef CONFIG_X86_5LEVEL
> +int need_to_enabled_l5(void)
> +{
> +	/* Check i leaf 7 is supported. */
> +	if (native_cpuid_eax(0) < 7)
> +		return 0;
> +
> +	/* Check if la57 is supported. */
> +	if (!(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
> +		return 0;
> +
> +	/* Check if 5-level paging has already been enabled. */
> +	if (native_read_cr4() & X86_CR4_LA57)
> +		return 0;
> +
> +	return 1;
> +}
> +#endif

Ok, I like this a lot better than doing this at the assembly level - and this 
could provide a model for how to further reduce assembly code.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] x86/boot/compressed/64: Introduce place_trampoline()
  2017-11-01 11:55 ` [PATCH 3/4] x86/boot/compressed/64: Introduce place_trampoline() Kirill A. Shutemov
@ 2017-11-10  9:14   ` Ingo Molnar
  2017-11-10  9:52     ` Kirill A. Shutemov
  2017-11-10  9:17   ` Ingo Molnar
  2017-11-10  9:29   ` Ingo Molnar
  2 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2017-11-10  9:14 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Cyrill Gorcunov,
	Borislav Petkov, Andi Kleen, linux-mm, linux-kernel


* Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:

> If bootloader enables 64-bit mode with 4-level paging, we need to
> switch over to 5-level paging. The switching requires disabling paging.
> It works fine if kernel itself is loaded below 4G.
> 
> If bootloader put the kernel above 4G (not sure if anybody does this),
> we would loose control as soon as paging is disabled as code becomes
> unreachable.
> 
> To handle the situation, we need a trampoline in lower memory that would
> take care about switching on 5-level paging.
> 
> Apart from trampoline itself we also need place to store top level page
> table in lower memory as we don't have a way to load 64-bit value into
> CR3 from 32-bit mode. We only really need 8-bytes there as we only use
> the very first entry of the page table. But we allocate whole page
> anyway. We cannot have the code in the same because, there's hazard that
> a CPU would read page table speculatively and get confused seeing
> garbage.
> 
> This patch introduces place_trampoline() that finds right spot in lower
> memory for trampoline, copies trampoline code there and setups new top
> level page table for 5-level paging.
> 
> At this point we do all the preparation, but not yet use trampoline.
> It will be done in following patch.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/boot/compressed/head_64.S   | 13 +++++++++++
>  arch/x86/boot/compressed/pagetable.c | 42 ++++++++++++++++++++++++++++++++++++
>  arch/x86/boot/compressed/pagetable.h | 18 ++++++++++++++++
>  3 files changed, 73 insertions(+)
>  create mode 100644 arch/x86/boot/compressed/pagetable.h

Ok, I like it how almost all of the complexity is now at the .c level.

I'm wondering, how did you test it if no current bootloader puts the kernel to 
above addresses of 4G?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] x86/boot/compressed/64: Introduce place_trampoline()
  2017-11-01 11:55 ` [PATCH 3/4] x86/boot/compressed/64: Introduce place_trampoline() Kirill A. Shutemov
  2017-11-10  9:14   ` Ingo Molnar
@ 2017-11-10  9:17   ` Ingo Molnar
  2017-11-10  9:28     ` Ingo Molnar
  2017-11-10  9:29   ` Ingo Molnar
  2 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2017-11-10  9:17 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Cyrill Gorcunov,
	Borislav Petkov, Andi Kleen, linux-mm, linux-kernel


* Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:

> If bootloader enables 64-bit mode with 4-level paging, we need to
> switch over to 5-level paging. The switching requires disabling paging.
> It works fine if kernel itself is loaded below 4G.
> 
> If bootloader put the kernel above 4G (not sure if anybody does this),
> we would loose control as soon as paging is disabled as code becomes
> unreachable.
> 
> To handle the situation, we need a trampoline in lower memory that would
> take care about switching on 5-level paging.
> 
> Apart from trampoline itself we also need place to store top level page
> table in lower memory as we don't have a way to load 64-bit value into
> CR3 from 32-bit mode. We only really need 8-bytes there as we only use
> the very first entry of the page table. But we allocate whole page
> anyway. We cannot have the code in the same because, there's hazard that
> a CPU would read page table speculatively and get confused seeing
> garbage.
> 
> This patch introduces place_trampoline() that finds right spot in lower
> memory for trampoline, copies trampoline code there and setups new top
> level page table for 5-level paging.
> 
> At this point we do all the preparation, but not yet use trampoline.
> It will be done in following patch.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/boot/compressed/head_64.S   | 13 +++++++++++
>  arch/x86/boot/compressed/pagetable.c | 42 ++++++++++++++++++++++++++++++++++++
>  arch/x86/boot/compressed/pagetable.h | 18 ++++++++++++++++
>  3 files changed, 73 insertions(+)
>  create mode 100644 arch/x86/boot/compressed/pagetable.h
> 
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 6ac8239af2b6..4d1555b39de0 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -315,6 +315,18 @@ ENTRY(startup_64)
>  	 * The first step is go into compatibility mode.
>  	 */
>  
> +	/*
> +	 * Find suitable place for trampoline and populate it.
> +	 * The address will be stored in RCX.
> +	 *
> +	 * RSI holds real mode data and need to be preserved across
> +	 * a function call.
> +	 */
> +	pushq	%rsi
> +	call	place_trampoline
> +	popq	%rsi
> +	movq	%rax, %rcx
> +
>  	/* Clear additional page table */
>  	leaq	lvl5_pgtable(%rbx), %rdi
>  	xorq	%rax, %rax

One request: it's always going to be fragile if the _only_ thing that uses the 
trampoline is the 5-level paging code.

Could we use the trampoline in the 4-level paging case too? It's not required, but 
would test much of the trampoline allocation and copying machinery - and the 
performance cost is negligible.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] x86/boot/compressed/64: Compile pagetable.c unconditionally
  2017-11-10  9:12   ` Ingo Molnar
@ 2017-11-10  9:25     ` Kirill A. Shutemov
  0 siblings, 0 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-11-10  9:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kirill A. Shutemov, Ingo Molnar, Linus Torvalds, x86,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Cyrill Gorcunov, Borislav Petkov, Andi Kleen, linux-mm,
	linux-kernel

On Fri, Nov 10, 2017 at 10:12:36AM +0100, Ingo Molnar wrote:
> 
> * Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:
> 
> > We are going to put few helpers into pagetable.c that are not specific
> > to KASLR.
> > 
> > Let's make compilation of the file independent of KASLR and wrap
> > KASLR-depended code into ifdef.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  arch/x86/boot/compressed/Makefile    | 2 +-
> >  arch/x86/boot/compressed/pagetable.c | 5 +++++
> >  2 files changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index 65a150a7f15c..f7b64ecd09b3 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -77,7 +77,7 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
> >  vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
> >  vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
> >  ifdef CONFIG_X86_64
> > -	vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/pagetable.o
> > +	vmlinux-objs-y += $(obj)/pagetable.o
> >  endif
> >  
> >  $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
> > diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
> > index f1aa43854bed..a15bbfcb3413 100644
> > --- a/arch/x86/boot/compressed/pagetable.c
> > +++ b/arch/x86/boot/compressed/pagetable.c
> > @@ -27,6 +27,9 @@
> >  /* These actually do the work of building the kernel identity maps. */
> >  #include <asm/init.h>
> >  #include <asm/pgtable.h>
> > +
> > +#ifdef CONFIG_RANDOMIZE_BASE
> > +
> >  /* Use the static base for this part of the boot process */
> >  #undef __PAGE_OFFSET
> >  #define __PAGE_OFFSET __PAGE_OFFSET_BASE
> > @@ -149,3 +152,5 @@ void finalize_identity_maps(void)
> >  {
> >  	write_cr3(top_level_pgt);
> >  }
> > +
> > +#endif /* CONFIG_RANDOMIZE_BASE */
> 
> The #ifdeffery becomes really ugly in this file. I think we should split these 
> into separate .c files:
> 
>   arch/x86/boot/compressed/kaslr.c
>   arch/x86/boot/compressed/5-level-paging.c
> 
> With core data structures and code and a well defined interface:
> 
>   arch/x86/boot/compressed/pagetable.c
>   arch/x86/boot/compressed/pagetable.h
> 
> or so.

Okay, I'll do this.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] x86/boot/compressed/64: Introduce place_trampoline()
  2017-11-10  9:17   ` Ingo Molnar
@ 2017-11-10  9:28     ` Ingo Molnar
  2017-11-10  9:55       ` Kirill A. Shutemov
  0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2017-11-10  9:28 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Cyrill Gorcunov,
	Borislav Petkov, Andi Kleen, linux-mm, linux-kernel


* Ingo Molnar <mingo@kernel.org> wrote:

> > --- a/arch/x86/boot/compressed/head_64.S
> > +++ b/arch/x86/boot/compressed/head_64.S
> > @@ -315,6 +315,18 @@ ENTRY(startup_64)
> >  	 * The first step is go into compatibility mode.
> >  	 */
> >  
> > +	/*
> > +	 * Find suitable place for trampoline and populate it.
> > +	 * The address will be stored in RCX.
> > +	 *
> > +	 * RSI holds real mode data and need to be preserved across
> > +	 * a function call.
> > +	 */
> > +	pushq	%rsi
> > +	call	place_trampoline
> > +	popq	%rsi
> > +	movq	%rax, %rcx
> > +
> >  	/* Clear additional page table */
> >  	leaq	lvl5_pgtable(%rbx), %rdi
> >  	xorq	%rax, %rax
> 
> One request: it's always going to be fragile if the _only_ thing that uses the 
> trampoline is the 5-level paging code.
> 
> Could we use the trampoline in the 4-level paging case too? It's not required, but 
> would test much of the trampoline allocation and copying machinery - and the 
> performance cost is negligible.

Note that right now the trampoline is pointless on 4-level setups, so there's 
nothing to copy - but we could perhaps make it meaningful. But maybe it's not a 
good idea.

One other detail I noticed:

        /* Bound size of trampoline code */
        .org    lvl5_trampoline_src + LVL5_TRAMPOLINE_CODE_SIZE

will this generate a build error if the trampoline code exceeds 0x40?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] x86/boot/compressed/64: Introduce place_trampoline()
  2017-11-01 11:55 ` [PATCH 3/4] x86/boot/compressed/64: Introduce place_trampoline() Kirill A. Shutemov
  2017-11-10  9:14   ` Ingo Molnar
  2017-11-10  9:17   ` Ingo Molnar
@ 2017-11-10  9:29   ` Ingo Molnar
  2017-11-10  9:57     ` Kirill A. Shutemov
  2 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2017-11-10  9:29 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, Linus Torvalds, x86, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Cyrill Gorcunov,
	Borislav Petkov, Andi Kleen, linux-mm, linux-kernel


* Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:

> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -315,6 +315,18 @@ ENTRY(startup_64)
>  	 * The first step is go into compatibility mode.
>  	 */
>  
> +	/*
> +	 * Find suitable place for trampoline and populate it.
> +	 * The address will be stored in RCX.
> +	 *
> +	 * RSI holds real mode data and need to be preserved across
> +	 * a function call.
> +	 */
> +	pushq	%rsi
> +	call	place_trampoline
> +	popq	%rsi
> +	movq	%rax, %rcx
> +
>  	/* Clear additional page table */
>  	leaq	lvl5_pgtable(%rbx), %rdi
>  	xorq	%rax, %rax

So in the final version of this code we now have:

	pushq	%rsi
	call	need_to_enabled_l5
	popq	%rsi

	/* If need_to_enabled_l5() returned zero, we're done here. */
	cmpq	$0, %rax
	je	lvl5

	/*
	 * At this point we are in long mode with 4-level paging enabled,
	 * but we want to enable 5-level paging.
	 *
	 * The problem is that we cannot do it directly. Setting LA57 in
	 * long mode would trigger #GP. So we need to switch off long mode
	 * first.
	 *
	 * We use trampoline in lower memory to handle situation when
	 * bootloader put the kernel image above 4G.
	 *
	 * The first step is go into compatibility mode.
	 */

	/*
	 * Find suitable place for trampoline and populate it.
	 * The address will be stored in RCX.
	 *
	 * RSI holds real mode data and need to be preserved across
	 * a function call.
	 */
	pushq	%rsi
	call	place_trampoline
	popq	%rsi
	movq	%rax, %rcx

Firstly, the 'need_to_enabled_l5' name sucks because it includes a typo, but also 
because the prefix is way too generic.

Something like:

	l5_paging_required()

would read a lot better - and would also provide a namespace for all L5 paging 
related functions.

Secondly, couldn't this be combined into a single .c function, named accordingly:

	l5_paging_prepare()

which would return true if L5 paging is available and should be enabled. In this 
case the trampoline copying function would be called in C, by l5_paging_prepare().

This further reduces the amount of assembly code.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] x86/boot/compressed/64: Introduce place_trampoline()
  2017-11-10  9:14   ` Ingo Molnar
@ 2017-11-10  9:52     ` Kirill A. Shutemov
  0 siblings, 0 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-11-10  9:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kirill A. Shutemov, Ingo Molnar, Linus Torvalds, x86,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Cyrill Gorcunov, Borislav Petkov, Andi Kleen, linux-mm,
	linux-kernel

On Fri, Nov 10, 2017 at 10:14:53AM +0100, Ingo Molnar wrote:
> I'm wondering, how did you test it if no current bootloader puts the kernel to 
> above addresses of 4G?

I didn't test it directly.

I've checked with debugger that there's no memory accesses above 1M
between disabling paging in the trampoline and returning back from
trampoline to the main kernel image code.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] x86/boot/compressed/64: Introduce place_trampoline()
  2017-11-10  9:28     ` Ingo Molnar
@ 2017-11-10  9:55       ` Kirill A. Shutemov
  2017-11-10 12:46         ` Ingo Molnar
  0 siblings, 1 reply; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-11-10  9:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kirill A. Shutemov, Ingo Molnar, Linus Torvalds, x86,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Cyrill Gorcunov, Borislav Petkov, Andi Kleen, linux-mm,
	linux-kernel

On Fri, Nov 10, 2017 at 10:28:12AM +0100, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > > --- a/arch/x86/boot/compressed/head_64.S
> > > +++ b/arch/x86/boot/compressed/head_64.S
> > > @@ -315,6 +315,18 @@ ENTRY(startup_64)
> > >  	 * The first step is go into compatibility mode.
> > >  	 */
> > >  
> > > +	/*
> > > +	 * Find suitable place for trampoline and populate it.
> > > +	 * The address will be stored in RCX.
> > > +	 *
> > > +	 * RSI holds real mode data and need to be preserved across
> > > +	 * a function call.
> > > +	 */
> > > +	pushq	%rsi
> > > +	call	place_trampoline
> > > +	popq	%rsi
> > > +	movq	%rax, %rcx
> > > +
> > >  	/* Clear additional page table */
> > >  	leaq	lvl5_pgtable(%rbx), %rdi
> > >  	xorq	%rax, %rax
> > 
> > One request: it's always going to be fragile if the _only_ thing that uses the 
> > trampoline is the 5-level paging code.
> > 
> > Could we use the trampoline in the 4-level paging case too? It's not required, but 
> > would test much of the trampoline allocation and copying machinery - and the 
> > performance cost is negligible.
> 
> Note that right now the trampoline is pointless on 4-level setups, so there's 
> nothing to copy - but we could perhaps make it meaningful. But maybe it's not a 
> good idea.

Let me see how it will play out.

> One other detail I noticed:
> 
>         /* Bound size of trampoline code */
>         .org    lvl5_trampoline_src + LVL5_TRAMPOLINE_CODE_SIZE
> 
> will this generate a build error if the trampoline code exceeds 0x40?

Yes, this is the point. Just a failsafe if trampoline code would grew too
much.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] x86/boot/compressed/64: Introduce place_trampoline()
  2017-11-10  9:29   ` Ingo Molnar
@ 2017-11-10  9:57     ` Kirill A. Shutemov
  0 siblings, 0 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-11-10  9:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kirill A. Shutemov, Ingo Molnar, Linus Torvalds, x86,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Cyrill Gorcunov, Borislav Petkov, Andi Kleen, linux-mm,
	linux-kernel

On Fri, Nov 10, 2017 at 10:29:33AM +0100, Ingo Molnar wrote:
> 
> * Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:
> 
> > --- a/arch/x86/boot/compressed/head_64.S
> > +++ b/arch/x86/boot/compressed/head_64.S
> > @@ -315,6 +315,18 @@ ENTRY(startup_64)
> >  	 * The first step is go into compatibility mode.
> >  	 */
> >  
> > +	/*
> > +	 * Find suitable place for trampoline and populate it.
> > +	 * The address will be stored in RCX.
> > +	 *
> > +	 * RSI holds real mode data and need to be preserved across
> > +	 * a function call.
> > +	 */
> > +	pushq	%rsi
> > +	call	place_trampoline
> > +	popq	%rsi
> > +	movq	%rax, %rcx
> > +
> >  	/* Clear additional page table */
> >  	leaq	lvl5_pgtable(%rbx), %rdi
> >  	xorq	%rax, %rax
> 
> So in the final version of this code we now have:
> 
> 	pushq	%rsi
> 	call	need_to_enabled_l5
> 	popq	%rsi
> 
> 	/* If need_to_enabled_l5() returned zero, we're done here. */
> 	cmpq	$0, %rax
> 	je	lvl5
> 
> 	/*
> 	 * At this point we are in long mode with 4-level paging enabled,
> 	 * but we want to enable 5-level paging.
> 	 *
> 	 * The problem is that we cannot do it directly. Setting LA57 in
> 	 * long mode would trigger #GP. So we need to switch off long mode
> 	 * first.
> 	 *
> 	 * We use trampoline in lower memory to handle situation when
> 	 * bootloader put the kernel image above 4G.
> 	 *
> 	 * The first step is go into compatibility mode.
> 	 */
> 
> 	/*
> 	 * Find suitable place for trampoline and populate it.
> 	 * The address will be stored in RCX.
> 	 *
> 	 * RSI holds real mode data and need to be preserved across
> 	 * a function call.
> 	 */
> 	pushq	%rsi
> 	call	place_trampoline
> 	popq	%rsi
> 	movq	%rax, %rcx
> 
> Firstly, the 'need_to_enabled_l5' name sucks because it includes a typo, but also 
> because the prefix is way too generic.
> 
> Something like:
> 
> 	l5_paging_required()
> 
> would read a lot better - and would also provide a namespace for all L5 paging 
> related functions.
> 
> Secondly, couldn't this be combined into a single .c function, named accordingly:
> 
> 	l5_paging_prepare()
> 
> which would return true if L5 paging is available and should be enabled. In this 
> case the trampoline copying function would be called in C, by l5_paging_prepare().
> 
> This further reduces the amount of assembly code.

Makes sense.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] x86/boot/compressed/64: Introduce place_trampoline()
  2017-11-10  9:55       ` Kirill A. Shutemov
@ 2017-11-10 12:46         ` Ingo Molnar
  0 siblings, 0 replies; 16+ messages in thread
From: Ingo Molnar @ 2017-11-10 12:46 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Ingo Molnar, Linus Torvalds, x86,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski,
	Cyrill Gorcunov, Borislav Petkov, Andi Kleen, linux-mm,
	linux-kernel


* Kirill A. Shutemov <kirill@shutemov.name> wrote:

> > One other detail I noticed:
> > 
> >         /* Bound size of trampoline code */
> >         .org    lvl5_trampoline_src + LVL5_TRAMPOLINE_CODE_SIZE
> > 
> > will this generate a build error if the trampoline code exceeds 0x40?
> 
> Yes, this is the point. Just a failsafe if trampoline code would grew too
> much.

Ok, good!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-11-10 12:46 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-01 11:54 [PATCH 0/4] x86: 5-level related changes into decompression code Kirill A. Shutemov
2017-11-01 11:55 ` [PATCH 1/4] x86/boot/compressed/64: Compile pagetable.c unconditionally Kirill A. Shutemov
2017-11-10  9:12   ` Ingo Molnar
2017-11-10  9:25     ` Kirill A. Shutemov
2017-11-01 11:55 ` [PATCH 2/4] x86/boot/compressed/64: Detect and handle 5-level paging at boot-time Kirill A. Shutemov
2017-11-10  9:13   ` Ingo Molnar
2017-11-01 11:55 ` [PATCH 3/4] x86/boot/compressed/64: Introduce place_trampoline() Kirill A. Shutemov
2017-11-10  9:14   ` Ingo Molnar
2017-11-10  9:52     ` Kirill A. Shutemov
2017-11-10  9:17   ` Ingo Molnar
2017-11-10  9:28     ` Ingo Molnar
2017-11-10  9:55       ` Kirill A. Shutemov
2017-11-10 12:46         ` Ingo Molnar
2017-11-10  9:29   ` Ingo Molnar
2017-11-10  9:57     ` Kirill A. Shutemov
2017-11-01 11:55 ` [PATCH 4/4] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G Kirill A. Shutemov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).